perm filename PAT.WRU[1,LMM] blob
sn#031678 filedate 1973-03-25 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00007 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 DRAFT The Pattern Match Compiler Page 1
C00004 00003 DRAFT The Pattern Match Compiler Page 2
C00008 00004 DRAFT The Pattern Match Compiler Page 3
C00010 00005 DRAFT The Pattern Match Compiler Page 4
C00012 00006 DRAFT The Pattern Match Compiler Page 5
C00016 00007 FURTHER NOTES ON SETS and REPLACES
C00018 ENDMK
C⊗;
DRAFT The Pattern Match Compiler Page 1
This is a preliminary writeup of the pattern match
compiler to be embeded as part of the CLISP feature
of BBN-LISP. It's purpose is to make easier the
writing of complicated tests by specifying a pattern
which some data structure is supposed to match.
The pattern match compiler will, upon encountering
a "match" type construction in a user's code, will
substitute an appropriate expression which will
perform that match. This feature is not intended
to be as general purpose as many pattern match
oriented languages and features (FLIP, pattern matching
in PLANNER, etc.), but rather, as a convenience
for specifying simple tests which might otherwise
be clumsy to write, and not as intelligible.
PATTERN ELEMENTS
A pattern consists of a list of pattern elements. Each
pattern element is said to match either an element of
a data structure or a segment. (cf the editor's pattern
matcher, "--" matches any arbitrary segment of a list, while
& or a subpattern, match only one element of a list).
Those patterns which may match a segment of a list are
called SEGMENT patterns; those that match a single
element are called ELEMENT patterns.
DRAFT The Pattern Match Compiler Page 2
ELEMENT PATTERNS:
There are several types of element patterns, best given
by their syntax:
PATTERN MEANING
$1, or & matches any arbitrary element of a list
's-expression matches only an element which is EQUAL
to the given s-expression.
=expression matches only an element which is EQUAL to
the value of the computation "expression"
==expression same as =, but uses EQ
(note that 'FOO is really just a short hand
for ='FOO
(sub-pattern) this will match any element which matches
the given sub-pattern
<<< SYNTAX WILL PROBABLY CHANGE ON THIS ONE: PROBABLY THIS WILL BE A SUFFIX
TO A PATTERN; SO THAT ONE CAN SAY $1:NUMBERP, OR $3:... >>>
:function-name matches only an element for which the named
or :(expression in function returns non-NIL when applied to
variable "**") it. If an expression is given (i.e. the
":" is followed by a list), the atom **
is substitued for in the expression, and the
evaluation of that expression is used.
<<< THIS ONE DOESN'T WORK TOO WELL >>>
* the * pattern will match any arbitrary element;
if the entire match succeeds, however, the element
which matched the * will be returned as the value
of the match. Note that usually, the pattern match
compiler will construct an expression which returns
either NIL if the match fails, or non-NIL if it suceeds;
however, if a * occurs, this is overridden, and the
expression generated will either return NIL if the
match fails, or WHATEVER MATCHED THE *, even though
that may be NIL.
DRAFT The Pattern Match Compiler Page 3
SEGMENT PATTERNS
PATTERN MEANING
$, or -- This pattern will match any segment of a list
(even one of zero length)
$2, $3, ... These will match a segment of the given length
note that $1 is not a segment pattern. (later
in the discussion of "←", the distinction be-
tween a segment of length 1 and an element will
be made clear)
<<< this may go away if there is not much demand to keep it >>
$$expression This will match a segment whose length is the
value of the COMPUTATION "expression". The
reason for the two dollars is to distinguish
between this and $ followed by a sub-pattern.
Note that $$ 3 or $$3 is equivalent to $3.
<<<this doesn't work too well yet, except when it's the last pattern element>>>
! element pattern This matches any segment which the given element
pattern would match-- i.e.
!& = $
!=FOO, if the value of FOO is (A B C) will match
the segment ... A B C ... etc.
!* is like valueofmatch←$
DRAFT The Pattern Match Compiler Page 4
SETS and REPLACES
Any pattern element may be preceded by <var>←, or followed
by ←<expression>. The former means that, if everything
matches up to the given pattern element, the variable given
is to be set to WHAT MATCHED that pattern element. In the
case of element patterns, this will be simply some CAD...DR;
for segment patterns, this will be some LDIFF (or LIST expression,
if there are a fixed number of elements). In the latter case,
(i.e. patelt←expression), this means that if everything matches
up to the given point, the part of the data that matched is to
RPLAC'ed. For segment replaces, this means a construction like
(RPLAC data (APPEND expression data))
<<< (1) segment sets and replaces DO NOT WORK, except at the end
of a pattern.
(2) if they did, would you rather see NCONC rather than APPEND
there?
>>>
DRAFT The Pattern Match Compiler Page 5
EXAMPLES
($ 'A $) The $ matches any arbitrary segment, 'A matches
only an A, and the 2nd $ is again an arbitrary
segment; thus this translates to
(MEMB (QUOTE A) expression)
($ 'A) Again, the $ is an arbitrary segment; however, since
their is no $ after the 'A, it must be the last element
of the list; thus this translates to
(EQ (CAR (LAST expression)) (QUOTE A]
('A 'B $ 'C $3 $) (AND (EQ (CAR expression) (QUOTE A))
(EQ (CADR expression) (QUOTE B))
(CDDDR (MEMB (QUOTE C) (CDDR expression))))
the car must be A, the cadr B, and there must be at
least 3 elements after the first C in the CDDR.
Note that it is assumed that one is matching lists
which end in NIL rather than in other atoms, so that
it is assumed that if the (CDDDR (MEMB --)) is non-
NIL, it is actually LISTP.
(('A 'B) 'C Y←$1 $) (AND (LISTP (CAR expression))
(EQ (CAAR expression) (QUOTE A))
(EQ (CADAR expression) (QUOTE B))
(NULL (CDDAR expression))
(EQ (CADR expression) (QUOTE C))
(CDDR expression)
(SETQ Y (CADDR expression)))
the LISTP check on (CAR expression) is performed because
(CAR expression) MIGHT be an atom. <<this can be turned
of by setting the variable LISTPCHK to NIL-- if it seems
desirable, this can become the default>>. Note that the
value of the match is the value of the SETQ if all succeeds;
in general, if the last pattern element is a set (last in
print order), then the pattern match compiler will attempt
to make it the value of the match as well. This is true even
when the SETQ might actually return a NIL (as in this case).
If desired, by setting ORSETQFLG to T, the last SETQ will
be embedded in (OR (SETQ --) T), so that even if it is NIL,
the expression will return NIL/non-NIL corresponding exactly
to nomatch/match.
FURTHER NOTES ON SETS and REPLACES
Normally, the match expression goes from left to right in the pattern,
ANDing the match of each pattern element. Thus, the pattern
('A Y←$1 'B $) will compile into (COND ((EQ (CAR expression) (QUOTE A))
(SETQ Y (CADR expression))
(EQ (CADDR expression) (QUOTE B]
<<<NOTE THAT CURRENTLY, IT IS ACTUALLY:
(AND (EQ (CAR expression) (QUOTE A)) (SETQ Y (CADR expression))
(EQ (CADDR expression) (QUOTE B]
>>>
This is because one is in a FIXED CONTEXT -- i.e., one knows exactly where
the Y comes from, if the match succeeds. However, in the pattern:
($ Y←$1 'A $), one must search for the A before the location of the Y
is known. Thus, all SETQ's and RPLAC's are postponed!
until the context is fixed. $ and it's equivalents get
one out of FIXED CONTEXT; ', =, ==, :, and sub-patterns
put one back in.
<< I'm looking for feedback on this one; it means that it is impossible to
do something like ($ Y←$1 =Y $) to look for two consecutive elements which
are equal. Perhaps, though, something like the # thing in FLIP is the
way to go on this-- if one really cares when a SET is done, there should
be some way of saying DO IT NOW, rather than waiting. >>