PAT.WRU[1,LMM] - www.SailDart.org

perm filename PAT.WRU[1,LMM] blob sn#031678 filedate 1973-03-25 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00007 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	DRAFT	  The Pattern Match Compiler       Page 1
C00004 00003	DRAFT	  The Pattern Match Compiler       Page 2
C00008 00004	DRAFT	  The Pattern Match Compiler       Page 3
C00010 00005	DRAFT	  The Pattern Match Compiler       Page 4
C00012 00006	DRAFT	  The Pattern Match Compiler       Page 5
C00016 00007		FURTHER NOTES ON SETS and REPLACES
C00018 ENDMK
C⊗;
DRAFT	  The Pattern Match Compiler       Page 1


This is a preliminary writeup of the pattern match
compiler to be embeded as part of the CLISP feature
of BBN-LISP. It's purpose is to make easier the
writing of complicated tests by specifying a pattern
which some data structure is supposed to match.
The pattern match compiler will, upon encountering
a "match" type construction in a user's code, will
substitute an appropriate expression which will
perform that match.  This feature is not intended
to be as general purpose as many pattern match
oriented languages and features (FLIP, pattern matching
in PLANNER, etc.), but rather, as a convenience
for specifying simple tests which might otherwise
be clumsy to write, and not as intelligible.


PATTERN ELEMENTS

A pattern consists of a list of pattern elements. Each
pattern element is said to match either an element of
a data structure or a segment. (cf the editor's pattern
matcher,  "--" matches any arbitrary segment of a list, while
& or a subpattern, match only one element of a list).
Those patterns which may match a segment of a list are
called SEGMENT patterns;  those that match a single
element are called ELEMENT patterns.

DRAFT	  The Pattern Match Compiler       Page 2

ELEMENT PATTERNS:

There are several types of element patterns, best given
by their syntax:

    PATTERN			MEANING

     $1, or &		 matches any arbitrary element of a list

     's-expression       matches only an element which is EQUAL
			 to the given s-expression.

    =expression		matches only an element which is EQUAL to
			the value of the computation "expression"

    ==expression 	same as =, but uses EQ

			(note that 'FOO is really just a short hand
			 for ='FOO

    (sub-pattern)	this will match any element which matches
			the given sub-pattern


<<< SYNTAX WILL PROBABLY CHANGE ON THIS ONE: PROBABLY THIS WILL BE A SUFFIX
	TO A PATTERN; SO THAT ONE CAN SAY $1:NUMBERP, OR $3:...    >>>


      :function-name	    matches only an element for which the named
  or  :(expression in	    function returns non-NIL when applied to
	variable "**")      it.  If an expression is given (i.e. the
			    ":" is followed by a list), the atom **
			    is substitued for in the expression, and the
			    evaluation of that expression is used.

<<<  THIS ONE DOESN'T WORK TOO WELL >>>

	*		the * pattern will match any arbitrary element;
			if the entire match succeeds, however, the element
			which matched the * will be returned as the value
			of the match.  Note that usually, the pattern match
			compiler will construct an expression which returns
			either NIL if the match fails, or non-NIL if it suceeds;
			however, if a * occurs, this is overridden, and the
			expression generated will either return NIL if the
			match fails, or WHATEVER MATCHED THE *, even though
			that may be NIL.
DRAFT	  The Pattern Match Compiler       Page 3

	SEGMENT PATTERNS

PATTERN				MEANING

  $, or --		This pattern will match any segment of a list
			(even one of zero length)

  $2, $3, ...		These will match a segment of the given length
			note that $1 is not a segment pattern.  (later
			in the discussion of "←", the distinction be-
			tween a segment of length 1 and an element will
			be made clear)

 <<<   this may go away if there is not much demand to keep it >>

  $$expression		This will match a segment whose length is the
			value of the COMPUTATION "expression".   The
			reason for the two dollars is to distinguish
			between this and $ followed by a sub-pattern.
			Note that $$ 3 or $$3 is equivalent to $3.

<<<this doesn't work too well yet, except when it's the last pattern element>>>


  ! element pattern     This matches any segment which the given element
			pattern would match-- i.e. 

			   !&  =  $
			   !=FOO, if the value of FOO is (A B C) will match
			   the segment ... A B C ... etc.

			   !* is like  valueofmatch←$
DRAFT	  The Pattern Match Compiler       Page 4

	SETS and REPLACES

Any pattern element may be preceded by <var>←, or followed
by  ←<expression>.   The former means that, if everything
matches up to the given pattern element, the variable given
is to be set to WHAT MATCHED that pattern element.   In the
case of element patterns, this will be simply some CAD...DR;
for segment patterns, this will be some LDIFF (or LIST expression,
if there are a fixed number of elements).  In the latter case,
(i.e. patelt←expression), this means that if everything matches
up to the given point, the part of the data that matched is to
RPLAC'ed.  For segment replaces, this means a construction like
(RPLAC data (APPEND expression data))

<<< (1)  segment sets and replaces DO NOT WORK, except at the end
	of a pattern.

    (2) if they did, would you rather see NCONC rather than APPEND
	there?
>>>

DRAFT	  The Pattern Match Compiler       Page 5

		EXAMPLES


($ 'A $)     The $ matches any arbitrary segment, 'A matches
	     only an A, and the 2nd $ is again an arbitrary 
	     segment; thus this translates to
		(MEMB (QUOTE A) expression)

($ 'A)	     Again, the $ is an arbitrary segment; however, since
	     their is no $ after the 'A, it must be the last element
	     of the list; thus this translates to

		(EQ (CAR (LAST expression)) (QUOTE A]


('A 'B $ 'C $3 $)   (AND (EQ (CAR expression) (QUOTE A))
			 (EQ (CADR expression) (QUOTE B))
			 (CDDDR (MEMB (QUOTE C) (CDDR expression))))

		the car must be A, the cadr B, and there must be at
		least 3 elements after the first C in the CDDR.
		Note that it is assumed that one is matching lists
		which end in NIL rather than in other atoms, so that
		it is assumed that if the (CDDDR (MEMB --)) is non-
		NIL, it is actually LISTP.

(('A 'B) 'C Y←$1 $)   (AND (LISTP (CAR expression))
		      (EQ (CAAR expression) (QUOTE A))
		      (EQ (CADAR expression) (QUOTE B))
		      (NULL (CDDAR expression))
		      (EQ (CADR expression) (QUOTE C))
		      (CDDR expression)
		      (SETQ Y (CADDR expression)))

		the LISTP check on (CAR expression) is performed because
		(CAR expression) MIGHT be an atom. <<this can be turned
		of by setting the variable LISTPCHK to NIL-- if it seems
		desirable, this can become the default>>. Note that the
		value of the match is the value of the SETQ if all succeeds;
		in general, if the last pattern element is a set (last in
		print order), then the pattern match compiler will attempt
		to make it the value of the match as well. This is true even
		when the SETQ might actually return a NIL (as in this case).
		If desired, by setting ORSETQFLG to T, the last SETQ will
		be embedded in (OR (SETQ --) T), so that even if it is NIL,
		the expression will return NIL/non-NIL corresponding exactly
		to nomatch/match.
	FURTHER NOTES ON SETS and REPLACES

Normally, the match expression goes from left to right in the pattern, 
ANDing the match of each pattern element.   Thus, the pattern

('A Y←$1 'B $) will compile into (COND ((EQ (CAR expression) (QUOTE A))
					(SETQ Y (CADR expression))
					(EQ (CADDR expression) (QUOTE B]

<<<NOTE THAT CURRENTLY, IT IS ACTUALLY:
    (AND (EQ (CAR expression) (QUOTE A)) (SETQ Y (CADR expression)) 
	 (EQ (CADDR expression) (QUOTE B]
>>>

This is because one is in a FIXED CONTEXT -- i.e., one knows exactly where
the Y comes from, if the match succeeds.   However, in the pattern:

($ Y←$1 'A $),  one must search for the A before the location of the Y
		is known.  Thus, all SETQ's and RPLAC's are postponed!
		until the context is fixed.  $ and it's equivalents get
		one out of FIXED CONTEXT;  ', =, ==, :, and sub-patterns
		put one back in.   
<< I'm looking for feedback on this one; it means that it is impossible to
do something like ($ Y←$1 =Y $) to look for two consecutive elements which
are equal.   Perhaps, though, something like the # thing in FLIP is the
way to go on this-- if one really cares when a SET is done, there should
be some way of saying DO IT NOW, rather than waiting.   >>