perm filename MATCH.WRU[1,LMM] blob sn#038908 filedate 1973-04-30 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00007 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	DRAFT	  The Pattern Match Compiler       Page 1
C00004 00003	DRAFT	  The Pattern Match Compiler       Page 2
C00008 00004	DRAFT	  The Pattern Match Compiler       Page 3
C00010 00005	DRAFT	  The Pattern Match Compiler       Page 4
C00012 00006	DRAFT	  The Pattern Match Compiler       Page 5
C00016 00007		FURTHER NOTES ON SETS and REPLACES
C00019 ENDMK
C⊗;
DRAFT	  The Pattern Match Compiler       Page 1


This is  a preliminary writeup  of the pattern  match compiler  to be
embeded as part  of the CLISP feature of BBN-LISP. It's purpose is to
make easier the writing of complicated tests by specifying  a pattern
which some  data structure  is supposed to  match. The  pattern match
compiler  will, upon  encountering a  "match" type construction  in a
user's code,  will substitute  an appropriate  expression which  will
perform that  match.  This feature  is not intended to  be as general
purpose as many pattern match oriented languages and features  (FLIP,
pattern matching in PLANNER, etc.), but rather,  as a convenience for
specifying  simple tests  which might otherwise  be clumsy  to write,
and not as intelligible.


PATTERN ELEMENTS

A  pattern consists  of  a list  of  pattern elements.  Each  pattern
element is  said to match either an element of  a data structure or a
segment.  (cf  the  editor's  pattern  matcher,    "--"  matches  any
arbitrary segment of a list, while  & or a subpattern, match only one
element  of a list).  Those patterns which  may match a  segment of a
list are called SEGMENT  patterns; those that match a  single element
are called ELEMENT patterns.

DRAFT	  The Pattern Match Compiler       Page 2

ELEMENT PATTERNS:

There are several types of element patterns, best given
by their syntax:

    PATTERN			MEANING

     $1, or &		 matches any arbitrary element of a list

     's-expression       matches only an element which is EQUAL
			 to the given s-expression.

    =expression		matches only an element which is EQUAL to
			the value of the computation "expression"

    ==expression 	same as =, but uses EQ

			(note that 'FOO is really just a short hand
			 for ='FOO

    (sub-pattern)	this will match any element which matches
			the given sub-pattern


<<< SYNTAX WILL PROBABLY  CHANGE ON THE FOLLOWINF ONE:  PROBABLY THIS
WILL  BE A SUFFIX TO  A PATTERN; SO  THAT ONE CAN  SAY $1:NUMBERP, OR
$3:...  >>>


      :function-name	    matches only an element for which the named
  or  :(expression in	    function returns non-NIL when applied to
	variable "**")      it.  If an expression is given (i.e. the
			    ":" is followed by a list), the atom **
			    is substitued for in the expression, and the
			    evaluation of that expression is used.
			    Aternatively, the : may be followed by
			    a LAMBDA expression.


	*		the * pattern will match any arbitrary element;
			if the entire match succeeds, however, the element
			which matched the * will be returned as the value
			of the match.  Note that usually, the pattern match
			compiler will construct an expression which returns
			either NIL if the match fails, or non-NIL if it suceeds;
			however, if a * occurs, this is overridden, and the
			expression generated will either return NIL if the
			match fails, or WHATEVER MATCHED THE *, even though
			that may be NIL.
DRAFT	  The Pattern Match Compiler       Page 3

        	SEGMENT PATTERNS

PATTERN				MEANING

  $, or --		This pattern will match any segment of a list
			(even one of zero length)

  $2, $3, ...		These will match a segment of the given length
			note that $1 is not a segment pattern.  (later
			in the discussion of "←", the distinction be-
			tween a segment of length 1 and an element will
			be made clear)

 <<<   this may go away if there is not much demand to keep it >>

  $$expression		This will match a segment whose length is the
			value of the COMPUTATION "expression".   The
			reason for the two dollars is to distinguish
			between this and $ followed by a sub-pattern.
			Note that $$ 3 or $$3 is equivalent to $3.


<<<the following doesn't work too well yet, except when it's the
                    last pattern element>>>

  ! element pattern     This matches any segment which the given element
			pattern would match-- i.e. 

			   !&  =  $
			   !=FOO, if the value of FOO is (A B C) will match
			   the segment ... A B C ... etc.

			   !* is like  valueofmatch←$


DRAFT	  The Pattern Match Compiler       Page 4

	SETS and REPLACES

Any  pattern element  may  be  preceded  by <var>←,  or  followed  by
←<expression>.   The former means  that, if everything  matches up to
the given pattern element,  the variable given is  to be set to  WHAT
MATCHED that pattern element.  In  the case of element patterns, this
will  be simply  some CAD...DR;  for segment  patterns, this  will be
some LDIFF  (or  LIST expression,  if  there are  a fixed  number  of
elements).  In the  latter case, (i.e. patelt←expression), this means
that if everything  matches up to  the given point,  the part of  the
data that matched is  to RPLAC'ed.  For segment  replaces, this means
a construction like (RPLAC data (APPEND expression data))

<<< (1) segment  sets and replaces do  not work very well,  except at
        the end of a pattern.

    (2) if they did, would you rather see NCONC rather than APPEND
	there?
>>>

DRAFT	  The Pattern Match Compiler       Page 5

		EXAMPLES


($ 'A $)     The $ matches any arbitrary segment, 'A matches
	     only an A, and the 2nd $ is again an arbitrary 
	     segment; thus this translates to
		(MEMB (QUOTE A) expression)

($ 'A)	     Again, the $ is an arbitrary segment; however, since
	     their is no $ after the 'A, it must be the last element
	     of the list; thus this translates to

		(EQ (CAR (LAST expression)) (QUOTE A]


('A 'B $ 'C $3 $)   (AND (EQ (CAR expression) (QUOTE A))
			 (EQ (CADR expression) (QUOTE B))
			 (CDDDR (MEMB (QUOTE C) (CDDR expression))))

		the car must be A, the cadr B, and there must be at
		least 3 elements after the first C in the CDDR.
		Note that it is assumed that one is matching lists
		which end in NIL rather than in other atoms, so that
		it is assumed that if the (CDDDR (MEMB --)) is non-
		NIL, it is actually LISTP.

(('A 'B) 'C Y←$1 $)   (AND (LISTP (CAR expression))
		      (EQ (CAAR expression) (QUOTE A))
		      (EQ (CADAR expression) (QUOTE B))
		      (NULL (CDDAR expression))
		      (EQ (CADR expression) (QUOTE C))
		      (CDDR expression)
		      (SETQ Y (CADDR expression)))

		the LISTP check on (CAR expression) is performed because
		(CAR expression) MIGHT be an atom. <<this can be turned
		of by setting the variable LISTPCHK to NIL-- if it seems
		desirable, this can become the default>>. Note that the
		value of the match is the value of the SETQ if all succeeds;
		in general, if the last pattern element is a set (last in
		print order), then the pattern match compiler will attempt
		to make it the value of the match as well. This is true even
		when the SETQ might actually return a NIL (as in this case).
		If desired, by setting ORSETQFLG to T, the last SETQ will
		be embedded in (OR (SETQ --) T), so that even if it is NIL,
		the expression will return NIL/non-NIL corresponding exactly
		to nomatch/match.
	FURTHER NOTES ON SETS and REPLACES


Normally, the match expression goes from left to right in the pattern, 
ANDing the match of each pattern element.   Thus, the pattern

('A Y←$1 'B $) will compile into (COND ((EQ (CAR expression) (QUOTE A))
					(SETQ Y (CADR expression))
					(EQ (CADDR expression) (QUOTE B]

<<<NOTE THAT CURRENTLY, IT IS ACTUALLY:
    (AND (EQ (CAR expression) (QUOTE A)) (SETQ Y (CADR expression)) 
	 (EQ (CADDR expression) (QUOTE B]
  or, if ORSETQFLG is T, (AND ... (OR (SETQ --) T) (EQ ...)))
>>>

This is because one is in a FIXED CONTEXT -- i.e., one knows exactly where
the Y comes from, if the match succeeds.   However, in the pattern:

($ Y←$1 'A $),  one must search for the A before the location of the Y
		is known.  Thus, all SETQ's and RPLAC's are postponed!
		until the context is fixed.  $ and it's equivalents get
		one out of FIXED CONTEXT;  ', =, ==, :, and sub-patterns
		put one back in.   

<< I'm looking for feedback on this one; it means that it is impossible to
do something like ($ Y←$1 =Y $) to look for two consecutive elements which
are equal.   Perhaps, though, something like the # thing in FLIP is the
way to go on this-- if one really cares when a SET is done, there should
be some way of saying DO IT NOW, rather than waiting.   >>