MESS.DOC[1,LMM] - www.SailDart.org

perm filename MESS.DOC[1,LMM] blob sn#030258 filedate 1973-03-20 generic text, type T, neo UTF8


 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
      Exhaustive Generation of Cyclic and Acyclic Isomers.(1,2)






      ABSTRACT. A  systematic method  of identification  of all
      possible molecular  structures (isomers) fitting  a given
      empirical formula is described. The method, embodied in a
      computer program, generates  a complete list  of isomers.
      Duplicate structures are avoided prospectively.




















------------------

1)   For Part  X  see D.H.  Smith,  B.G. Buchanan,  W.C.  White, E.A.
Feigenbaum, J. Lederberg, and C. Djerassi, Tetrahedron, submitted.

2)   Financial support  for this  work was  provided by  the National
Institutes of Health (RR00612-03) and the Advanced  Research Projects
Agency (SD-183).
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Problems  of structural  isomerism  in chemistry  have  received much
attention.   But  only occasional  inroads  have been  made  toward a
systematic  solution  of  the  underlying  mathematical  problems  of
isomerism.  Recently  the "boundaries, scope  and limits" (3)  of the
subject  of  structural  isomerism  of  acyclic  molecules  have been
defined  by the  DENDRAL algorithm  (3).  This  algorithm  permits an
enumeration  and  representation of  all  possible  acyclic molecular
structures with a given molecular formula.

Acyclic molecules  represent only a  subset of  molecular structures,
however, and it may be argued that cyclic structures (including those
posessing acyclic chains) are of more general interest and importance
to modern chemistry.  An approach to cyclic structure  generation has
appeared  in a  previous paper  in this  series (4).   That approach,
which manipulates acyclic isomers by removing hydrogen atoms pairwise
and  replacing  them  with  a new  bond,  has  one  serious drawback.
Because  the  approach  cannot make  efficient  use  of  the symmetry
properties of  cyclic graphs,  an extensive  amount of  computer time
(prohibitive  for other  than  very simple  cases) must  be  spent in
retrospective  checking  of each  candidate  structure  with existing
structures  to remove  duplicates.  For  this reason,  an alternative
approach  to construction  of  cyclic molecules  has  been developed.
This approach is designed  to take advantage of the  underlying graph
theoretic considerations, primarily  symmetry, to arrive at  a method
for more efficient construction of a complete and irredundant list of
isomers for a given empirical formula.

Central to the successful solution of this problem is the solution of
the subproblem  of specification  of all  isomers having  a specified
common ring system.  This topic has received attention for nearly 100
years, with limited success  (5).  Its more general  ramifications go
far beyond  organic chemistry.   Graph theoreticians  have considered
various aspects  of this topic,  frequently, but not  necessarily, in
the context of organic molecules.  Polya has presented a  theorem (6)
which permits calculation of  the number of structural isomers  for a
------------------
3)  J.  Lederberg, G.L. Sutherland,  B.G. Buchanan,  E.A. Feigenbaum,
A.V. Robertson, A.M. Duffield, and C. Djerassi, J. Amer. Chem.  Soc.,
91, 2973 (1969).

4)  Y.M. Sheikh, A.  Buchs, A.B. Delfino, G. Schroll,  A.M. Duffield,
C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum,  and J.
Lederberg, Org. Mass Spectrom., 4, 493 (1970).

5)  See, for example, A.C. Lunn and J.K. Senior, J. Phys.  Chem., 33,
1027 (1929) and references cited therein.

6)  a)  G. Polya, Compt. rend., 201, 1167 (1935);

                                  1
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


given ring system.  Hill (7) has applied this theorem  to enumeration
of isomers of simple ring compounds and Hill (8) and Taylor  (9) have
pointed out that  Polya's theorem permits enumeration  of geometrical
and  optical  isomers  in  addition  to  structural   isomers.   More
recently, formulae for enumeration of isomers of  monocyclic aromatic
compounds  based  on  graph theory,  permutation  groups  and Polya's
theorem  have  been presented  (10).   This history  of  interest and
results  provides  only  marginal  benefit  to  the  organic chemist.
Although the number of isomers may be interesting, these  methods (5-
9) do not indicate the structure of each isomer.  Also, these methods
do not provide  information on the more  general case where  the ring
system  is embedded  in a  more complex  structure.  For  even simple
cases  the  task  of  specifying  each  structure  by  hand,  without
duplication, is an onerous one.


                               METHOD

OVERVIEW

Framework.  The framework for this method is that chemical structures
_________
consist  of some  combination  of acyclic  chains and  rings  or ring
systems (4).  The problem of construction of acyclic isomers has been
solved previously (3).  If, therefore, all possible ring  systems can
be generated from all or part of the atoms in the  empirical formula,
the rings can be linked  together or be attached to  acyclic radicals
using  the  acyclic  generator.  This  yields  the  complete  list of
isomers.  The  method for construction  of ring systems  is described
below.  This description employs some terms which require definition.
The  definitions also  serve to  illustrate the  taxonomic principles
which underlie the operation of the cyclic structure generator.

The generator's view of molecular structure differs in  some respects
from  the chemist's.   A chemist,  for example,  may  view structures

------------------
    b)  G. Polya, Helv. Chim. Acta, 19, 22 (1936);
    c)  G. Polya, Z. Kryst. 92, 415 (1936);
    d)  G. Polya, Acta Math., 68, 145 (1937).
7)  a)  T.L. Hill, J. Phys. Chem., 47, 253 (1943);
    b)  T.L. Hill, ibid., p. 413.

8)  T.L. Hill, J. Chem. Phys., 11, 294 (1943).

9)  W.J. Taylor, ibid., p. 532.

10)  A.T.  Balaban and F.  Harary, Rev. Rumaine  de Chimie,  12, 1511
(1967).

                                  2
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


possessing  the  same  functional  group  or  ring  as  related.  The
generator works at the more fundamental level of the underlying graph
(4,11), as described below.

Chemical  Graph.  A  molecular structure  may be  viewed as  a graph,
________  _____
termed the "chemical graph."   A chemical graph consists  of "nodes,"
with  associated atom  names, connected  to one  another  by "edges."
Consider as an example the substituted piperazine, 1,  whose chemical
graph is illustrated in Scheme 1 as structure 2.  Note  that hydrogen
atoms are  ignored by  convention, while  the symbol  "U" in  used to
specify the unsaturation.  The degree (primary, secondary, ...)  of a
node in the chemical graph has its usual meaning, i.e., the number of
(non-hydrogen)  edges  connected to  it.   The valence  of  each atom
determines its maximum degree in the graph.

Superatom.  A chemical  graph is  separable into  cyclic  and acyclic
_________
parts, each of which may  be termed a "superatom."  A superatom  is a
structural sub-unit possessing at least one "free valence" (12).  The
chemical graph 2 arises from a combination of three  superatoms.  Two
are carbon atoms with one free valence (methyl radicals).   The third
is superatom 3, which possesses the indicated free valences  to which
the  remaining  hydrogen and  two  methyl radicals  will  be attached
(Scheme  1).  In  the subsequent  discussion, those  superatoms which
consist of rings are termed "ring-superatoms."

Ciliated Skeletons.  A "ciliated skeleton" is a graph in  which nodes
________ _________
(but  not  atom  names)  and  their  interconnections  are specified,
together  with  their free  valences.   Superatom 3  arises  from the
ciliated skeleton 4 by associating the atom names of eight carbon and
two nitrogen atoms with the skeleton (Scheme 1).

Unciliated Skeleton.  Skeletons where nodes and edges  are specified,
__________ ________
but free  valences are not,  are termed "unciliated  skeletons."  The
ciliated skeleton 4 arises  from one way of associating  sixteen free
valences with the nodes on the unciliated skeleton 5 (Scheme 1).
----------------------------------
Scheme 1






----------------------------------

------------------
12)   A free valence is a  bond on an atom in the superatom  to which
another atom (which is not part of the superatom) may be connected.

                                  3
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Underlying Graph.   Underlying graphs (11)  possess only  tertiary or
__________ _____
higher degree nodes.  The underlying graph of the unciliated skeleton
5 is the regular trivalent graph (11) of two nodes, 6.  Note that the
remaining  nodes of  the  unciliated skeleton  5 are  of  degree two.
Removal  of  these  secondary  nodes  from  5  while   retaining  the
interconnections of the two tertiary nodes yields 6.

As  an  illustration  of  the  variety  of  structures  which  may be
constructed from a given underlying graph and empirical  formula, for
example, C  H  N , consider that graph 6 is the underlying  graph for
          10 20 2
all  bicyclic  ring  systems  (excluding  spiro  forms).   Unciliated
skeletons 7 and  8 (Scheme 2), for  example, may be  constructed from
eight  secondary nodes  and 6.   There are  many ways  of associating
sixteen free valences with  each unciliated skeleton, resulting  in a
larger number  of ciliated  skeletons.  For example,  9 and  10 arise
from different allocations of sixteen free valences to 5  (Scheme 2).
There  is  only one  way  to  associate eight  carbon  atoms  and two
nitrogen atoms with each ciliated skeleton to yield  superatoms (e.g.
11 and 12,  Scheme 2).  However,  several structures are  obtained by
associating the  remaining two  carbon atoms  (in this  example) with
each superatom, as an ethyl or two methyl groups.  Chemical graphs 13
and 14, for example,  arise from two alternative ways  of associating
two methyl groups with superatom 3.
----------------------------------
Scheme 2






----------------------------------

Multiple  Bonds.   All  multiple  bonds  (double,  triple,  ...)  are
________  _____
considered  to  be small  rings  by the  program.   Although chemists
normally  maintain  strong distinctions  between  multiple  bonds and
rings,  (for  good  reason)  it  was  easier  when  implementing  the
algorithm in a computer  program to generate multiple  bonds, whether
isolated or within  ring systems, along with  the rings.  As  will be
mentioned  subsequently,   from  the  program's   standpoint  acyclic
portions of structures cannot possess multiple bonds.

Aims.  The cyclic structure generator must produce a complete list of
____
structures  without  duplication.  By  duplicate  structures  we mean
structures  which are  equivalent  in some  well-defined  sense.  The
concept of equivalence is discussed in more detail in Appendix  A and


                                  4
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


in the accompanying  paper (13).  The  class of isomers  generated by
the  program includes  only  connectivity isomers  (see  Appendix B).
Transformations (see Appendix A) allowed under  connectivity symmetry
preserve   the  valence   and  bond   distribution  of   every  atom.
Connectivity symmetry does not consider bond lengths or  bond angles.
This  choice  of  symmetry  results  in  construction  of  a  set  of
topologically  unique  isomers.    A  more  detailed   discussion  of
isomerism and symmetry is presented in Appendix B.

Strategy.   The strategy  behind  the cyclic  structure  generator is
________
strongly  tied to  the framework  described above.   The  strategy is
summarized in  greatly simplified form  in Figure 1.   The underlying
graphs from which structures  are constructed can be specified  for a
given  problem by  a  series of  calculations.   Thus Part  A  of the
program (Figure 1)  assigns atoms in  all possible ways  to superatom
partitions.  Each partition consists of atoms assigned to one or more
"superatom-parts"  and "remaining  atoms."  Each  superatompart  is a
collection of atoms  from which all possible,  unique ring-superatoms
(see above  definition) can be  constructed based on  the appropriate
underlying graphs (Part  B, Fig. 1).   Each ring-superatom will  be a
ring  system  in  completed  structure.   Remaining  atoms  will form
acyclic parts of the final structures when combined in  all possible,
unique ways with the  ring-superatoms from each superatompart  in the
initial partition (Part C, Fig. 1).

DESCRIPTION

The subsequent  detailed description parallels  the operation  of the
computer   program   (called   the   cyclic   structure   generator).
Programming  details  as   well  as  complete  descriptions   of  the
underlying  mathematics  are  omitted for  the  sake  of  clarity and
because they are available from other sources (13-16).

The example  chosen to illustrate  each step of  the method  is C H .
                                                                 6 8
This example, however, does  not contain bivalent or  trivalent atoms

------------------
13)  Accompanying description of the labelling algorithm.

14a)  H. Brown, L. Masinter, and L.Hjelmelend,  Discrete Mathematics,
submitted;   b)  H. Brown,  L. Masinter, and L.  Hjelmelend, Stanford
Computer Science       Memo 318.

15)  H. Brown and L. Masinter

16)  Additional  information will  be available  from the  authors on
request.

                                  5
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


(e.g., oxygen and nitrogen, respectively) or atoms of valence greater
than  four,  nor  any  univalent  atoms  other  than  hydrogen (e.g.,
chlorine, fluorine).  The description indicates how  these additional
atoms are considered by the program.

Partitioning and  Labelling. The  mechanism for  structure generation
____________ ___  _________
involves a  series of  "partitioning" steps followed  by a  series of
"labelling"  steps.   Partitions  are made  of  items  which  must be
assigned to  objects (usually graph  structures or parts  thereof) as
the molecular  structures are  built up  from the  underlying graphs.
The  process by  which items  are assigned  to the  graphs  is termed
labelling  (13,14).  Examination  of Scheme  1 reveals  the different
types of  items involved.  For  example, nodes are  partitioned among
and labelled  upon the edges  of the underlying  graphs to  yield the
unciliated  skeletons.   Free  valences  are  partitioned  among  and
labelled upon  the nodes  of unciliated  skeletons to  yield ciliated
skeletons, and so forth.

Partitioning  steps  in  the subsequent  discussion  are  carried out
assuming that objects among  which items are partitioned  are indist-
inguishable.  Distinguishability  of objects  (edges, nodes,  ...) is
specified  during labelling  and will  be discussed  in  a subsequent
section.   The  partitioning  steps  performed  by  the  program  are
outlined in Table I. Each step is described in more detail below.


--------------------------------------------------------------------
                               Table I
   Partitioning Steps Performed by the Cyclic Structure Generator

Step #            Partition                 Among

   1           Atoms and Unsaturations  Superatomparts and
               in Empirical Formula     Remaining Atoms

   2           Free Valence             Atoms in Superatompart

   3           Secondary Nodes          Loops

   4           Non-looped Secondary     Edges of Graph
               Nodes

   5           Looped Secondary Nodes   Loops

   6           Ring-superatoms and      Efferent Links
               Remaining Atoms          (see Appendix D)

-------------------------------

                                  6
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


PART A.  Superatom Partitions.

Ring-superatoms  are  "two-connected"  structures,  i.e.,  the  ring-
superatom cannot  be split  into two  parts by  scission of  a single
bond.  The  atoms in  an empirical formula  may be  distributed among
from   one  to   several  such   two-connected   ring-superatoms.   A
distribution which  allots atoms to  two or more  superatomparts will
yield  (respectively)  structures   containing  two  or   more  ring-
superatoms linked together by single bonds (or acyclic chains) (17).

The initial  formation of superatom  partitions is  linked intimately
with chemistry unlike several of the subsequent operations within the
cyclic  structure  generator.   Thus,  there  are  several well-known
concepts which are  used during partitioning, dealing  primarily with
valence and unsaturation.

The first step  is to replace the  hydrogen count with the  degree of
unsaturation.  The number of unsaturations (rings plus  double bonds)
is determined from the empirical formula in the normal way,  as given
in equation 1.
                n
     U = 1/2 (2+∃  (i-2)a )                                       (1)
                i=1      i

     U = unsaturation
     i = valence
     n = maximum valence in composition
     a = number of atoms with valence i
      i

If the unsaturation count is zero, the formula is  passed immediately
to the acyclic generator.   Specifying the unsaturations as  U's, the
example   C H   becomes   C U    (hydrogen   atoms  are   omitted  by
           6 8             6 3
convention).

There  are  several  rules which  are  used  during  the partitioning
scheme, as follows:

I.  The resulting formula is stripped of other univalent atoms (e.g.,

------------------
17)  Chemists  are more  familiar with  terms such  as rings  or ring
systems.  The  term two-connected  is used  here in  conjunction with
ring-superatoms  for  a  more  precise  description.    For  example,
biphenyl may be viewed as a single ring system or two rings depending
on the chemical context.  In this work, however, biphenyl consists of
two ring-superatoms (two phenyl rings) linked by a single bond.

                                  7
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


     chlorine) as  such atoms cannot  be part of  two-connected ring-
     superatoms.  These univalent atoms are assigned to  the category
     of remaining atoms.

II.  The remaining atoms in a given partition (those not allocated to
     superatomparts) can  contain NO  unsaturations.  Thus  ALL rings
     and/or double bonds will be generated from the superatomparts.

III.   It follows  from  I and  II  that every  superatompart  in the
     partition  must contain  at least  two atoms  of valence  two or
     higher  plus  at  least  one  unsaturation.   If  there  are  no
     unsaturations the atoms  belong in the remaining  atoms category
     of another  partition.  In addition,  an unsaturation  cannot be
     placed on a single  atom.  This rule defines the  minimum number
     of atoms and unsaturations in a superatompart.

IV.  The maximum number of unsaturations in a superatompart  is given
     by  Equation  2.   Superatoms must  possess  at  least  one free
     valence  (12), so  that  superatomparts with  no  free valences,
     e.g., O U  or C U , are not allowed.
            2 1     2 3
                n
     U   = 1/2 (∃  (i-2)a )                                       (2)
      max       i=3      i

     U   = maximum unsaturation of a superatompart
      max
     i   = valence
     n   = maximum valence in composition
     a   = number of atoms with valence i
      i

V.   The maximum  number  of superatomparts  for a  given  formula is
     defined by equation 3.
              n
     S   = 1/2∃  a                                                (3)
      max     i=2 i

     n   = maximum valence in composition
     S   = maximum number of superatomparts in a superatom partition
      max
     note:  the summation is over all atoms of valence >2;
            univalents are not considered.

Rules I-V  define the  allowed partitions  of a  group of  atoms into
superatomparts.  These rules  do not, however, prevent  generation of
equivalent  partitions, which  would eventually  result  in duplicate


                                  8
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


structures.   Defining   a  canonical   ordering  scheme   to  govern
partitioning   prevents   equivalent  partitions.    (There   may  be
alternative schemes):

VI.  Canonical Ordering:

      a.  Partition in order of increasing number of superatomparts.

      b.  For each entry in  each part of (a), partition in  order of
      decreasing size of superatompart by allocation of atoms  one at
      a time to remaining atoms.

      c.    Each  individual   partition  containing   two   or  more
      superatomparts must be in order of equal or decreasing  size of
      the superatompart.   In other  words, the  number of  atoms and
      unsaturations in  superatompart n+1  must be  equal to  or less
      than  the number  in superatompart  n.  The  program  notes the
      equality of superatomparts in a partition to avoid repetition.

The application of rules  I-VI is best illustrated  through reference
to the  example of  C U .  The maximum  number of  superatomparts for
                     6 3
this example is  three (Equation 3).  There  is one way  to partition
*C6U3 into  one superatompart with  no remaining atoms,  partition 1,
Table II.  Subsequent assignment  of carbon atoms to the  category of
remaining  atoms  results  in partitions  2-4,  Table  II.   The next
partition following the sequence  1-4 would be C U  with  C  assigned
                                                2 3        4
to remaining atoms.  This partition is forbidden as C U  has  no free
                                                     2 3
valences.  The three ways  to partition C U  into  two superatomparts
                                         6 3
are  indicated  along  with  the  corresponding  partitions following
assignment of atoms to remaining atoms, as partitions 5-10, Table II.
Note that the next logical steps are partitions C U /C U , C U /C U ,
                                                 3 1  3 2   2 2  4 1
and so forth,  which are not allowed  by the canonical  ordering rule
VIc, above.  There is only  one unique way of partitioning  C U  into
                                                             6 3
three superatomparts, partition 11, Table II.









                                  9
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI




--------------------------------------------------------------------
                              Table II
 Allowed Partitions of C U  Into Superatomparts and Remaining Atoms.
                        6 3
Superatom
Partition       Number of  Superatompart Number   Remaining
Number     Superatomparts    1      2       3       Atoms

     1              1      C U     -       -         -
                            6 3
     2              1      C U     -       -        C
                            5 3                      1
     3              1      C U     -       -        C
                            4 3                      2
     4              1      C U     -       -        C
                            3 3                      3

     5              2      C U    C U      -         -
                            4 2    2 1
     6              2      C U    C U      -        C
                            3 2    2 1               1
     7              2      C U    C U      -        C
                            2 2    2 1               2

     8              2      C U    C U      -         -
                            4 1    2 2
     9              2      C U    C U      -        C
                            3 1    2 2               1

    10              2      C U    C U      -         -
                            3 2    3 1

    11              3      C U    C U     C U        -
                            2 1    2 1     2 1

PART B.  Ring-superatom Construction.

Each partition (Table II) must now be treated in turn.   The complete
set of  ring-superatoms for each  superatompart in a  given partition
must be constructed.  The  major steps in the procedure  are outlined
in Figure 2.

Valence List.  The first step in part B is to strip the superatompart
_______ ____
of atom names, while retaining the valence of each atom.  The numbers
of each type of atom  are saved for later labelling of  the skeleton.


                                 10
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


A valence list may then  be specified, giving in order the  number of
bi-, tri- , tetra- and  n-valent nodes which will be  incorporated in
the superatom.  Thus the  superatompart C U  is transformed  into the
                                         6 3
valence list 0 bivalents, 0 trivalents, 6 tetravalents (0, 0, 6), and
C U  becomes (0, 0, 4) (Figure 2).
 4 2

Calculation  of  Free  Valence.   From  the  valence  list   and  the
___________  __  ____  _______
associated unsaturation  count the  number of  free valences  of each
superatompart is determined using equation 4.
              n
     FV = (2 +∃  (i-2)a )-2U                                      (4)
              i=3      i

     U = unsaturation of superatompart
     i = valence
     n = maximum valence in composition
     a = number of atoms with valence i
      i
     FV = free valence

Partitioning of Free Valence.  The free valences are then partitioned
____________ __ ____ _______
among the nodes in the valence list in all possible, unique ways.  As
in  the earlier  partitioning problem,  there are  some  simple rules
which must  be followed.   Because ring-superatoms  are two-connected
structures two valences of each atom of a superatompart must  be used
to connect the atom to the ring-superatom.  Thus no free valences can
be assigned to bivalent nodes  in the valence list, a maximum  of one
to  each trivalent,  a maximum  of two  to each  tetravalent,  and so
forth.  The example (Fig. 2) is further simplified in that  there are
only tetravalent nodes in  the valence list.  Inclusion  of trivalent
nodes (e.g.,  nitrogen atoms) merely  extends the number  of possible
partitions.  The free valences are partitioned among  the tetravalent
nodes in all  ways, as illustrated in  Figure 2.  It is  important to
note that removal of atom names makes all n-valent (n=2 or 3  or ...)
nodes  in  the  valence  list equivalent  at  this  stage.   Thus the
partitions  (of  eight  free valences  among  six  tetravalent nodes)
222200, 222020, 222002, ......, 002222 are all equivalent.   Only one
of these  partitions is considered  to avoid eventual  duplication of
structures.

Degree List.   Each partition of  free valences alters  the effective
______ ____
valence of the nodes in the original valence list with respect to the
ring-superatom.   In  the  example, assignment  of  one  or  two free
valences to a  tetravalent node transforms this  node into a  tri- or
bivalent node  respectively.  As  the ring-superatom  is constructed,


                                 11
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


those  tetravalent  nodes which  have  been assigned,  say,  two free
valences, have then only two valences remaining for attachment to the
ring-superatom.  These nodes are then  of degree (18) two and  may be
termed secondary  nodes (18).   Thus the  partition of  free valences
2,2,2,2,0,0 on six tetravalent nodes yields the degree  list  (4,0,2)
(Fig.  2) as four of the tetravalent nodes receive two  free valences
each yielding four  nodes of degree  two (secondary) and  leaving two
nodes of degree  four (quaternary).  The  program keeps track  of the
number of free valences assigned to all nodes for use in a subsequent
step.

Loops.  As will be clarified in the subsequent discussion,  there are
_____
several general types of ring-superatoms which cannot  be constructed
from  the  underlying  graphs  available  in  the  CATALOG (described
below).   These  are  all cases  of  multiple  extended unsaturations
either  in the  form  of double  bonds  or rings.   Examples  are the
following:
     1)  bi-, tri-, ... n-cyclics with exocyclic double bonds;
     2)  some types of SPIRO ring systems;
     3)  allenes extended by additional double bonds, e.g.,

            c=c=c=c  .

The concept  of a loop,  consisting of a  single unsaturation  and at
least one bivalent node must be specified for these  cases.  Examples
of loops containing  one, two and three  bivalent nodes are  shown in
Scheme 3.   Note that  the two remaining  "ends" of  the unsaturation
will yield a  "looped structure" when attached  to a node in  a graph
(shown as X, Scheme 1).
---------------------
Scheme 3
   bivalents =        1            2            3




---------------------


------------------
18)  The  term  "valence  with  respect  to  the  ring-superatom"  is
confusing as valence has a clear chemical meaning.  Rather,  we speak
of  the  degree of  a  node  in the  chemical  graph  (see Overview).
Standard use of  the terms primary,  secondary, ... when  speaking of
degree refers to bonding to atoms which are not hydrogen  atoms.  Use
of degree  in this report  refers to the  number of bonds  other than
free valences.  A free valence may or may not eventually  be attached
to a hydrogen atom in the final structure.

                                 12
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


The method  for specification  of loops is  discussed in  Appendix C.
This procedure yields the reduced degree list which contains  none of
the  secondary  nodes  originally present  in  the  degree  list (see
Appendix C).

Underlying Graphs.  The  reduced degree lists  are used to  specify a
__________ ______
set of underlying graphs for the eventual ring-superatoms.   All two-
connected  structures can  be described  by their  underlying graphs,
which  are,  for  most structures,  regular  trivalent  graphs.  This
concept has been described in detail by Lederberg (11), who  has also
presented  a generation  and classification  scheme for  such graphs.
Given a set of all underlying graphs, the set of  all ring-superatoms
may  be  specified.   The underlying  graphs  are  maintained  by the
program in the CATALOG.  Catalog entries for regular trivalent graphs
possessing two, four and six nodes are presented in Table  III.  This
list must  be supplemented by  additional underlying graphs  to cover
several special cases.  Several of these additional graphs, which are
sufficient for generation of all structures in the example,  are also
presented in Table III.


--------------------------------------------------------------------
                              Table III
         Partial Listing of Underlying Graphs in the CATALOG
                    Reduced
Structure  Name*  Degree List         Remarks

            2A    (k,2,0,0)        Regular trivalent graph of 2 nodes


            4AA   (k,4,0,0...)     Regular trivalent graph of 4 nodes


            4BB   (k,4,0,0...)     Regular trivalent graph of 4 nodes


            6AAA  (k,6,0,0...)     Regular trivalent graph of 6 nodes


            6ABB  (k,6,0,0...)     Regular trivalent graph of 6 nodes


            6ACA  (k,6,0,0...)     Regular trivalent graph of 6 nodes


            6BCB  (k,6,0,0...)     Regular trivalent graph of 6 nodes



                                 13
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


            6CCC  (k,6,0,0...)     Regular trivalent graph of 6 nodes


         "Singlering e.g.
            M"    (4,0,0...)       A single ring composed of M
                                   secondary nodes.


         "Daisy"  (k,0,1,0...)     A single 2n-degree node (n=2,3...)
         e.g. $1AA


         "T03"    (k,0,3,0...)     Graph composed of three quaternary
                                   nodes


         $3BCB    (k,2,1,0...)     Graph composed of two tertiary and
                                   one quaternary node


         "T22"    (k,2,2,0...)     Graph composed of two tertiary and
                                   two quaternary nodes


         "T22"    (k,2,2,0...)     Graph composed of two tertiary and
                                   two quaternary nodes.

         "T22"    (k,2,2,0...)     Graph composed of two tertiary and
                                   two quaternary nodes


         "T22"    (k,2,2,0...)     Graph composed of two tertiary and
                                   two quaternary nodes


         $5CECC   (k,4,1,0...)     Graph composed of four tertiary
                                   and one quaternary nodes


         $BCCB    (k,4,1,0...)     Graph composed of four tertiary
                                   and one quaternary nodes


         $B:5AECA (k,4,1,0...)     Graph composed of four tertiary
                                   and one quaternary nodes


         $5ABCB   (k,4,1,0...)     Graph composed of four tertiary

                                 14
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


                                   and one quaternary nodes


         $5ACDB   (k,4,1,0...)     Graph composed of four tertiary
                                   and one quaternary nodes


              e.g.(k,0,2,0,...)  Two nodes of equal valence (>3)

* The  graphs are named according to Lederberg (11).
------------------------------------------------------------------

With the reduced degree list of a superatompart, the program requests
the appropriate CATALOG entries.  In the example (Fig. 2), the degree
list (4,0,2)  specifies underlying  graphs containing  two quaternary
nodes.  The degree list (2,4,0) specifies regular trivalent graphs of
four nodes,  of which there  are two, 4AA  and 4BB (Table  T2).  Note
that secondary nodes in the degree list are ignored  since underlying
graphs have only  trivalent or higher  nodes.  The exception  is when
only  secondary  nodes  are  present in  the  degree  list,  when the
underlying graph "Singlering" (Table III) is utilized.

Interlude.   Up  to  this  point  in  the  method,  the  program  has
_________
effectively  decomposed the  problem  into a  series  of subproblems,
working down from the  empirical formula through a set  of partitions
and  subpartitions  to   the  set  of  possible   underlying  graphs.
Subsequent steps  now work  back up,  in a  sense, as  the underlying
graphs are  expanded in  a series  of steps  to the  final structures
employing the results of the previous partitions.

Labelling Edges  of Underlying Graphs  with Special  Secondary Nodes.
_________ _____  __ __________ ______  ____ _______  _________ _____
The first step in building  up structures is to attach  the secondary
nodes in the reduced degree  list to the underlying graphs.   It will
be recalled that these secondary nodes are those that will have loops
attached, thus the term "special secondary nodes" is used.  Because a
graph  structure is  involved, the  procedure by  which  the possible
attachments of the nodes to the graph are specified may be  viewed as
a "labelling"  of the  graph.  This is  the first  of six  such graph
labelling steps required by the program.  The six steps are listed in
Table  IV.  All  steps involve  the same  labelling problem,  that of
associating a set of n not necessarily distinct labels with a  set of
n  not necessarily  distinct objects.   This problem  is solved  by a
single  algorithm, the  "labelling algorithm",  for each  of  the six
labelling steps  (Table III).   This algorithm  is described  in some
detail  in the  subsequent publication  (11).  A  description  of the
underlying  mathematics and  proof of  thoroughness  and irredundancy
will appear separately (12).


                                 15
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI




--------------------------------------------------------------------
                              Table IV
 The Six Graph Labelling Steps Performed by the Labelling Algorithm

Labelling Step          Function

     1                  Label Edges of Underlying Graphs with
                        Special Secondary Nodes.

     2                  Label Edges of Resulting Graphs with
                        Non-Looped Secondary Nodes.

     3                  Label Loops of Resulting Graphs with
                        Looped Secondary Nodes.

     4                  Label Nodes of Skeletons with Free
                        Valences.

     5                  Label Nodes of Skeletons with Atom Names.

     6                  Label Free Valences of Superatoms with
                        Radicals (see Appendix D).

It  is  important in  the  context of  this  discussion,  however, to
discuss  some aspects  of the  first labelling  step to  indicate how
equivalent  labellings   (which  would  eventually   yield  duplicate
structures) may be  avoided prospectively.  This discussion  is valid
for each of  the  subsequent labelling steps.   Equivalent labellings
are prospectively avoided  by recognition of the  symmetry properties
of, in  the first  labelling, the  underlying graph.   These symmetry
properties  are  expressed in  terms  of the  permutation  group (see
Appendix A and refs. 13 and 14), in the first labelling, on the edges
of the underlying graph.   This permutation group, which  defines the
equivalence  of  the  edges,  may be  specified  in  the  CATALOG or,
alternatively, calculated as needed by a separate part of  the cyclic
structure generator.  As  subsequent labelling steps are  executed, a
new permutation group (e.g., on the nodes for labelling step four) is
calculated, again as  necessary.  Thus, only labellings  which result
in unique expansions of the structure are permitted.

In the  simple example,  (Fig. 2), the  symmetries of  the underlying
graphs and subsequent skeletons  can be discerned easily by  eye.  In
the general case this is  not true, however.  For example,  all edges
of  the  underlying  graph  containing  two  tetravalent   nodes  are
equivalent, as are the edges  of the regular trivalent graphs  2A and


                                 16
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


also 4BB.   The $3BCB graph  (Table II, Fig.  2) has  four equivalent
edges  and  one other  edge,  and  so forth.   In  the  general case,
however,  the  symmetries  of the  underlying  graphs  and subsequent
expansions thereof are not always obvious.

With  distinguishability  specified  by  the  group  on   the  edges,
labelling of  the underlying graphs  with special secondary  nodes is
carried out.  The results of this procedure for partitions containing
loops are indeicated in Figure 2.

Partition  of  and  Labelling  With  Non-Loop  Secondary  Nodes.  The
_________  __  ___  _________  ____  ________  _________  _____
secondary  nodes  which  were  not  assigned  to  loops  ("non-looped
secondary nodes") are partitioned among the edges of the graphs which
arise   from   the   previous  labelling   step.    As   in  previous
partitionings,   the    edges   are    initially   assumed    to   be
indistinguishable.   In addition,  loops  are not  counted  as edges.
There  are,  for  example,  five  ways  to  partition  four  non-loop
secondary nodes  among the edges  of the underlying  graph possessing
two quaternary nodes (Fig. 2).

Labelling of the graphs with non-loop secondary nodes is then carried
out.  Each of the five partitions mentioned immediately above results
in a  single labelling,  as all  edges of  the graph  are equivalent.
When edges are distinguishable there  may be several ways to  label a
graph with a single partition.  There are, for example, for the $3BCB
graph, two ways to label with the partition 3,0,0,0,0, four ways with
the partition 2,1,0,0,0 and  three ways with the  partition 1,1,1,0,0
(Fig. 2).

Partitioning of and Labelling with Loop Seconday Nodes.  There remain
____________ __ ___ _________ ____ ____ ________ _____
unassigned to  the graphs  at this point  only secondary  nodes which
were assigned to loops.  These nodes are first partitioned  among the
loops assuming indistinguish-ability  and remembering that  each loop
must receive at least one secondary node.  For example, following the
path from the  degree list (4,0,2)  through the specification  of two
loops (Fig. 2),  there are two ways  of labelling the  two equivalent
loops with four secondary nodes.   There is one way to label  the two
loops of the adjacent graph with three secondary nodes and one way of
labelling the two loops of  each of the two remaining graphs  in this
section  of  Figure 2  with  two secondary  nodes.   In  this example
(*C6U3) the loops in every  case are equivalent or there is  only one
loop  to  be  labelled.   In  the  general  case  loops  may  not  be
equivalent, resulting in a greater number of ways to label loops with
a given partition of secondary nodes.

Unciliated  Skeletons.  The  previous labelling  steps  specified the
__________  _________
number of secondary  nodes on each edge  of and loop attached  to the


                                 17
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


underlying graphs.  All atoms in the original superatompart  are thus
accounted  for.  A  representation of  the result  is  the unciliated
skeleton,  where  nodes  and their  connections  to  one  another are
specified.  These skeletons begin to resemble the final structures.

Free Valence Labelling.  Each of the nodes in an  unciliated skeleton
____ _______ _________
is then labelled  with free valences.   This labelling is  trivial in
the  example, as  all atoms  are of  the same  valence  (four).  This
labelling  is illustrated  in some  cases in  Figure 2,  yielding the
indicated   ciliated   skeletons.    This   labelling   is  performed
independent of the IDENTITIES of the atoms, but with knowledge of how
many  atoms  of  each  valence  type  were  present  in  the original
superatompart.

A  simple  example  illustrates  the  potential  complexity  of  this
labelling  problem  when atoms  of  more than  one  valence  type are
present.  If there were, say, n nitrogen atoms in  the superatompart,
in addition to m carbon atoms, then n nodes in the  ciliated skeleton
must be  trivalent, and m  must be tetravalent,  in all  unique ways.
The  free  valence  labelling  would  be  done  accordingly, yielding
bivalent nodes with one or two free valences and trivalent nodes with
zero or one free valences.  In the general case there may  be several
ways  to  perform this  labelling  on a  single  unciliated skeleton,
whereas in the C U  example there is only one way.
                6 3

Atom  Name Labelling.   The  nodes of  a ciliated  skeleton  are then
____  ____ _________
labelled with atom names to yield the ring-superatom(s).   Again this
labelling is  trivial in  the example, as  only one  type of  atom is
present (carbon), yielding in each case only a single superatom (Fig.
2).  If there  is more than  one type of  atom with the  same valence
(e.g., silicon  and carbon), the  labelling problem is  more complex.
Each node of appropriate valence may be labelled with either  type of
atom.  Duplicate structures are  avoided by calculating the  group on
the set of nodes of equal valence and labelling accordingly.

PART C.  ACYCLIC GENERATOR.

The superatom partition expanded in the example had no atoms assigned
to  acyclic  chains  (remaining atoms).   The  set  of  superatoms on
completion of Part B, above, thus yields the set of 36  structures on
placement of a hydrogen atom  on each free valence (Fig. 2).   If the
superatom partition (partitions  2-11, Table II) contained  more than
one superatompart or remaining  atoms, the acyclic generator  must be
used to  connect the  segments of  the structure  in all  ways.  This
procedure is described in detail in Appendix D.



                                 18
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


                             DISCUSSION

Completion  of  C U .   The  example  (Fig.  2)  has  considered only
__________  __  _ _
                 6 3
                 _ _
expansion  of  a single  superatom  partition.  The  total  number of
isomers of C U  and  their structures considering all  partitions has
            6 3
been determined by the program to be 159.  It may be  instructive for
the reader to attempt  to generate, by hand, structures  from another
partition, following the method outlined above and in Figure  2.  The
initial superatom  partitions yield some  indication of the  types of
structures  which  will  result from  each  partition.   For example,
partition 4 (Table  II), C U  in  a single superatompart,  plus three
                          3 3
carbons  in  the   remaining  atoms,  should  yield   all  structures
containing a  three-membered ring  possessing two  double bonds  or a
triple  bond.  As  there are  only two  free valences,  the remaining
atoms can be in a single chain (as a propyl or iso-propyl radical) or
as a methyl and an ethyl group, but not as three methyl groups.

Completeness and Irredundancy.  Although a mathematical proof  of the
____________ ___ ____________
completeness and irredundancy of the method exists (14 15),  there is
no guarantee that the  implementation of the algorithm in  a computer
program maintains these  desired characteristics.  Confidence  in the
thoroughness and irredundancy of  the algorithm can be  engendered in
the following ways:

      1)   Duplication  of  the  program's  performance  by  another,
completely independent approach.  If such an approach exists  (to our
knowledge  it  does  not),  it  must  be  proven  mathematically.  An
independent  algorithm  has  been  developed  which  is   capable  of
enumeration, but  not construction, of  all isomers of  a composition
containing C,H,N, and O.  Implementation of that algorithm is limited
at the present time, unfortunately, to compositions containing only 5
or fewer  carbon atoms  and various numbers  of hydrogen  atoms.  The
limitation is due to  computer memory constraints.  For  these cases,
however, the numbers of isomers obtained by both programs agree.

      2)   Testing  by  manual  generation  of  structures.   Several
chemists,  all without  knowledge of  the algorithm  described above,
have  been  given  several test  cases,  including  C U ,  from which
                                                     6 3
structures were generated by hand.  Familiarity with chemistry  is no
guarantee  of  success,  as evidenced  by  the  performance  of three
chemists for the superficially  simple case of C U  (C H ,  Table V).
                                                6 3   6 8
This example indicates that for  more than very trivial cases,  it is


                                 19
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


extremely difficult to avoid duplicates (tricyclics, for example, are
difficult to  visualize when testing  for duplicates)  and omissions.
Omissions appear  to result from  both being careless  and forgetting
ring systems that are  implausible or unfamiliar.  The  program seems
better at testing the chemist than vice versa.  In every  instance of
manual  structure generation,  no one  has been  able to  construct a
legal structure  that the  program failed to  construct.  No  one has
been able to detect an instance of duplication by the  program.  This
performance builds some  confidence, but manual verification  of more
complicated cases  is extremely tedious  and difficult.   Isomers for
many empirical  formulae have  been generated,  and some  results are
tabulated in Table VI.  The choice of examples has been  motivated by
a desire  to test  all parts of  the program  where errors  may exist
while  keeping   the  number  of   isomers  small  enough   to  allow
verification.  In this manner all obvious sources of error  have been
checked, for example, construction of loops on loops,  multiple types
of  atoms  of  the  same valence  (e.g.  Cl,  Br,  I)  and partitions
containing atoms of  several different valences including  penta- and
hexavalent atoms.

      3)   Varying the  order of  generation.  The  structure  of the
program  permits  additional  tests by  doing  some  operations  in a
different  order.  For  example, one  variation allowed  is  to leave
hydrogens associated with the atoms in each partition rather  than to
strip  them  away initially  and  place them  on  the  remaining free
valences in the last step.   Each such test has resulted in  the same
set of isomers.

In summary, the  verification procedures utilized have  all indicated
absence of  errors in the  computer implementation of  the algorithm.
Also,  there is  no clear  reason why  generation of  larger  sets of
isomers  should  not  also  proceed  correctly.   The  final  verdict
however,  must  await  development  of  new  mathematical  tools  for
verification by enumeration (see above) or an alternative algorithm.















                                 20
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI




--------------------------------------------------------------------
                               Table V
                             *
         Performance of Three  Chemists in Manual Generation
of Isomers of C H  (C U ). There are 159 Isomers|
               6 8   6 3

                Number Generated            Type of Error

Chemist 1            161              4 duplicates; 4 omissions;
                                      2 with 7 carbon atoms.

Chemist 2            168              16 duplicates; 7 omissions

Chemist 3            160              2 duplicates; 1 omission

*  One PhD and two graduate students.
-----------------





























                                 21
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI




--------------------------------------------------------------------
                              Table VI
        The Number of Isomers for Several Empirical Formulae

Empirical  Example         Number of Isomers      Manually Verified?
 Formula   Compound

   C H    benzene              217                     yes
    6 6

   C H    1,3-cyclohexadiene   159                     yes
    6 8

   C H    cyclohexene           77                     yes
    6 10

   C H    cyclohexane           25                     yes
    6 12

   C H    hexane                 5                     yes
    6 14

   C H O  phenol              2237
    6 6

   C H  O cyclohexanone        747                     no
    6 10

   C H  O 2-hexanone           211                     yes
    6 12

   C H N  pyrazole             155                     no
    3 4 2

   C H N  2-pyrazoline         136                     yes
    3 6 2

   C H N  tetrahydropyrazole    62                     no
    3 8 2

   C H  N  propylenediamine     14                     yes
    3 10 2

   C H P   (pentavalent P)     110                     no
    4 9 1
-----------------

                                 22
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Constraints.  The structure generator  is designed to produce  a list
___________
of  all  possible  connectivity  isomers  (Appendix  B).   This  list
contains  many structures  whose  existence seems  unlikely  based on
present chemical knowledge.  In  addition, the program may  be called
on to generate possible structures for an unknown in the  presence of
a body of data on the unknown which specify various  features, (e.g.,
functional groups) of the molecule.  In such instances mechanisms are
required for  constraining the generator  to produce  only structures
conforming  to specified  rules.  The  implementation of  the acyclic
generator possessed such a mechanism in the form of GOODLIST (desired
features) and BADLIST (unwanted features) (3) which could be utilized
during the course of structure generation.

The cyclic generator is less tractable.  As in  prospective avoidance
of duplicate structures, it is important that unwanted structures, or
portions thereof, be filtered out as early in the  generation process
as possible.  It is relatively easy to specify certain  general types
of constraints in chemical terms, for example, the number of  each of
various types of rings or  ring systems in the final  structure, ring
fusions, functional groups, sub-structures  and so forth.  It  is not
always  so  easy  to  devise  an  efficient  scheme  for  utilizing a
constraint in the algorithm,  however.  As seen in the  above example
(Fig. 2) the  expanded superatom partition  results in what  would be
viewed by the chemist as several very different ring systems.

The design of the program facilitates some types of constraints.  For
example,  the  program  may  be entered  at  the  level  of combining
superatoms to generate structures from a set of known sub-structures.
If additional atoms are present in an unknown configuration, they can
be treated as a separate generation problem, the results of which are
finally  combined  in  all  ways  with  the  known  superatoms.  This
approach will not form additional two-connected  structures, however.
Constraints  which  disallow  an  entire  partition  may   be  easily
included.  For example,  it is possible  to generate only  open chain
isomers   by  "turning   off"  the   appropriate   initial  superatom
partitions.

Much additional work  remains, however, before a  reasonably complete
set of constraints can be included.  The implementation of  each type
of constraint must  be examined and tested  in detail to  ensure that
the generator remains thorough and irredundant.

CONCLUSIONS

     The boundaries, scope and limitations of chemical  structure can
now be specified.

ACKNOWLEDGEMENTS

                                 23
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Appendix A.  Equivalence Classes and Finite Permutation Groups.

Two  members  of a  set  of possible  isomers  may be  defined  to be
equivalent if a specified  transformation of one member causes  it to
be superimposable upon another member of the set.  For example, there
are fifteen possible ways of attaching two chlorine and four hydrogen
atoms to a benzene ring (Scheme 4).
-----------------------------------------------------------------
Scheme 4








-----------------------------------------------------------------

If  rotations by  multiples of  60 degrees  are specified  as allowed
transformations,  the fifteen  structures fall  logically  into three
classes,  termed  "equivalence  classes"  (Scheme  Sa).   Within each
equivalence  class  structures  may  be  made  superimposable  by the
rotational transformation.  If one element (in this case  a molecular
structure) is chosen from each equivalence class, the complete set of
possible structures  is determined, without  duplication.  It  is the
task of  the labelling algorithm  to produce one  and only  one graph
labelling corresponding to one member of each equivalence class.

The  set  of transformations  which  define an  equivalence  class is
termed a "finite permutation  group."  This permutation group  may be
calculated based on the  symmetry properties of a graph  (or chemical
structure in  the example of  Scheme Sa).  This  calculation provides
the  mechanism  for  prospective  avoidance  of  duplication.   These
procedures are described more fully in the accompanying paper (13).














                                 24
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Appendix B.  Isomerism and Symmetry.

Appendix A introduced the  concept of equivalence classes  and finite
permutation  groups.  The  selection of  transformation  (Appendix A)
directs the calculation of the permutation group and thus defines the
equivalence  classes.   Different  types  of  transformation  may  be
allowed depending on the symmetry properties  of the class of isomers
considered.  This Appendix discusses several of the possible types of
isomerism,  most  of  which are  familiar  to  chemists.   The reader
seeking  a  more  thorough  discussion  of  some  types  of isomerism
discussed below is referred to an exposition of molecular symmetry in
the context of chemistry and mathematics (19).

Isomers are most often defined as chemical structures  possessing the
same empirical formula.  Different concepts of symmetry give  rise to
different classes of isomers, some of which  are described below.

Permutational Isomers.  Permutational isomers are isomers  which have
_____________ _______
in common the same skeleton  and set of ligands.  They differ  in the
distribution of  ligands about the  skeleton.  Gillespie et  al. (20)
and Klemperer (21) have used the concept of permutational  isomers to
probe into unimolecular rearrangement or isomerization reactions.

Stereoisomers.   Ugi   et  al.  (19)   have  defined   the  "chemical
_____________
constitution" of an atom to be its bonds and bonded neighbors.  Those
permutational isomers which differ only by permutations of ligands at
constitutionally   equivalent    positions   form   the    class   of
stereoisomers.

Isomers  Under  Rigid  Molecular  Symmetry.   If  one   conceives  of
_______  _____  _____  _________  ________
molecular  structures  as   having  rigid  skeletons,   the  physical
rotational (three dimensional) symmetries and transformations  may be
readily defined.  Each transformation causes each atom (and  bond) to
occupy the position  of another or same  atom (and bond) so  that the
rotated structure  can physically occupy  its former position  and at
the same time be indistinguishable  from it in any way.  This  is the
most  familiar  form  of  symmetry.   Under  this  type  of  symmetry
conformers  are distinguishable  and belong  in  distinct equivalence

------------------
19)  I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie,
Angew. Chem. internat. Edit., 9, 703 (1970).

20)   P.  Gillespie,  P. Hoffman,  H.  Klusacek,  D.  Marquarding, S.
Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, Angew.  Chem.  internat.
Edit., 10, 687 (1971).
21) a)  W. G. Klemperer, J. Amer. Chem. Soc., 94, 6940 (1972);
    b)  W. G. Klemperer, ibid, p. 8360.

                                 25
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


classes.   Every  transformation  is  orthogonal  and  preserves bond
angles and bond lengths as well as maintaining true chirality.

If  one allows  other  orthogonal transformations  that  alter chiral
properties of structures, equivalence classes result that  treat both
the left-handed and right-handed forms of chiral molecules to  be the
"same".  Thus a  "mirror image" transformation when  suitably defined
permits the left-handed form to exactly superimpose  the right-handed
form and vice-versa.

Isomers Under Total Molecular Symmetry.  If in addition to  the above
_______ _____ _____ _________ ________
mentioned   rigid  molecular   transformations  one   recognizes  the
flexional movements of a nonrigid skeleton, a dynamic  symmetry group
may be defined.  Under this definition, different conformers  now are
grouped  together.   Thus  the "chair"  and  "boat"  conformations of
cyclohexane  belong  to  the  same  equivalence  class  under dynamic
symmetry.   The   permutation  group   of  skeletal   flexibility  is
computable separately and independently of rigid  molecular symmetry.
One can then view total molecular symmetry as the product of  the two
finite permutation groups.

Isomers  Under Connectivity  Symmetry.  The  concept  of connectivity
_______  _____ ____________  ________
symmetry   was   introduced  previously   (METHOD   section).   Every
permutation  of  atoms  and  bonds  onto  themselves  is  a  symmetry
transformation for connectivity symmetry if,

      a) each atom is mapped into another of like species, e.g., N to
      N, C to C, O to O, and

      b) for  every pair  of atoms,  the connectivity  (none, single,
      double , triple, ...) is preserved in the mapping, i.e. the the
      connectivity of the two atoms is identical to  the connectivity
      of the atoms they are mapped into.

One   can   readily  recognize   that   transformations   as  defined
automatically  preserve the  valence and  bond distribution  of every
atom.   It  is  very  probable  that  readers  accustomed   to  three
dimensional  rotational  and  reflectional  symmetries  will  tend to
equate them  with the symmetries  of connectivity.  It  is emphasized
again that  connectivity symmetry does  not consider bond  lengths or
bond  angles,  and  it  includes  certain  transformations  that  are
conceivable  but  have  no  physical  interpretation  save   that  of
permuting the atoms and bonds.






                                 26
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


Appendix C.  Calculation of Loops.

There are several  rules which must  be followed in  consideration of
loop  assignment  to  ring-superatoms.   The  minimum  (MINLOOPS) and
maximum  (MAXLOOPS) numbers  of loops  for a  given valence  list are
designated by equations 5 and 6.
                                   n
     MINLOOPS = max{0,a +1/2(2j   -∃  ja )}                       (5)
                       2       max i=2  j

                       n
     MAXLOOPS = min{a ,∃  ((i-2)/2)a }                            (6)
                     2 j=4          j

     MINLOOPS = minimum number of loops
     MAXLOOPS = maximum number of loops
           a  = number of secondary nodes in degree list
            2
        j     = degree of highest degree item in degree list
         max
            j = degree
            n = highest degree in list
           a  = number of nodes with degree j.
            j


The form of the equations results from the following considerations:

      1)  Only secondary  nodes may be  assigned to loops.   Nodes of
      higher degree  will always  be in the  non-loop portion  of the
      ring-superatom.

      2)  A loop, by definition,  must be attached by two bonds  to a
      single node in  the resulting ring-superatom.  The  loop cannot
      be attached  through the free  valences.  Thus the  degree list
      must possess a sufficient number of quaternary or higher degree
      nodes to support the loops(s).

      3)  Each loop must have  at least one secondary node,  which is
      the reason MAXLOOPS is restricted to at most the maximum number
      of secondary nodes in the degree list (Equation 6).

      4)   There must  be available  one unsaturation  for  each loop
      (this is implicit in the calculation of MINLOOPS  and MAXLOOPS)
      as each loop effectively forms a new ring.

Partitioning of Secondary Nodes.  For each of the possible numbers of
____________ __ _________ _____


                                 27
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


loops (0,1, ...) the secondary nodes are removed from the degree list
and partitioned among  the loops, remembering  that the loops  are at
present indistinguishable  and each  loop must  receive at  least one
secondary node.  In  the example (Fig.  2), starting with  the degree
list (4,0,2), there are three ways of partitioning the four secondary
nodes among two loops and the remaining non-looped  portion.  Removal
of the four  secondary nodes from the  degree list and  assignment of
two, three or four of them to two loops results in the list specified
in Figure 2 as the "reduced degree list".  Specification of two loops
transforms  the two  quaternary  nodes in  the degree  list  into two
secondary nodes.  This results from  the fact that two valences  of a
quaternary or higher degree node  must be used to support  each loop.
These  are "special"  secondary nodes,  however, as  these particular
nodes are the ones which will be connected to loops as  the structure
is built  up.  Thus, in  the example, any  secondary nodes  which are
found  in the  reduced degree  list will  have a  loop attached  in a
subsequent step.   The degree list  (4,0,2) thus becomes  the reduced
degree list (2,0,0) in  the partition specifying two loops  (Fig. 2).
Similarly,  the partition  of one  loop for  the degree  list (3,2,1)
results in a reduced degree  list of (1,2,0) with the  three original
secondary nodes partitioned among loop and non-loop  portions (Figure
1).

If, after the first, second, ... nth loop partition, there remain one
or more quaternary or higher degree nodes in the reduced degree list,
the  list must  be  tested again  for the  possibility  of additional
loops.  Each loop  partition will generate  a new set  of structures.
The  second  pass will  yield  those structures  possessing  loops on
loops, and so forth.  One such superatom which would be  generated in
this manner from a composition of (at least) C U  is 15.
                                              6 5
          c=c=c=c=c=c

             15

The partition of (4,0,2) including one loop results in each case in a
reduced  degree  list  (1,0,1).   This  list  is  disallowed  in  the
subsequent  step,  as  the  underlying  graph  (vide  infra)  for one
quaternary node is  a daisy (Table II),  which requires a  minimum of
two secondary nodes with which to label the daisy loops (a minimum of
one secondary node  in the reduced degree  list for each loop  of the
daisy).

Appendix D

A method of construction  of structures similar to the  generation of
acyclic molecules  is utilized to  join multiple  ring-superatoms and


                                 28
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


remaining atoms.  The  DENDRAL algorithm for construction  of acyclic
molecules (3,24) relied on the existence of a unique central atom (or
bond) to every molecule.  The present acyclic generator uses the same
idea.   The present  algorithm, though  simpler in  not having  to to
treat interconnection  of atoms  or ring-superatoms  through multiple
bonds, is  more complex  because of  the necessity  to deal  with the
symmetries of the ring-superatoms.

C1. METHOD FOR THE CASE WITH EVEN NUMBER OF TOTAL ATOMS.

The  superatom partition  C U /C U /-/C  (partition  7, Table  II and
                           2 2  2 1    2
Figure  2)  will be  used  here to  illustrate  this  procedure.  The
superatomparts C U  and C U  have exactly one possible ring-superatom
                2 2      2 1
for each (see Table VII).


--------------------------------------------------------------------
                              Table VII

     Superatompart        Superatom

       C U                 -C≡≡C-
        2 2
       C U                 >C==C<
        2 1

----------------------------------------

Thus acyclic structures are to be built with -C≡≡C- , >C==C<  and two
C's.

There are an even  number atoms and ring-superatoms.   The structures
to be  generated fall  into two categories:(a)  those with  a central
bond; (b) those with a central atom.

Category A.  CENTRAL BOND (see Fig. 3).

Step 1.  Partition into Two Parts.

The atoms  and ring-superatoms are  partitioned into two  parts, with

------------------
24)  A more complete  description of the algorithm is  available; see
B.  G.  Buchanan, A.  M.  Duffield,  and A.  V.  Robertson,  in "Mass
spectrometry,  Techniques and  Applications," G.  W. A.   Milne, ed.,
John Wiley and Sons, Inc., 1971, p. 121.

                                 29
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


each part having exactly half  the total number of items.   Each atom
or  ring-superatom  is  a  single item.   Each  part  has  to satisfy
equation 7, called the Restriction on Univalents.
Restriction on Univalents:
                n
              1+∃  (i-2)a ≥ 0                                     (7)
                i=1      i

     i = valence
    a  = number of atoms or superatoms of valence i
     i
     n = maximum valence in composition

There are  two ways  of partitioning  the four  items into  two parts
(Fig. 3).  The restriction  on univalents is satisfied in  each case.
The restriction will disallow certain partitions that have "too many"
univalents other  than hydrogens and  therefore is essential  only in
partitioning  compositions that  contain any  number  of non-hydrogen
univalents.

Step 2.  Generate Radicals from Each Part.

Using a procedure described  in Section C3, radicals  are constructed
from each  part in each  partition.  Table VIII  shows the  result of
applying this procedure to the example.
























                                 30
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI




--------------------------------------------------------------------
                             Table VIII

     Part                     Radicals
---------------------+------------------------------

   -C≡≡C- , >C==C<   ->     -C≡≡C-CH=CH2

                            -CH=CH-C≡≡CH

                            -C-C≡CH
                             "
                             CH2


    C2               ->     -CH2-CH3


   -C≡≡C- , C        ->     -C≡≡C-CH3

                            -CH2-C≡≡CH


   >C==C< , C        ->     -CH==CH-CH3

                            -C-CH3
                             "
                             CH2


                            -CH2-CH==CH2
====================================================

Step 3.  Form Molecules From Radicals.

The  radicals  are  combined in  unique  pairs,  within  each initial
partition. Each  pair gives rise  to a unique  molecule, for  each of
which is the  center is a bond.   There are nine such  molecules that
have a central bond, for the example chosen (Fig. 3).

Category B.  CENTRAL ATOM (see Fig. 4).

Step 1.  Selection of Central Atom.

One must consider every unique atom or ring-superatom that has a free


                                 31
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


valence of  three or higher  as a central  atom.  In the  example, of
three candidates available:  -C≡≡C- , >C==C< and C, the first  is not
chosen for it has a free valence of only two.

Step 2.  Partition the Rest of the Atoms.

The atom or ring-superatom chosen for the center is removed  from the
set and the rest are partitioned into a number of parts less  than or
equal to the valence of  the central atom.  Each part must  have less
than half the total number of items being partitioned (again  a ring-
superatom is a single item).  Each part must satisfy  the restriction
on univalents (equation 7).

Thus, for the  case where a  carbon is the  center, from one  to four
partitions are  generated with  the condition that  each part  has at
most (4-1)/2 or  1 atom.  No valid  partitions are possible  for one,
two or four parts.  There  is exactly one partition for  three parts,
i.e., one atom in each.  The partitions are shown in Figure 4.

Step 3.  Generate Radicals.

Once again, using the procedure described in Section C3, radicals are
generated  for  each  part  in  each  partition.   For  example,  the
partition -C≡≡C-  gives rise to  exactly one possible  radical -C≡≡CH
(Fig. 4).

Step 4.  Combine Radicals.

Although in the example shown every part generates only  one radical,
in the general  case there will be  many radicals for each  part.  If
so, the radicals must be combined to give all unique  combinations of
radicals within each partition.

Step 5.  Form Molecules from Central Atom and Radicals.

If the  central atom is  not a ring-superatom  but is a  simple atom,
then each combination of radicals derived in Step 4 defines  a single
molecule that is  unique.  Thus for example  when C is chosen  as the
center, step 4 gives  one combination of radicals which  determines a
single molecule when connected to the central C (see Figure 4).

If the central atom is a ring-superatom and the valences of the ring-
superatom are not identical  then different ways of  distributing the
radicals around the center may yield different  molecules.  Labelling
of  the free  valences of  the central  ring-superatom  with radicals
treated as labels (supplemented with adequate number of  hydrogens to
make up  the total  free valence of  the ring-superatom)  generates a


                                 32
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


complete and irredundant list of molecules.  Thus >C==C<  is labelled
with the label set:
      one of -C≡≡CH , two of -CH3 , and one of -H.

There are two unique labellings as shown in Figure 4.

C2. METHOD FOR ODD NUMBER OF TOTAL ATOMS

With an  odd number of  total atoms, no  structures can  be generated
with  a central  bond.  Only  the case  of a  central atom  is  to be
considered.  However, it is possible for structures to be  built with
a  bivalent  atom at  the  center.  Thus  the  procedure  outlined in
Category B above is followed,  in this case also allowing  a bivalent
atom to be the central atom.

C3. GENERATION OF RADICALS

The goal of this procedure is to generate all radicals from a list of
atoms and  ring-superatoms.  A radical  is defined to  be an  atom or
superatom with a  single free valence.   When a composition  of atoms
and  ring-superatoms  is presented,  from  which radicals  are  to be
generated, two special cases are recognized.

Case 1.  Onle One Atom in Composition.

When  only  one atom  which  is  not a  ring-superatom  makes  up the
composition only one radical  is possible.  As for example,  with one
C, the radical -CH3 is the only possibility.

Case 2.  Only One Item (a Ring-superatom) in Composition.

In  this case,  depending upon  the symmetry  of  the ring-superatom,
several radicals are possible.   This is determined by  labelling the
free valences of the ring-superatom with one label of a special type,
a "radical-valence".
Example:  A composition consists of one ring-superatom, 16.

                                      C-
                                     /"
                                 >C<  "
                                     \"
                                      C-

                                   16

Two radicals result from labelling with one radical valence.



                                 33
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI



                  CH                  C-
                 /"                  /"
            -CH<  "             CH2<  "
                 \"                  \"
                  CH                  CH


                17                 18

General Case.   Radicals have uniquely  defined centers as  well, the
_______ ____
center always being an atom of valence two or higher.  The  steps for
generation of radicals are as follows.

Step 1.  Selection of Central Atom.

Any  bivalent or  higher  valent atom  or ring-superatom  is  a valid
candidate to be the center of a radical.  Thus, for example,  for the
composition -C≡C-,  >C=C< (see  part 1  in Figure  3) both  are valid
central atoms (Figure 5).

Step 2.  Partition the Rest of the Atoms.

The atom chosen for the central atom is removed from the composition.
One  of  the  valences  of  the  central  atom  is  to  remain  free.
Therefore, the rest of  the atoms in the composition  are partitioned
into less than or equal to  (valence of central atom - 1)  parts.  Of
course,  each  part  should  satisfy  the  restriction  on univalents
(equation 7) but for constructing radicals there is no restriction on
the size of the parts.

Step 3.  Form Radicals from Each Part.

The procedure to  generate radicals is  freshly invoked on  each part
thus generating radicals.  Each part  in Figure 5 gives rise  to only
one radical each arising from special case 2.

Step 4.  Combine Radicals in Each Part.

For the example in Figure 5, each part yields only one radical.  In a
more  general  situation, where  the  rest of  the  composition after
selection of  center, is  partitioned into  several parts,  and where
each  part  yields several  radicals,  the radicals  are  combined to
determine all unique combinations of radicals.

Step 5.  Label Central Atom with Radicals.   If the center is an atom
(not a ring-superatom) then each unique combination defines  a single
unique radical.

                                 34
 APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI


If the  center is  a ring-superatom, the  radicals are  determined by
labelling  the center  with  a set  of labels  which  includes;I) the
radicals; ii) a leading  radical-valence; iii) an adequate  number of
hydrogens  to  make  up  the remaining  free  valences  of  the ring-
superatom.  One selection of  center gives one radical and  the other
gives two more, to complete a list of three radicals for  the example
chosen (Fig. 5).

Summary

      For  the  example chosen  to  illustrate the  operation  of the
acyclic generator, twelve isomers are generated, nine shown in Figure
3 and three shown in Figure 4.




































                                 35