perm filename BIGMES.DOC[1,LMM]2 blob
sn#037461 filedate 1973-04-20 generic text, type T, neo UTF8
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Exhaustive Generation of Cyclic and Acyclic Isomers.(1,2)
ABSTRACT. A systematic method of identification of all
possible molecular structures (isomers) ???? a given
empirical formula is described. The method, embodied in a
computer program, generates a complete list of isomers.
Duplicate structures are avoided prospectively.
------------------
1) For Part X see D.H. Smith, B.G. Buchanan, W.C. White, E.A.
Feigenbaum, J. Lederberg, and C. Djerassi, Tetrahedron, submitted.
2) Financial support for this work was provided by the National
Institutes of Health (RR00612-03) and the Advanced Research Projects
Agency (SD-183).
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Problems of structural isomerism in chemistry have received much
attention. But only occasional inroads have been made toward a
systematic solution of the underlying graph theoretical problems of
structural isomerism. Recently the "boundaries, scope and limits"
(3) of the subject of structural isomerism of acyclic molecules have
been defined by the DENDRAL algorithm (3). This algorithm permits an
enumeration and representation of all possible acyclic molecular
structures with a given molecular formula. Acyclic molecules
represent only a subset of molecular structures, however, and it may
be argued that cyclic structures (including those posessing acyclic
chains) are of more general interest and importance to modern
chemistry. An approach to cyclic structure generation has appeared
in a previous paper in this series (4). That approach, which
operates on a set of previously generated acyclic forms by labelling
hydrogen atoms pairwise and connecting the atoms they are attached to
with a new bond, has one serious drawback. This approach cannot make
efficient use of the symmetry properties of cyclic graphs; hence an
inordinate amount of computer time must be spent in retrospective
checking of each candidate structure with existing structures to
remove duplicates. For this reason, an alternative approach to
construction of cyclic molecules has been developed. This approach
is designed to take advantage of the underlying graph theoretic
considerations, primarily symmetry, to arrive at a method for more
efficient construction of a complete and irredundant list of isomers
for a given empirical formula.
Central to the successful solution of this problem is the generation
of all positional isomers obtained by substitutionss on a given ring
system. This topic has received attention for nearly 100 years, with
limited success (5). Its more general ramifications go far beyond
organic chemistry. Graph theoreticians have considered various
aspects of this topic, frequently, but not necessarily, in the
context of organic molecules. Polya has presented a theorem (6)
------------------
3) J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum,
A.V. Robertson, A.M. Duffield, and C. Djerassi, J. Amer. Chem. Soc.,
91, 2973 (1969).
4) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield,
C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and J.
Lederberg, Org. Mass Spectrom., 4, 493 (1970).
5) See, for example, A.C. Lunn and J.K. Senior, J. Phys. Chem., 33,
1027 (1929) and references cited therein.
6) a) G. Polya, Compt. rend., 201, 1167 (1935);
b) G. Polya, Helv. Chim. Acta, 19, 22 (1936);
c) G. Polya, Z. Kryst. 92, 415 (1936);
1
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
which permits calculation of the number of structural isomers for a
given ring system. Hill (7) has applied this theorem to enumeration
of isomers of simple ring compounds and Hill (8) and Taylor (9) have
pointed out that Polya's theorem permits enumeration of geometrical
and optical isomers in addition to structural isomers. More
recently, formulae for enumeration of isomers of monocyclic aromatic
compounds based on graph theory, permutation groups and Polya's
theorem have been presented (10). This history of interest and
results provides only marginal benefit to the organic chemist.
Although the number of isomers may be interesting, these methods (5-
9) do not display the structure of each isomer. Also, these methods
do not provide information on the more general case where the ring
system is embedded in a more complex structure. Even for simple
cases the task of specifying each structure by hand, without
duplication, is an onerous one.
METHOD
OVERVIEW
Framework. The framework for this method is that chemical structures
_________
consist of some combination of acyclic chains and rings or ring
systems (4). The problem of construction of acyclic isomers has been
solved previously (3). If, therefore, all possible ring systems can
be generated from all or part of the atoms in the empirical formula,
the rings can be linked together or be attached to acyclic radicals
using the acyclic generator. This yields the complete list of
isomers. The method for construction of ring systems is described
below. This description employs some terms which require definition.
The definitions also serve to illustrate the taxonomic principles
which underlie the operation of the cyclic structure generator.
The generator's view of molecular structure differs in some respects
from the chemist's. A chemist, for example, may view structures
possessing the same functional group or ring as related. The
------------------
d) G. Polya, Acta Math., 68, 145 (1937).
7) a) T.L. Hill, J. Phys. Chem., 47, 253 (1943);
b) T.L. Hill, ibid., p. 413.
8) T.L. Hill, J. Chem. Phys., 11, 294 (1943).
9) W.J. Taylor, ibid., p. 532.
10) A.T. Balaban and F. Harary, Rev. Rumaine de Chimie, 12, 1511
(1967).
2
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
generator works at the more fundamental level of the vertex-graph
(4,11), as described below.
Chemical Graph. A molecular structure may be viewed as a graph,
________ _____
termed the chemical graph, or skeleton. A chemical graph consists of
________ _____
nodes, with associated atom names, and edges, which correspond to
_____ _____
chemical bonds. Consider as an example the substituted piperazine, I,
whose chemical graph is illustrated in Scheme 1 as II. Note that
hydrogen atoms are ignored by convention, while the symbol "U" is
used to specify the unsaturation. The degree (primary, secondary,
...) of a node in the chemical graph has its usual meaning, i.e., the
number of (non-hydrogen) edges connected to it. The valence of each
atom determines its maximum degree in the graph. As usally displayed
by chemists in planar representation, the chemical graph describes
the connectivity rather than the geometric configuration of a
molecular structure.
Superatom. In general, a chemical graph can be separated into cyclic
_________
and acyclic parts. Each cyclic structural sub-unit may be deemed a
"superatom" possessing any number of "free valences" (12). The
chemical graph II arises from a combination of two carbon atoms with
superatom III. Superatom III possesses the indicated free valences to
which the remaining hydrogen and two methyl radicals will be attached
(Scheme 1).
Ciliated Skeletons. A "ciliated skeleton" is a skeleton with free
________ _________
valences. Superatom III arises from the ciliated skeleton IV by
associating the atom names of eight carbon and two nitrogen atoms
with the skeleton (Scheme 1).
Cyclic Skeleton. A chemical graph whose nodes are not associated with
______ ________
atom names containing no acyclic parts and no free valences is termed
a cyclic skeleton. Ciliated skeleton IV arises from one way of
associating sixteen free valences with the nodes on the cyclic
skeleton IV (Scheme 1).
----------------------------------
Scheme 1
------------------
12) A free valence is a bond with an unspecified terminus. Any
substructure, cyclic or not, may be treated as a superatom; however,
the term, in this paper, is generally restricted to cyclic, or ring
superatoms.
3
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
----------------------------------
Vertex Graph. Vertex-graphs (11) are cyclic skeletons from which
______ _____
nodes of degree less than three have been deleted. The vertex-graph
of the cyclic skeleton V is the regular trivalent graph (11) of two
nodes, VI. Note that the remaining nodes of the cyclic skeleton V
are of degree two. Removal of these secondary nodes from V while
retaining the interconnections of the two tertiary nodes yields VI.
As an illustration of the variety of structures which may be
constructed from a given vertex-graph and empirical formula, for
example, C H N , consider that graph VI is the vertex-graph for all
10 20 2
bicyclic ring systems (excluding spiro forms). Cyclic skeletons VII
and VIII (Scheme 2), for example, may be constructed from eight
secondary nodes and VI. There are many ways of associating sixteen
free valences with each cyclic skeleton, resulting in a larger number
of ciliated skeletons. For example, IX and X arise from different
allocations of sixteen free valences to V (Scheme 2). There is only
one way to associate eight carbon atoms and two nitrogen atoms with
each ciliated skeleton to yield superatoms (e.g. XI and XII, Scheme
2). However, several structures are obtained by associating the
remaining two carbon atoms (in this example) with each superatom, as
an ethyl or two methyl groups. Chemical graphs XIII and XIV, for
example, arise from two alternative ways of associating two methyl
groups with superatom III.
----------------------------------
Scheme 2
----------------------------------
Multiple Bonds. For the purposes of this program we adopt the
________ _____
formalism that all multiple bonds (double, triple, ...) are
considered to be small rings by the program. Previous versions
(acyclic generator) differ from this program in that double and
triple bonds are regarded as specially labelled edges.
Aims. The cyclic structure generator must produce a complete list of
____
structures without duplication. By duplicate structures we mean
structures which are equivalent in some well-defined sense. The class
of isomers generated by the program includes only connectivity
isomers. Transformations allowed under connectivity symmetry preserve
4
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
the valence and bond distribution of every atom. Connectivity
symmetry does not consider bond lengths or bond angles. This choice
of symmetry results in construction of a set of topologically unique
isomers. A more detailed discussion of equivalence is discussed in
Appendix A and in the accumpanying paper (13); a discussion of
isomerism and symmetry is presented in Appendix B.
Strategy. The strategy behind the cyclic structure generator is
________
strongly tied to the framework described above. The strategy is
summarized in greatly simplified form in Figure 1. The vertex-graphs
from which structures are constructed can be specified for a given
problem by a series of calculations. Thus Part A of the program
(Figure 1) assigns atoms in all possible ways to superatom
partitions. Each partition consists of atoms assigned to one or more
"superatom- parts" and "remaining atoms." Each superatompart is a
collection of atoms from which all possible, unique ring-superatoms
(see above definition) can be constructed based on the appropriate
vertex- graphs (Part B, Fig. 1). Each ring-superatom will be a ring
system in completed structure. Remaining atoms will form acyclic
parts of the final structures when combined in all possible, unique
ways with the ring-superatoms from each superatompart in the initial
partition (Part C, Fig. 1).
DESCRIPTION
The subsequent detailed description parallels the operation of the
computer program (called the cyclic structure generator).
Programming details as well as complete descriptions of the vertex
mathematics are omitted for the sake of clarity and because they are
available from other sources (13-16).
The example chosen to illustrate each step of the method is C H .
6 8
This example, however, does not contain bivalent or trivalent atoms
(e.g., oxygen and nitrogen, respectively) or atoms of valence greater
------------------
13) Accompanying description of the labelling algorithm.
14a) H. Brown, L. Masinter, and L.Hjelmelend, Constructive Graph
Labelling Using Double Cosets; Discrete Mathematics, in press.
also Stanford Computer Science Memo STAN-CS-72-0318.
15) H. Brown and L. Masinter, An Algorithm for the Generation of
Graphs of Organic Molecules; Discrete Mathematics, submitted.
16) Additional information will be available from the authors on
request.
5
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
than four, nor any univalent atoms other than hydrogen (e.g.,
chlorine, fluorine). The description indicates how these additional
atoms are considered by the program.
Partitioning and Labelling. The mechanism for structure generation
____________ ___ _________
involves a series of "partitioning" steps followed by a series of
"labelling" steps. Partitions are made of items which must be
assigned to objects (usually graph structures or parts thereof) as
the molecular structures are built up from the vertex-graphs. The
process by which items are assigned to the graphs is termed labelling
(13,14). Examination of Scheme 1 reveals the different types of
items involved. For example, nodes are partitioned among and
labelled upon the edges of the vertex-graphs to yield the cyclic
skeletons. Free valences are partitioned among and labelled upon the
nodes of cyclic skeletons to yield ciliated skeletons, and so forth.
Partitioning steps in the subsequent discussion are carried out
assuming that objects among which items are partitioned are indist-
inguishable. Distinguishability of objects (edges, nodes, ...) is
specified during labelling and will be discussed in a subsequent
section. The partitioning steps performed by the program are
outlined in Table I. Each step is described in more detail below.
--------------------------------------------------------------------
Table I
Partitioning Steps Performed by the Cyclic Structure Generator
Step # Partition Among
1 Atoms and Unsaturations Superatomparts and
in Empirical Formula Remaining Atoms
2 Free Valence Atoms in Superatompart
3 Secondary Nodes Loops
4 Non-looped Secondary Edges of Graph
Nodes
5 Looped Secondary Nodes Loops
6 Ring-superatoms and Efferent Links
Remaining Atoms (see Appendix D)
-------------------------------
PART A. Superatom Partitions.
6
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Ring-superatoms are "two-connected" structures, i.e., the ring-
superatom cannot be split into two parts by scission of a single
bond. The atoms in an empirical formula may be distributed among
from one to several such two-connected ring-superatoms. A
distribution which allots atoms to two or more superatomparts will
yield (respectively) structures containing two or more ring-
superatoms linked together by single bonds (or acyclic chains) (17).
In the generation process, one must find all possible ways of
partitioning the given formula into superatom parts and remaining
atoms, such that molecules can be constructed. The considerations in
forming superatom partitions deal primarily with valence and
unsaturation. The first step is to replace the hydrogen count with
the degree of unsaturation. The number of unsaturations (rings plus
double bonds) is determined from the empirical formula in the normal
way, as given in equation 1.
n
U = 1/2 (2+∃ (i-2)a ) (1)
i=1 i
U = unsaturation
i = valence
n = maximum valence in composition
a = number of atoms with valence i
i
If the unsaturation count is zero, the formula is passed immediately
to the acyclic generator. Specifying the unsaturations as U's, the
example C H becomes C U (hydrogen atoms are omitted by
6 8 6 3
convention).
There are several rules which are used during the partitioning
scheme, as follows:
I. The resulting formula is stripped of other univalent atoms (e.g.,
chlorine) as such atoms cannot be part of two-connected ring-
superatoms. These univalent atoms are relagated to the pot of
remaining atoms.
------------------
17) Chemists are more familiar with terms such as rings or ring
systems. The term two-connected is used here in conjunction with
ring-superatoms for a more precise description. For example,
biphenyl may be viewed as a single ring system or two rings depending
on the chemical context. In this work, however, biphenyl consists of
two ring-superatoms (two phenyl rings) linked by a single bond.
Furthermore, catenates and threaded structures are considered
separate molecular structures.
7
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
II. The remaining atoms in a given partition (those not allocated to
superatomparts) can contain NO unsaturations. Thus ALL rings
and/or double bonds will be generated from the superatomparts.
III. It follows that every superatompart in the partition must
contain at least two atoms of valence two or higher plus at
least one unsaturation. If there are no unsaturations then no
rings could be built. In addition, an unsaturation cannot be
placed on a single atom. This rule defines the minimum number
of atoms and unsaturations in a superatompart.
IV. The maximum number of unsaturations in a superatompart is given
by Equation 2. Superatoms must possess at least one free
valence (12), so that superatomparts with no free valences,
e.g., O U or C U , are not allowed.
2 1 2 3
n
U = 1/2 (∃ (i-2)a ) (2)
max i=3 i
U = maximum unsaturation of a superatompart
max
i = valence
n = maximum valence in composition
a = number of atoms with valence i
i
V. The maximum number of superatomparts for a given formula is
defined by equation 3.
n
S = 1/2∃ a (3)
max i=2 i
n = maximum valence in composition
S = maximum number of superatomparts in a superatom partition
max
note: the summation is over all atoms of valence >2;
univalents are not considered.
Rules I-V define the allowed partitions of a group of atoms into
superatomparts. These rules do not, however, prevent generation of
equivalent partitions, which would eventually result in duplicate
structures. Defining a canonical ordering scheme to govern
partitioning prevents equivalent partitions. One such canonical
ordering is given in Appendix C.
The application of rules I-V is best illustrated through reference to
8
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
the example of C U . The maximum number of superatomparts for this
6 3
example is three (Equation 3). There is one way to partition C U
6 3
into one superatompart with no remaining atoms, partition 1, Table
II. Subsequent assignment of carbon atoms to the category of
remaining atoms results in partitions 2-4, Table II. The next
partition following the sequence 1-4 would be C U with C assigned
2 3 4
to remaining atoms. This partition is forbidden as C U has no free
2 3
valences. The three ways to partition C U into two superatomparts
6 3
are indicated along with the corresponding partitions following
assignment of atoms to remaining atoms, as partitions 5-10, Table II.
Note that the next logical steps are partitions C U /C U , C U /C U ,
3 1 3 2 2 2 4 1
and so forth, which are not allowed by the canonical ordering rule
VIc, above. There is only one unique way of partitioning C U into
6 3
three superatomparts, partition 11, Table II.
9
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
--------------------------------------------------------------------
Table II
Allowed Partitions of C U Into Superatomparts and Remaining Atoms.
6 3
Superatom
Partition Number of Superatompart Number Remaining
Number Superatomparts 1 2 3 Atoms
1 1 C U - - -
6 3
2 1 C U - - C
5 3 1
3 1 C U - - C
4 3 2
4 1 C U - - C
3 3 3
5 2 C U C U - -
4 2 2 1
6 2 C U C U - C
3 2 2 1 1
7 2 C U C U - C
2 2 2 1 2
8 2 C U C U - -
4 1 2 2
9 2 C U C U - C
3 1 2 2 1
10 2 C U C U - -
3 2 3 1
11 3 C U C U C U -
2 1 2 1 2 1
PART B. Ring-superatom Construction.
Each partition (Table II) must now be treated in turn. The complete
set of ring-superatoms for each superatompart in a given partition
must be constructed. The major steps in the procedure are outlined
in Figure 2.
Valence List. The first step in part B is to strip the superatompart
_______ ____
of atom names, while retaining the valence of each atom. The numbers
of each type of atom are saved for later labelling of the skeleton.
10
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
A valence list may then be specified, giving in order the number of
bi-, tri- , tetra- and n-valent nodes which will be incorporated in
the superatom. Thus the superatompart C U is transformed into the
6 3
valence list 0 bivalents, 0 trivalents, 6 tetravalents (0, 0, 6), and
C U becomes (0, 0, 4) (Figure 2).
4 2
Calculation of Free Valence. From the valence list and the
___________ __ ____ _______
associated unsaturation count the number of free valences of each
superatompart is determined using equation 4.
n
FV = (2 +∃ (i-2)a )-2U (4)
i=3 i
U = unsaturation of superatompart
i = valence
n = maximum valence in composition
a = number of atoms with valence i
i
FV = free valence
Partitioning of Free Valence. The free valences are then partitioned
____________ __ ____ _______
among the nodes in the valence list in all possible, unique ways. As
in the earlier partitioning problem, there are some simple rules
which must be followed. Because ring-superatoms are two-connected
structures two valences of each atom of a superatompart must be used
to connect the atom to the ring-superatom. Thus no free valences can
be assigned to bivalent nodes in the valence list, a maximum of one
to each trivalent, a maximum of two to each tetravalent, and so
forth. The example (Fig. 2) is further simplified in that there are
only tetravalent nodes in the valence list. Inclusion of trivalent
nodes (e.g., nitrogen atoms) merely extends the number of possible
partitions. The free valences are partitioned among the tetravalent
nodes in all ways, as illustrated in Figure 2. It is important to
note that removal of atom names makes all n-valent (n=2 or 3 or ...)
nodes in the valence list equivalent at this stage. Thus the
partitions (of eight free valences among six tetravalent nodes)
222200, 222020, 222002, ......, 002222 are all equivalent. Only one
of these partitions is considered to avoid eventual duplication of
structures.
Degree List. Each partition of free valences alters the effective
______ ____
valence of the nodes in the original valence list with respect to the
ring-superatom. In the example, assignment of one or two free
valences to a tetravalent node transforms this node into a tri- or
bivalent node respectively. As the ring-superatom is constructed,
11
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
those tetravalent nodes which have been assigned, say, two free
valences, have then only two valences remaining for attachment to the
ring-superatom. These nodes are then of degree (18) two and may be
termed secondary nodes (18). Thus the partition of free valences
2,2,2,2,0,0 on six tetravalent nodes yields the degree list (4,0,2)
(Fig. 2) as four of the tetravalent nodes receive two free valences
each yielding four nodes of degree two (secondary) and leaving two
nodes of degree four (quaternary). The program keeps track of the
number of free valences assigned to all nodes for use in a subsequent
step.
Loops. As will be clarified in the subsequent discussion, there are
_____
several general types of ring-superatoms which cannot be constructed
from the vertex-graphs available in the CATALOG (described below).
These are all cases of multiple extended unsaturations either in the
form of double bonds or rings. Examples are the following:
1) bi-, tri-, ... n-cyclics with exocyclic double bonds;
2) some types of SPIRO ring systems;
3) allenes extended by additional double bonds, e.g.,
c=c=c=c .
The concept of a loop, consisting of a single unsaturation and at
least one bivalent node must be specified for these cases. Examples
of loops containing one, two and three bivalent nodes are shown in
Scheme 3. Note that the two remaining "ends" of the unsaturation
will yield a "looped structure" when attached to a node in a graph
(shown as X, Scheme 3).
---------------------
Scheme 3
bivalents = 1 2 3
---------------------
The method for specification of loops is discussed in Appendix C.
This procedure yields the reduced degree list which contains none of
the secondary nodes originally present in the degree list (see
Appendix C).
------------------
18) Use of degree in this report refers to the number of bonds other
than free valences, with double bonds being counted twice. A free
valence may or may not eventually be attached to a hydrogen atom in
the final structure.
12
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Vertex-Graphs. The reduced degree lists are used to specify a set of
_____________
vertex-graphs for the eventual ring-superatoms. All two-connected
structures can be described by their vertex-graphs, which are, for
most structures, regular trivalent graphs. This concept has been
described in detail by Lederberg (11), who has also presented a
generation and classification scheme for such graphs. Given a set of
all vertex-graphs, the set of all ring-superatoms may be specified.
The vertex-graphs are maintained by the program in the CATALOG.
Catalog entries for regular trivalent graphs possessing two, four and
six nodes are presented in Table III. This list must be supplemented
by additional vertex-graphs to cover several special cases. Several
of these additional graphs, which are sufficient for generation of
all structures in the example, are also presented in Table III.
--------------------------------------------------------------------
Table III
Partial Listing of Vertex-Graphs in the CATALOG
Degree List of
Structure Name* Compatable Chemical Graphs Remarks
"Singlering k" (k,0,0...) A single ring composed of k
secondary nodes.
"Daisy" (k,0,..,0,1,0,...) A single 2n-degree node.
2A (k,2,0,0...) Regular trivalent graph of 2 nodes
4AA (k,4,0,0...) Regular trivalent graph of 4 nodes
4BB (k,4,0,0...) Regular trivalent graph of 4 nodes
6AAA (k,6,0,0...) Regular trivalent graph of 6 nodes
6ABB (k,6,0,0...) Regular trivalent graph of 6 nodes
6ACA (k,6,0,0...) Regular trivalent graph of 6 nodes
6BCB (k,6,0,0...) Regular trivalent graph of 6 nodes
6CCC (k,6,0,0...) Regular trivalent graph of 6 nodes
13
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
"T03" (k,0,3,0...) Graph composed of three quaternary
nodes
$3BCB (k,2,1,0...) Graph composed of two tertiary and
one quaternary node
??? "T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
??? "T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes.
??? "T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
??? "T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
$5CECC (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$BCCB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$B:5AECA (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$5ABCB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$5ACDB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
e.g.(k,0,2,0,...) Two nodes of equal valence (>3)
Note that secondary nodes in the degree list are
ignored since vertex-graphs have only trivalent or higher nodes.
* The graphs not in quotes are named according to Lederberg (11).
14
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
------------------------------------------------------------------
With the reduced degree list of a superatompart, the program requests
the appropriate CATALOG entries. In the example (Fig. 2), the degree
list (4,0,2) specifies vertex-graphs containing two quaternary nodes.
The degree list (2,4,0) specifies regular trivalent graphs of four
nodes, of which there are two, 4AA and 4BB (Table T2). The exception
is when only secondary nodes are present in the degree list, when the
vertex-graph "Singlering" (Table III) is utilized.
Interlude. Up to this point in the method, the program has
_________
effectively decomposed the problem into a series of subproblems,
working down from the empirical formula through a set of partitions
and subpartitions to the set of possible vertex-graphs. Subsequent
steps now work back up, in a sense, as the vertex-graphs are expanded
in a series of steps to the final structures employing the results of
the previous partitions.
Labelling Edges of Vertex-Graphs with Special Secondary Nodes. The
_________ _____ __ _____________ ____ _______ _________ _____
first step in building up structures is to attach the secondary nodes
in the reduced degree list to the vertex-graphs. It will be recalled
that these secondary nodes are those that will have loops attached,
thus the term "special secondary nodes" is used. Because a graph
structure is involved, the procedure by which the possible
attachments of the nodes to the graph are specified may be viewed as
a "labelling" of the graph. This is the first of six such graph
labelling steps performed by the program. The six steps are listed
in Table IV. All steps involve the same labelling problem, that of
associating a set of n not necessarily distinct labels with a set of
n not necessarily distinct objects. This problem is solved by a
single algorithm, the "labelling algorithm", for each of the six
labelling steps (Table IV). This algorithm is described in some
detail in the subsequent publication (11). A description of the
vertex mathematics and proof of thoroughness and irredundancy will
appear separately (12).
15
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
--------------------------------------------------------------------
Table IV
The Six Graph Labelling Steps Performed by the Labelling Algorithm
Labelling Step Function
1 Label Edges of Vertex-Graphs with
Special Secondary Nodes.
2 Label Edges of Resulting Graphs with
Non-Looped Secondary Nodes.
3 Label Loops of Resulting Graphs with
Looped Secondary Nodes.
4 Label Nodes of Skeletons with Free
Valences.
5 Label Nodes of Skeletons with Atom Names.
6 Label Free Valences of Superatoms with
Radicals (see Appendix D).
It is important in the context of this discussion, however, to
discuss some aspects of the first labelling step to indicate how
equivalent labellings (which would eventually yield duplicate
structures) may be avoided prospectively. This discussion is valid
for each of the subsequent labelling steps. Equivalent labellings
are prospectively avoided by recognition of the symmetry properties
of, in the first labelling, the vertex-graph. These symmetry
properties are expressed in terms of the permutation group (see
Appendix A and refs. 13 and 14), in the first labelling, on the edges
of the vertex-graph. This permutation group, which defines the
equivalence of the edges, may be specified in the CATALOG or,
alternatively, calculated as needed by a separate part of the cyclic
structure generator. As subsequent labelling steps are executed, a
new permutation group (e.g., on the nodes for labelling step four) is
calculated, again as necessary. Thus, only labellings which result
in unique expansions of the structure are permitted. In the simple
example, (Fig. 2), the symmetries of the vertex-graphs and subsequent
skeletons can be discerned easily by eye. In the general case this
is not true, however. For example, all edges of the vertex-graph
containing two tetravalent nodes are equivalent, as are the edges of
the regular trivalent graphs 2A and also 4BB. The $3BCB graph (Table
II, Fig. 2) has four equivalent edges and one other edge, and so
16
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
forth. In the general case, however, the symmetries of the vertex-
graphs and subsequent expansions thereof are not always obvious.
With distinguishability specified by the group on the edges,
labelling of the vertex-graphs with special secondary nodes is
carried out. The results of this procedure for partitions containing
loops are indeicated in Figure 2.
Partition of and Labelling With Non-Loop Secondary Nodes. The
_________ __ ___ _________ ____ ________ _________ _____
secondary nodes which were not assigned to loops ("non-looped
secondary nodes") are partitioned among the edges of the graphs which
arise from the previous labelling step. As in previous
partitionings, the edges are initially assumed to be
indistinguishable. In addition, loops are not counted as edges.
There are, for example, five ways to partition four non-loop
secondary nodes among the edges of the vertex-graph possessing two
quaternary nodes (Fig. 2).
Labelling of the graphs with non-loop secondary nodes is then carried
out. Each of the five partitions mentioned immediately above results
in a single labelling, as all edges of the graph are equivalent.
When edges are distinguishable there may be several ways to label a
graph with a single partition. There are, for example, for the $3BCB
graph, two ways to label with the partition 3,0,0,0,0, four ways with
the partition 2,1,0,0,0 and three ways with the partition 1,1,1,0,0
(Fig. 2).
Partitioning of and Labelling with Loop Seconday Nodes. There remain
____________ __ ___ _________ ____ ____ ________ _____
unassigned to the graphs at this point only secondary nodes which
were assigned to loops. These nodes are first partitioned among the
loops assuming indistinguish-ability and remembering that each loop
must receive at least one secondary node. For example, following the
path from the degree list (4,0,2) through the specification of two
loops (Fig. 2), there are two ways of labelling the two equivalent
loops with four secondary nodes. There is one way to label the two
loops of the adjacent graph with three secondary nodes and one way of
labelling the two loops of each of the two remaining graphs in this
section of Figure 2 with two secondary nodes. In this example
(*C6U3) the loops in every case are equivalent or there is only one
loop to be labelled. In the general case loops may not be
equivalent, resulting in a greater number of ways to label loops with
a given partition of secondary nodes.
Cyclic Skeletons. The previous labelling steps specified the number
______ _________
of secondary nodes on each edge of and loop attached to the vertex-
graphs. All atoms in the original superatompart are thus accounted
for. A representation of the result is the cyclic skeleton, where
17
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
nodes and their connections to one another are specified. These
skeletons begin to resemble the final structures.
Free Valence Labelling. Each of the nodes in an cyclic skeleton is
____ _______ _________
then labelled with free valences. This labelling is trivial in the
example, as all atoms are of the same valence (four). This labelling
is illustrated in some cases in Figure 2, yielding the indicated
ciliated skeletons. This labelling is performed independent of the
IDENTITIES of the atoms, but with knowledge of how many atoms of each
valence type were present in the original superatompart.
A simple example illustrates the potential complexity of this
labelling problem when atoms of more than one valence type are
present. If there were, say, n nitrogen atoms in the superatompart,
in addition to m carbon atoms, then n nodes in the ciliated skeleton
must be trivalent, and m must be tetravalent, in all unique ways.
The free valence labelling would be done accordingly, yielding
secondary nodes with one or two free valences and tertiary nodes with
zero or one free valences. In the general case there may be several
ways to perform this labelling on a single cyclic skeleton, whereas
in the C U example there is only one way.
6 3
Atom Name Labelling. The nodes of a ciliated skeleton are then
____ ____ _________
labelled with atom names to yield the ring-superatom(s). Again this
labelling is trivial in the example, as only one type of atom is
present (carbon), yielding in each case only a single superatom (Fig.
2). If there is more than one type of atom with the same valence
(e.g., silicon and carbon), the labelling problem is more complex.
Each node of appropriate valence may be labelled with either type of
atom. Duplicate structures are avoided by calculating the group on
the set of nodes of equal valence and labelling accordingly.
PART C. Acyclic Generator.
The superatom partition expanded in the example had no atoms assigned
to acyclic chains (remaining atoms). The set of superatoms on
completion of Part B, above, thus yields the set of 36 structures on
placement of a hydrogen atom on each free valence (Fig. 2). If the
superatom partition (partitions 2-11, Table II) contained more than
one superatompart or remaining atoms, the acyclic generator must be
used to connect the segments of the structure in all ways. This
procedure is described in detail in Appendix D.
DISCUSSION
18
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Completion_of_C_U_. The example (Fig. 2) has considered only
_______________ _
6 3
expansion of a single superatom partition. The total number of
isomers of C U and their structures considering all partitions has
6 3
been determined by the program to be 159. It may be instructive for
the reader to attempt to generate, by hand, structures from another
partition, following the method outlined above and in Figure 2. The
initial superatom partitions yield some indication of the types of
structures which will result from each partition. For example,
partition 4 (Table II), C U in a single superatompart, plus three
3 3
carbons in the remaining atoms, should yield all structures
containing a three-membered ring possessing two double bonds or a
triple bond. As there are only two free valences, the remaining
atoms can be in a single chain (as a propyl or iso-propyl radical) or
as a methyl and an ethyl group, but not as three methyl groups.
Completeness and Irredundancy. Although a mathematical proof of the
____________ ___ ____________
completeness and irredundancy of the method exists (14 15), there is
no guarantee that the implementation of the algorithm in a computer
program maintains these desired characteristics. Confidence in the
thoroughness and irredundancy of the algorithm can be engendered in
the following ways:
1) Duplication of the program's performance by another,
completely independent approach. If such an approach exists (to our
knowledge it does not), it must be proven mathematically. An
independent algorithm has been developed which is capable of
enumeration, but not construction, of all isomers of a composition
containing C,H,N, and O. Implementation of that algorithm is limited
at the present time, unfortunately, to compositions containing only 5
or fewer carbon atoms and various numbers of hydrogen atoms. The
limitation is due to computer memory constraints. For these cases,
however, the numbers of isomers obtained by both programs agree.
2) Testing by manual generation of structures. Several
chemists, all without knowledge of the algorithm described above,
have been given several test cases, including C U , from which
6 3
structures were generated by hand. Familiarity with chemistry is no
guarantee of success, as evidenced by the performance of three
chemists for the superficially simple case of C U (C H , Table V).
6 3 6 8
This example indicates that for more than very trivial cases, it is
extremely difficult to avoid duplicates (tricyclics, for example, are
difficult to visualize when testing for duplicates) and omissions.
19
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Omissions appear to result from both being careless and forgetting
ring systems that are implausible or unfamiliar. The program seems
better at testing the chemist than vice versa. In every instance of
manual structure generation, no one has been able to construct a
legal structure that the program failed to construct. No one has
been able to detect an instance of duplication by the program. This
performance builds some confidence, but manual verification of more
complicated cases is extremely tedious and difficult. Isomers for
many empirical formulae have been generated, and some results are
tabulated in Table VI. The choice of examples has been motivated by
a desire to test all parts of the program where errors may exist
while keeping the number of isomers small enough to allow
verification. In this manner all obvious sources of error have been
checked, for example, construction of loops on loops, multiple types
of atoms of the same valence (e.g. Cl, Br, I) and partitions
containing atoms of several different valences including penta- and
hexavalent atoms.
3) Varying the order of generation. The structure of the
program permits additional tests by doing some operations in a
different order. For example, one variation allowed is to leave
hydrogens associated with the atoms in each partition rather than to
strip them away initially and place them on the remaining free
valences in the last step. Each such test has resulted in the same
set of isomers.
In summary, the verification procedures utilized have all indicated
absence of errors in the computer implementation of the algorithm.
Also, there is no clear reason why generation of larger sets of
isomers should not also proceed correctly. The final verdict
however, must await development of new mathematical tools for
verification by enumeration (see above) or an alternative algorithm.
20
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
--------------------------------------------------------------------
Table V
*
Performance of Three Chemists in Manual Generation
of Isomers of C H (C U ). There are 159 Isomers|
6 8 6 3
Number Generated Type of Error
Chemist 1 161 4 duplicates; 4 omissions;
2 with 7 carbon atoms.
Chemist 2 168 16 duplicates; 7 omissions
Chemist 3 160 2 duplicates; 1 omission
* One PhD and two graduate students.
-----------------
21
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
--------------------------------------------------------------------
Table VI
The Number of Isomers for Several Empirical Formulae
Empirical Example Number of Isomers Manually Verified?
Formula Compound
C H benzene 217 yes
6 6
C H 1,3-cyclohexadiene 159 yes
6 8
C H cyclohexene 77 yes
6 10
C H cyclohexane 25 yes
6 12
C H hexane 5 yes
6 14
C H O phenol 2237
6 6
C H O cyclohexanone 747 no
6 10
C H O 2-hexanone 211 yes
6 12
C H N pyrazole 155 no
3 4 2
C H N 2-pyrazoline 136 yes
3 6 2
C H N tetrahydropyrazole 62 no
3 8 2
C H N propylenediamine 14 yes
3 10 2
C H P (pentavalent P) 110 no
4 9 1
-----------------
22
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Constraints. The structure generator is designed to produce a list
___________
of all possible connectivity isomers (Appendix B). This list
contains many structures whose existence seems unlikely based on
present chemical knowledge. In addition, the program may be called
on to generate possible structures for an unknown in the presence of
a body of data on the unknown which specify various features, (e.g.,
functional groups) of the molecule. In such instances mechanisms are
required for constraining the generator to produce only structures
conforming to specified rules. The implementation of the acyclic
generator possessed such a mechanism in the form of GOODLIST (desired
features) and BADLIST (unwanted features) (3) which could be utilized
during the course of structure generation.
The cyclic generator is less tractable. As in prospective avoidance
of duplicate structures, it is important that unwanted structures, or
portions thereof, be filtered out as early in the generation process
as possible. It is relatively easy to specify certain general types
of constraints in chemical terms, for example, the number of each of
various types of rings or ring systems in the final structure, ring
fusions, functional groups, sub-structures and so forth. It is not
always so easy to devise an efficient scheme for utilizing a
constraint in the algorithm, however. As seen in the above example
(Fig. 2) the expanded superatom partition results in what would be
viewed by the chemist as several very different ring systems.
The design of the program facilitates some types of constraints. For
example, the program may be entered at the level of combining
superatoms to generate structures from a set of known sub-structures.
If additional atoms are present in an unknown configuration, they can
be treated as a separate generation problem, the results of which are
finally combined in all ways with the known superatoms. This
approach will not form additional two-connected structures, however.
Constraints which disallow an entire partition may be easily
included. For example, it is possible to generate only open chain
isomers by "turning off" the appropriate initial superatom
partitions.
Much additional work remains, however, before a reasonably complete
set of constraints can be included. The implementation of each type
of constraint must be examined and tested in detail to ensure that
the generator remains thorough and irredundant.
CONCLUSIONS
The boundaries, scope and limitations of chemical structure can
now be specified.
ACKNOWLEDGEMENTS
23
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Appendix A. Equivalence Classes and Finite Permutation Groups.
Two members of a set of possible isomers may be defined to be
equivalent if a specified transformation of one member causes it to
be superimposable upon another member of the set. For example, there
are fifteen possible ways of attaching two chlorine and four hydrogen
atoms to a benzene ring (Scheme 4).
-----------------------------------------------------------------
Scheme 4
-----------------------------------------------------------------
If rotations by multiples of 60 degrees are specified as allowed
transformations, the fifteen structures fall logically into three
classes, termed "equivalence classes" (Scheme 4). Within each
equivalence class structures may be made superimposable by the
rotational transformation. If one element (in this case a molecular
structure) is chosen from each equivalence class, the complete set of
possible structures is determined, without duplication. It is the
task of the labelling algorithm to produce one and only one graph
labelling corresponding to one member of each equivalence class.
The set of transformations which define an equivalence class is
termed a "finite permutation group." This permutation group may be
calculated based on the symmetry properties of a graph (or chemical
structure in the example of Scheme 4). This calculation provides the
mechanism for prospective avoidance of duplication. These procedures
are described more fully in the accompanying paper (13).
24
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Appendix B. Isomerism and Symmetry.
Appendix A introduced the concept of equivalence classes and finite
permutation groups. The selection of transformation (Appendix A)
directs the calculation of the permutation group and thus defines the
equivalence classes. Different types of transformation may be
allowed depending on the symmetry properties of the class of isomers
considered. This Appendix discusses several of the possible types of
isomerism, most of which are familiar to chemists. The reader
seeking a more thorough discussion of some types of isomerism
discussed below is referred to an exposition of molecular symmetry in
the context of chemistry and mathematics (19).
Isomers are most often defined as chemical structures possessing the
same empirical formula. Different concepts of symmetry give rise to
different classes of isomers, some of which are described below.
Permutational Isomers. Permutational isomers are isomers which have
_____________ _______
in common the same skeleton and set of ligands. They differ in the
distribution of ligands about the skeleton. Gillespie et al. (20)
and Klemperer (21) have used the concept of permutational isomers to
probe into unimolecular rearrangement or isomerization reactions.
Stereoisomers. Ugi et al. (19) have defined the "chemical
_____________
constitution" of an atom to be its bonds and bonded neighbors. Those
permutational isomers which differ only by permutations of ligands at
constitutionally equivalent positions form the class of
stereoisomers.
Isomers Under Rigid Molecular Symmetry. If one conceives of
_______ _____ _____ _________ ________
molecular structures as having rigid skeletons, the physical
rotational (three dimensional) symmetries and transformations may be
readily defined. Each transformation causes each atom (and bond) to
occupy the position of another or same atom (and bond) so that the
rotated structure can physically occupy its former position and at
the same time be indistinguishable from it in any way. This is the
most familiar form of symmetry. Under this type of symmetry
conformers are distinguishable and belong in distinct equivalence
------------------
19) I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie,
Angew. Chem. internat. Edit., 9, 703 (1970).
20) P. Gillespie, P. Hoffman, H. Klusacek, D. Marquarding, S.
Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, Angew. Chem. internat.
Edit., 10, 687 (1971).
21) a) W. G. Klemperer, J. Amer. Chem. Soc., 94, 6940 (1972);
b) W. G. Klemperer, ibid, p. 8360.
25
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
classes. Every transformation is orthogonal and preserves bond
angles and bond lengths as well as maintaining true chirality.
If one allows other orthogonal transformations that alter chiral
properties of structures, equivalence classes result that treat both
the left-handed and right-handed forms of chiral molecules to be the
"same". Thus a "mirror image" transformation when suitably defined
permits the left-handed form to exactly superimpose the right-handed
form and vice-versa.
Isomers Under Total Molecular Symmetry. If in addition to the above
_______ _____ _____ _________ ________
mentioned rigid molecular transformations one recognizes the
flexional movements of a nonrigid skeleton, a dynamic symmetry group
may be defined. Under this definition, different conformers now are
grouped together. Thus the "chair" and "boat" conformations of
cyclohexane belong to the same equivalence class under dynamic
symmetry. The permutation group of skeletal flexibility is
computable separately and independently of rigid molecular symmetry.
One can then view total molecular symmetry as the product of the two
finite permutation groups.
Isomers Under Connectivity Symmetry. The concept of connectivity
_______ _____ ____________ ________
symmetry was introduced previously (METHOD section). Every
permutation of atoms and bonds onto themselves is a symmetry
transformation for connectivity symmetry if,
a) each atom is mapped into another of like species, e.g., N to
N, C to C, O to O, and
b) for every pair of atoms, the connectivity (none, single,
double , triple, ...) is preserved in the mapping, i.e. the the
connectivity of the two atoms is identical to the connectivity
of the atoms they are mapped into.
One can readily recognize that transformations as defined
automatically preserve the valence and bond distribution of every
atom. It is very probable that readers accustomed to three
dimensional rotational and reflectional symmetries will tend to
equate them with the symmetries of connectivity. It is emphasized
again that connectivity symmetry does not consider bond lengths or
bond angles, and it includes certain transformations that are
conceivable but have no physical interpretation save that of
permuting the atoms and bonds.
26
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Appendix C. Calculation of Loops.
There are several rules which must be followed in consideration of
loop assignment to ring-superatoms. The minimum (MINLOOPS) and
maximum (MAXLOOPS) numbers of loops for a given valence list are
designated by equations 5 and 6.
n
MINLOOPS = max{0,a +1/2(2j -∃ ja )} (5)
2 max i=2 j
n
MAXLOOPS = min{a ,∃ ((i-2)/2)a } (6)
2 j=4 j
MINLOOPS = minimum number of loops
MAXLOOPS = maximum number of loops
a = number of secondary nodes in degree list
2
j = degree of highest degree item in degree list
max
j = degree
n = highest degree in list
a = number of nodes with degree j.
j
The form of the equations results from the following considerations:
1) Only secondary nodes may be assigned to loops. Nodes of
higher degree will always be in the non-loop portion of the
ring-superatom.
2) A loop, by definition, must be attached by two bonds to a
single node in the resulting ring-superatom. The loop cannot
be attached through the free valences. Thus the degree list
must possess a sufficient number of quaternary or higher degree
nodes to support the loops(s).
3) Each loop must have at least one secondary node, which is
the reason MAXLOOPS is restricted to at most the maximum number
of secondary nodes in the degree list (Equation 6).
4) There must be available one unsaturation for each loop
(this is implicit in the calculation of MINLOOPS and MAXLOOPS)
as each loop effectively forms a new ring.
Partitioning of Secondary Nodes. For each of the possible numbers of
____________ __ _________ _____
27
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
loops (0,1, ...) the secondary nodes are removed from the degree list
and partitioned among the loops, remembering that the loops are at
present indistinguishable and each loop must receive at least one
secondary node. In the example (Fig. 2), starting with the degree
list (4,0,2), there are three ways of partitioning the four secondary
nodes among two loops and the remaining non-looped portion. Removal
of the four secondary nodes from the degree list and assignment of
two, three or four of them to two loops results in the list specified
in Figure 2 as the "reduced degree list". Specification of two loops
transforms the two quaternary nodes in the degree list into two
secondary nodes. This results from the fact that two valences of a
quaternary or higher degree node must be used to support each loop.
These are "special" secondary nodes, however, as these particular
nodes are the ones which will be connected to loops as the structure
is built up. Thus, in the example, any secondary nodes which are
found in the reduced degree list will have a loop attached in a
subsequent step. The degree list (4,0,2) thus becomes the reduced
degree list (2,0,0) in the partition specifying two loops (Fig. 2).
Similarly, the partition of one loop for the degree list (3,2,1)
results in a reduced degree list of (1,2,0) with the three original
secondary nodes partitioned among loop and non-loop portions (Figure
2).
If, after the first, second, ... nth loop partition, there remain one
or more quaternary or higher degree nodes in the reduced degree list,
the list must be tested again for the possibility of additional
loops. Each loop partition will generate a new set of structures.
The second pass will yield those structures possessing loops on
loops, and so forth. One such superatom which would be generated in
this manner from a composition of (at least) C U is 15.
6 5
c=c=c=c=c=c
The partition of (4,0,2) including one loop results in each case in a
reduced degree list (1,0,1). This list is disallowed in the
subsequent step, as the vertex-graph for one quaternary node is a
daisy (Table II), which requires a minimum of two secondary nodes
with which to label the daisy loops (a minimum of one secondary node
in the reduced degree list for each loop of the daisy).
Appendix D
A method of construction of structures similar to the generation of
acyclic molecules is utilized to join multiple ring-superatoms and
remaining atoms. The DENDRAL algorithm for construction of acyclic
molecules (3,24) relied on the existence of a unique central atom (or
24) A more complete description of the algorithm is available; see
29
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
B. G. Buchanan, A. M. Duffield, and A. V. Robertson, in "Mass
spectrometry, Techniques and Applications," G. W. A. Milne, ed.,
John Wiley and Sons, Inc., 1971, p. 121.
bond) to every molecule. The present acyclic generator uses the same
idea. The present algorithm, though simpler in not having to to
treat interconnection of atoms or ring-superatoms through multiple
bonds, is more complex because of the necessity to deal with the
symmetries of the ring-superatoms.
D1. Method for the case with even number of total atoms.
The superatom partition C U /C U /-/C (partition 7, Table II and
2 2 2 1 2
Figure 2) will be used here to illustrate this procedure. The
superatomparts C U and C U have exactly one possible ring-superatom
2 2 2 1
for each (see Table VII).
--------------------------------------------------------------------
Table VII
Superatompart Superatom
C U -C≡≡C-
2 2
C U >C==C<
2 1
----------------------------------------
Thus acyclic structures are to be built with -C≡≡C- , >C==C< and two
C's.
There are an even number atoms and ring-superatoms. The structures
to be generated fall into two categories:(a) those with a central
bond; (b) those with a central atom.
Category A. CENTRAL BOND (see Fig. 3).
Step 1. Partition into Two Parts.
The atoms and ring-superatoms are partitioned into two parts, with
each part having exactly half the total number of items. Each atom
or ring-superatom is a single item. Each part has to satisfy
equation 7, called the Restriction on Univalents.
Restriction on Univalents:
30
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
------------------
n
1+∃ (i-2)a ≥ 0 (7)
i=1 i
i = valence
a = number of atoms or superatoms of valence i
i
n = maximum valence in composition
There are two ways of partitioning the four items into two parts
(Fig. 3). The restriction on univalents is satisfied in each case.
The restriction will disallow certain partitions that have "too many"
univalents other than hydrogens and therefore is essential only in
partitioning compositions that contain any number of non-hydrogen
univalents.
Step 2. Generate Radicals from Each Part.
Using a procedure described in Section C3, radicals are constructed
from each part in each partition. Table VIII shows the result of
applying this procedure to the example.
31
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
---------------------+------------------------------
>C==C< , C -> -CH==CH-CH3
-C-CH3
"
CH2
-CH2-CH==CH2
====================================================
Step 3. Form Molecules From Radicals.
The radicals are combined in unique pairs, within each initial
partition. Each pair gives rise to a unique molecule, for each of
which is the center is a bond. There are nine such molecules that
have a central bond, for the example chosen (Fig. 3).
Category B. CENTRAL ATOM (see Fig. 4).
--------------------------------------------------------------------
Table VIII
---------------------+------------------------------
Part | Radicals
---------------------+------------------------------
-C≡≡C- , >C==C< -> -C≡≡C-CH=CH2
-CH=CH-C≡≡CH
-C-C≡CH
"
CH2
---------------------+------------------------------
C2 -> -CH2-CH3
---------------------+------------------------------
-C≡≡C- , C -> -C≡≡C-CH3
-CH2-C≡≡CH
32
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Step 1. Selection of Central Atom.
One must consider every unique atom or ring-superatom that has a free
valence of three or higher as a central atom. In the example, of
three candidates available: -C≡≡C- , >C==C< and C, the first is not
chosen for it has a free valence of only two.
Step 2. Partition the Rest of the Atoms.
The atom or ring-superatom chosen for the center is removed from the
set and the rest are partitioned into a number of parts less than or
equal to the valence of the central atom. Each part must have less
than half the total number of items being partitioned (again a ring-
superatom is a single item). Each part must satisfy the restriction
on univalents (equation 7).
Thus, for the case where a carbon is the center, from one to four
partitions are generated with the condition that each part has at
most (4-1)/2 or 1 atom. No valid partitions are possible for one,
two or four parts. There is exactly one partition for three parts,
i.e., one atom in each. The partitions are shown in Figure 4.
Step 3. Generate Radicals.
Once again, using the procedure described in Section C3, radicals are
generated for each part in each partition. For example, the
partition -C≡≡C- gives rise to exactly one possible radical -C≡≡CH
(Fig. 4).
Step 4. Combine Radicals.
Although in the example shown every part generates only one radical,
in the general case there will be many radicals for each part. If
so, the radicals must be combined to give all unique combinations of
radicals within each partition.
Step 5. Form Molecules from Central Atom and Radicals.
If the central atom is not a ring-superatom but is a simple atom,
then each combination of radicals derived in Step 4 defines a single
molecule that is unique. Thus for example when C is chosen as the
center, step 4 gives one combination of radicals which determines a
single molecule when connected to the central C (see Figure 4).
If the central atom is a ring-superatom and the valences of the ring-
superatom are not identical then different ways of distributing the
radicals around the center may yield different molecules. Labelling
33
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
of the free valences of the central ring-superatom with radicals
treated as labels (supplemented with adequate number of hydrogens to
make up the total free valence of the ring-superatom) generates a
complete and irredundant list of molecules. Thus >C==C< is labelled
with the label set:
one of -C≡≡CH , two of -CH3 , and one of -H.
There are two unique labellings as shown in Figure 4.
D2. Method for odd number of total atoms.
With an odd number of total atoms, no structures can be generated
with a central bond. Only the case of a central atom is to be
considered. However, it is possible for structures to be built with
a bivalent atom at the center. Thus the procedure outlined in
Category B above is followed, in this case also allowing a bivalent
atom to be the central atom.
D3. generation of radicals
The goal of this procedure is to generate all radicals from a list of
atoms and ring-superatoms. A radical is defined to be an atom or
superatom with a single free valence. When a composition of atoms
and ring-superatoms is presented, from which radicals are to be
generated, two special cases are recognized.
Case 1. Onle One Atom in Composition.
When only one atom which is not a ring-superatom makes up the
composition only one radical is possible. As for example, with one
C, the radical -CH3 is the only possibility.
Case 2. Only One Item (a Ring-superatom) in Composition.
In this case, depending upon the symmetry of the ring-superatom,
several radicals are possible. This is determined by labelling the
free valences of the ring-superatom with one label of a special type,
a "radical-valence".
Example: A composition consists of one ring-superatom, 16.
C-
/"
>C< "
\"
C-
16
34
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Two radicals result from labelling with one radical valence.
CH C-
/" /"
-CH< " CH2< "
\" \"
CH CH
17 18
General Case. Radicals have uniquely defined centers as well, the
_______ ____
center always being an atom of valence two or higher. The steps for
generation of radicals are as follows.
Step 1. Selection of Central Atom.
Any bivalent or higher valent atom or ring-superatom is a valid
candidate to be the center of a radical. Thus, for example, for the
composition -C≡C-, >C=C< (see part 1 in Figure 3) both are valid
central atoms (Figure 5).
Step 2. Partition the Rest of the Atoms.
The atom chosen for the central atom is removed from the composition.
One of the valences of the central atom is to remain free.
Therefore, the rest of the atoms in the composition are partitioned
into less than or equal to (valence of central atom - 1) parts. Of
course, each part should satisfy the restriction on univalents
(equation 7) but for constructing radicals there is no restriction on
the size of the parts.
Step 3. Form Radicals from Each Part.
The procedure to generate radicals is freshly invoked on each part
thus generating radicals. Each part in Figure 5 gives rise to only
one radical each arising from special case 2.
Step 4. Combine Radicals in Each Part.
For the example in Figure 5, each part yields only one radical. In a
more general situation, where the rest of the composition after
selection of center, is partitioned into several parts, and where
each part yields several radicals, the radicals are combined to
determine all unique combinations of radicals.
Step 5. Label Central Atom with Radicals. If the center is an atom
35
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
(not a ring-superatom) then each unique combination defines a single
unique radical.
If the center is a ring-superatom, the radicals are determined by
labelling the center with a set of labels which includes;I) the
radicals; ii) a leading radical-valence; iii) an adequate number of
hydrogens to make up the remaining free valences of the ring-
superatom. One selection of center gives one radical and the other
gives two more, to complete a list of three radicals for the example
chosen (Fig. 5).
Summary
For the example chosen to illustrate the operation of the
acyclic generator, twelve isomers are generated, nine shown in Figure
3 and three shown in Figure 4.
a. Partition in order of increasing number of superatomparts.
b. For each entry in each part of (a), partition in order of
decreasing size of superatompart by allocation of atoms one at
a time to remaining atoms.
c. Each individual partition containing two or more
superatomparts must be in order of equal or decreasing size of
the superatompart. In other words, the number of atoms and
unsaturations in superatompart n+1 must be equal to or less
than the number in superatompart n. The program notes the
equality of superatomparts in a partition to avoid repetition.
36