perm filename BIGMES.PUB[1,LMM]1 blob
sn#030257 filedate 1973-03-20 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00042 PAGES
C REC PAGE DESCRIPTION
C00001 00001 VALID 00031 PAGES
C00006 00002 .DEVICE TTY
C00009 00003 Problems of structural isomerism in chemistry have received much
C00011 00004 Acyclic molecules represent only a subset of molecular structures,
C00014 00005 Central to the successful solution of this problem is the solution
C00018 00006 .TITL METHOD
C00022 00007 ∪Superatom. A chemical graph is separable into cyclic and acyclic
C00025 00008 ∪Underlying ∪Graph. Underlying graphs (11) possess only tertiary or
C00028 00009 ∪Multiple ∪Bonds. All multiple bonds (double, triple, ...) are
C00031 00010 ∪Strategy. The strategy behind the cyclic structure generator is
C00034 00011 The example chosen to illustrate each step of the
C00037 00012 .BEGIN TABLE I,Partitioning Steps Performed by the Cyclic Structure Generator
C00039 00013 PART A. Superatom Partitions.
C00042 00014 If the unsaturation count is zero, the formula is passed immediately
C00046 00015 .BEGIN INDENT 0,0
C00050 00016 .BEGIN TABLE II,Allowed Partitions of C↓6U↓3 Into Superatomparts and Remaining Atoms.
C00052 00017 PART B. Ring-superatom Construction.
C00056 00018 ↓_Degree List_↓. Each partition of free valences alters the effective
C00059 00019 ∪Loops. As will be clarified in the subsequent discussion, there are
C00063 00020 .BEGIN TABLE III,Partial Listing of Underlying Graphs in the CATALOG
C00068 00021 With the reduced degree list of a superatompart, the program
C00072 00022 .BEGIN TABLE IV,The Six Graph Labelling Steps Performed by the Labelling Algorithm
C00077 00023 .<< there is something essentially different from the program (but not >>
C00082 00024 ↓_Unciliated Skeletons_↓. The previous labelling steps specified the
C00085 00025 ↓_Atom Name Labelling_↓. The nodes of a ciliated skeleton are then
C00089 00026 ↓_Completeness and Irredundancy_↓. Although a mathematical proof of the
C00095 00027 In summary, the verification procedures utilized have all
C00099 00028 ∪Constraints. The structure generator is designed to produce a list of
C00103 00029 .HD CONCLUSIONS
C00105 00030 Appendix A. Equivalence Classes and Finite Permutation Groups.
C00108 00031 Appendix B. Isomerism and Symmetry.
C00113 00032 ↓_Isomers Under Total Molecular Symmetry_↓. If in addition to the above
C00117 00033 Appendix C. Calculation of Loops.
C00120 00034 ↓_Partitioning of Secondary Nodes_↓. For each of the possible numbers of
C00125 00035 Appendix D
C00127 00036 C1. METHOD FOR THE CASE WITH EVEN NUMBER OF TOTAL ATOMS.
C00130 00037 There are two ways of partitioning the four items into two
C00133 00038 Step 3. Form Molecules From Radicals.
C00136 00039 Step 3. Generate Radicals.
C00139 00040 C2. METHOD FOR ODD NUMBER OF TOTAL ATOMS
C00143 00041 ∪General ∪Case. Radicals have uniquely defined centers as well, the
C00146 00042 Step 5. Label Central Atom with Radicals.
C00148 ENDMK
C⊗;
.DEVICE TTY
.FOOTSEP←"------------------";
.BEGIN CENTER
APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI
Exhaustive Generation of Cyclic and Acyclic Isomers.(1,2)
.END
.GROUP SKIP 5
.BEGIN INDENT 6,6,6
ABSTRACT. A systematic method of identification of all possible
molecular structures (isomers) fitting a given empirical formula is
described. The method, embodied in a computer program, generates a
complete list of isomers. Duplicate structures are avoided prospectively.
.END
.GROUP SKIP 20
.BEGIN VERBATIM
------------------
.END
1) For Part X see D.H. Smith, B.G. Buchanan, W.C. White, E.A.
Feigenbaum, J. Lederberg, and C. Djerassi, Tetrahedron, submitted.
2) Financial support for this work was provided by the National
Institutes of Health (RR00612-03) and the Advanced Research
Projects Agency (SD-183).
.EVERY FOOTING(,{PAGE},)
.MACRO LMCOMMENT (REMARK) ⊂⊃
.MACRO HD (HEADING) ⊂ BEGIN TURNON "{";SKIP 1;NOFILL;BREAK; }HEADING{END;⊃
.TURNON "∪↑↓[]&→_"
.EVERY HEADING(,APPLICATIONS OF ARTIFICIAL INTELLIGENCE FOR CHEMICAL INFERENCE - XI,)
.SKIP TO COLUMN 1
.MACRO TITL (TITLE) ⊂ BEGIN TURNON "{"; SKIP 2; NOFILL; BREAK;CENTER;}TITLE{END;⊃
.MACRO TABLE (N,TITLE) ⊂ NOFILL;GROUP;BEGIN CENTER;TURNON "{"; SKIP 1;}
--------------------------------------------------------------------
Table N
TITLE{END;⊃
Problems of structural isomerism in chemistry have received much
attention. But only occasional inroads have been made toward
a systematic solution of the underlying mathematical problems of
isomerism. Recently the "boundaries, scope and limits" (3) of the
.SEND FOOT ⊂
3) J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum,
A.V. Robertson, A.M. Duffield, and C. Djerassi, J. Amer. Chem.
Soc., 91, 2973 (1969).
.⊃;
subject of structural isomerism of acyclic molecules have been defined
by the DENDRAL algorithm (3). This algorithm permits an enumeration
and representation of all possible acyclic molecular structures with
a given molecular formula.
Acyclic molecules represent only a subset of molecular structures,
however, and it may be argued that cyclic structures (including those
posessing acyclic chains) are of more general interest and importance
to modern chemistry. An approach to cyclic structure
generation has appeared in a previous paper in this series (4). That
.SEND FOOT ⊂
4) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield,
C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and
J. Lederberg, Org. Mass Spectrom., 4, 493 (1970).
.⊃;
approach, which manipulates acyclic
isomers by removing hydrogen atoms pairwise and replacing them with a
new bond, has one serious drawback. Because the approach cannot make
efficient use of the symmetry properties of cyclic graphs, an
extensive amount of computer time (prohibitive for other than very
simple cases) must be spent in retrospective checking of each
candidate structure with existing structures to remove duplicates.
For this reason, an alternative approach to construction of cyclic
molecules has been developed. This approach is designed to take
advantage of the underlying graph theoretic considerations, primarily
symmetry, to arrive at a method for more efficient construction of a
complete and irredundant list of isomers for a given empirical
formula.
Central to the successful solution of this problem is the solution
of the subproblem of specification of all isomers having a specified
common ring system. This topic has received attention for nearly 100
years, with limited success (5). Its more general ramifications go far
.SEND FOOT ⊂
5) See, for example, A.C. Lunn and J.K. Senior, J. Phys. Chem., 33,
1027 (1929) and references cited therein.
.⊃;
beyond organic chemistry. Graph theoreticians have considered various
aspects of this topic, frequently, but not necessarily, in the context
of organic molecules. Polya has presented a theorem (6) which
.SEND FOOT ⊂
.BEGIN NOFILL
6) a) G. Polya, Compt. rend., 201, 1167 (1935);
b) G. Polya, Helv. Chim. Acta, 19, 22 (1936);
c) G. Polya, Z. Kryst. 92, 415 (1936);
d) G. Polya, Acta Math., 68, 145 (1937).
.END
.⊃;
permits calculation of the number of structural isomers for a given
ring system. Hill (7) has applied this theorem to enumeration of
.SEND FOOT ⊂
.BEGIN NOFILL
7) a) T.L. Hill, J. Phys. Chem., 47, 253 (1943);
b) T.L. Hill, ibid., p. 413.
.END
.⊃;
isomers of simple ring compounds and Hill (8) and Taylor (9) have
.SEND FOOT ⊂
8) T.L. Hill, J. Chem. Phys., 11, 294 (1943).
9) W.J. Taylor, ibid., p. 532.
.⊃;
pointed out that Polya's theorem permits enumeration of geometrical
and optical isomers in addition to structural isomers. More recently,
formulae for enumeration of isomers of monocyclic aromatic compounds
based on graph theory, permutation groups and Polya's theorem have
been presented (10). This history of interest and results provides
.SEND FOOT ⊂
10) A.T. Balaban and F. Harary, Rev. Rumaine de Chimie, 12, 1511
(1967).
.⊃;
only marginal benefit to the organic chemist. Although the number of
isomers may be interesting, these methods (5-9) do not indicate
the structure of each isomer. Also, these methods do not provide
information on the more general case where the ring system is embedded
in a more complex structure. For even simple cases the task of
specifying each structure by hand, without duplication, is an onerous
one.
.TITL METHOD
.HD OVERVIEW
∪Framework. The framework for this method is that chemical structures
consist of some combination of acyclic chains and rings or ring
systems (4). The problem of construction of acyclic isomers has been
solved previously (3). If, therefore, all possible ring systems can
.LMCOMMENT this still does not solve the problem of embeding the rings;
.LMCOMMENT that is still a labelling problem;
be generated from all or part of the atoms in the empirical formula,
the rings can be linked together or be attached to acyclic radicals
using the acyclic generator. This yields the complete list of
isomers. The method for construction of ring systems is described
below. This description employs some terms which require definition.
The definitions also serve to illustrate the taxonomic principles
which underlie the operation of the cyclic structure generator.
The generator's view of molecular structure differs in some
respects from the chemist's. A chemist, for example, may view
structures possessing the same functional group or ring as related.
The generator works at the more fundamental level of the underlying
graph (4,11), as described below.
.LMCOMMENT it might be noted early that dbl bonds make rings
∪Chemical ∪Graph. A molecular structure may be viewed as a graph,
termed the "chemical graph." A chemical graph consists of "nodes,"
with associated atom names, connected to one another by "edges."
Consider as an example the substituted piperazine, 1, whose chemical
graph is illustrated in Scheme 1 as structure 2. Note that hydrogen
atoms are ignored by convention, while the symbol "U" in used to
specify the unsaturation. The degree (primary, secondary, ...) of a
node in the chemical graph has its usual meaning, i.e., the number of
(non-hydrogen) edges connected to it. The valence of each atom
determines its maximum degree in the graph.
∪Superatom. A chemical graph is separable into cyclic and acyclic
parts, each of which may be termed a "superatom." A superatom is a
structural sub-unit possessing at least one "free valence" (12). The
.SEND FOOT ⊂
12) A free valence is a bond on an atom in the superatom to which
another atom (which is not part of the superatom) may be connected.
.⊃;
chemical graph 2 arises from a combination of three superatoms. Two
are carbon atoms with one free valence (methyl radicals). The third
is superatom 3, which possesses the indicated free valences to which
the remaining hydrogen and two methyl radicals will be attached
(Scheme 1). In the subsequent discussion, those superatoms which
consist of rings are termed "ring-superatoms."
∪Ciliated ∪Skeletons. A "ciliated skeleton" is a graph in which nodes
(but not atom names) and their interconnections are specified,
together with their free valences. Superatom 3 arises from the
ciliated skeleton 4 by associating the atom names of eight carbon
and two nitrogen atoms with the skeleton (Scheme 1).
∪Unciliated ∪Skeleton. Skeletons where nodes and edges are specified,
but free valences are not, are termed "unciliated skeletons." The
ciliated skeleton 4 arises from one way of associating sixteen free
valences with the nodes on the unciliated skeleton 5 (Scheme 1).
.BEGIN NOFILL
----------------------------------
Scheme 1
.GROUP SKIP 6
----------------------------------
.END
∪Underlying ∪Graph. Underlying graphs (11) possess only tertiary or
higher degree nodes. The underlying graph of the unciliated skeleton
5 is the regular trivalent graph (11) of two nodes, 6. Note that
the remaining nodes of the unciliated skeleton 5 are of degree two.
Removal of these secondary nodes from 5 while retaining the
interconnections of the two tertiary nodes yields 6.
As an illustration of the variety of structures which may be
constructed from a given underlying graph and empirical formula,
for example, C↓1↓0H↓2↓0N↓2, consider that graph 6 is the underlying graph
for all bicyclic ring systems (excluding spiro forms). Unciliated
skeletons 7 and 8 (Scheme 2), for example, may be constructed from
eight secondary nodes and 6. There are many ways of associating
sixteen free valences with each unciliated skeleton, resulting in
a larger number of ciliated skeletons. For example, 9 and 10 arise
from different allocations of sixteen free valences to 5 (Scheme 2).
There is only one way to associate eight carbon atoms and two nitrogen
atoms with each ciliated skeleton to yield superatoms (e.g. 11 and 12,
Scheme 2). However, several structures are obtained by associating
the remaining two carbon atoms (in this example) with each superatom,
as an ethyl or two methyl groups. Chemical graphs 13 and 14, for
example, arise from two alternative ways of associating two methyl
groups with superatom 3.
.BEGIN NOFILL
----------------------------------
Scheme 2
.GROUP SKIP 6
----------------------------------
.END
∪Multiple ∪Bonds. All multiple bonds (double, triple, ...) are
considered to be small rings by the program. Although chemists
normally maintain strong distinctions between multiple bonds and rings,
(for good reason) it was easier when implementing the algorithm in a
computer program to generate multiple bonds, whether isolated or
within ring systems, along with the rings. As will be mentioned
subsequently, from the program's standpoint acyclic portions of
structures cannot possess multiple bonds.
.<< ... it was found easier ... >>
∪Aims. The cyclic structure generator must produce a complete list of
structures without duplication. By duplicate structures we mean
structures which are equivalent in some well-defined sense. The
concept of equivalence is discussed in more detail in Appendix A and
in the accompanying paper (13).
.SEND FOOT ⊂
13) Accompanying description of the labelling algorithm.
.⊃;
The class of isomers generated by the program includes only
connectivity isomers (see Appendix B). Transformations (see Appendix A)
allowed under connectivity symmetry preserve the valence and bond
distribution of every atom. Connectivity symmetry does not consider
bond lengths or bond angles. This choice of symmetry results in
construction of a set of topologically unique isomers. A more
detailed discussion of isomerism and symmetry is presented in
Appendix B.
∪Strategy. The strategy behind the cyclic structure generator is
strongly tied to the framework described above. The strategy is
summarized in greatly simplified form in Figure 1. The underlying
graphs from which structures are constructed can be specified for a
given problem by a series of calculations. Thus Part A of the program
(Figure 1) assigns atoms in all possible ways to superatom partitions.
Each partition consists of atoms assigned to one or more "superatom-
parts" and "remaining atoms." Each superatompart is a collection of
atoms from which all possible, unique ring-superatoms (see above
definition) can be constructed based on the appropriate underlying
graphs (Part B, Fig. 1). Each ring-superatom will be a ring system in
completed structure. Remaining atoms will form acyclic parts of the final
structures when combined in all possible, unique ways with the ring-superatoms
from each superatompart in the initial partition (Part C, Fig. 1).
.HD DESCRIPTION
The subsequent detailed description parallels the operation of the
computer program (called the cyclic structure generator). Programming
details as well as complete descriptions of the underlying mathematics
are omitted for the sake of clarity and because they are available
from other sources (13-16).
.SEND FOOT ⊂
14a) H. Brown, L. Masinter, and L.Hjelmelend, Discrete Mathematics,
submitted;
b) H. Brown, L. Masinter, and L. Hjelmelend, Stanford Computer Science
Memo 318.
15) H. Brown and L. Masinter
16) Additional information will be available from the authors on
request.
.⊃;
The example chosen to illustrate each step of the
method is C↓6H↓8. This example, however, does not contain bivalent or
trivalent atoms (e.g., oxygen and nitrogen, respectively) or atoms of
valence greater than four, nor any univalent atoms other than hydrogen
(e.g., chlorine, fluorine). The description indicates how these
additional atoms are considered by the program.
↓_Partitioning and Labelling_↓. The mechanism for structure generation
involves a series of "partitioning" steps followed by a series of
"labelling" steps. Partitions are made of items which must be
assigned to objects (usually graph structures or parts thereof)
as the molecular structures are built up from the underlying graphs.
The process by which items are assigned to the graphs is termed
labelling (13,14). Examination of Scheme 1 reveals the different
types of items involved. For example, nodes are partitioned among and
labelled upon the edges of the underlying graphs to yield the
unciliated skeletons. Free valences are partitioned among and
labelled upon the nodes of unciliated skeletons to yield ciliated
skeletons, and so forth.
Partitioning steps in the subsequent discussion are carried out
assuming that objects among which items are partitioned are indist-
inguishable. Distinguishability of objects (edges, nodes, ...) is
specified during labelling and will be discussed in a subsequent
section. The partitioning steps performed by the program are
outlined in Table I. Each step is described in more detail below.
.BEGIN TABLE I,Partitioning Steps Performed by the Cyclic Structure Generator
Step # Partition Among
1 Atoms and Unsaturations Superatomparts and
in Empirical Formula Remaining Atoms
2 Free Valence Atoms in Superatompart
3 Secondary Nodes Loops
4 Non-looped Secondary Edges of Graph
Nodes
5 Looped Secondary Nodes Loops
6 Ring-superatoms and Efferent Links
Remaining Atoms (see Appendix D)
-------------------------------
.END
PART A. Superatom Partitions.
Ring-superatoms are "two-connected" structures, i.e., the
ring-superatom cannot be split into two parts by scission of a single
bond. The atoms in an empirical formula may be distributed among from
one to several such two-connected ring-superatoms. A distribution
which allots atoms to two or more superatomparts will yield
(respectively) structures containing two or more ring-superatoms
linked together by single bonds (or acyclic chains) (17).
.SEND FOOT ⊂
17) Chemists are more familiar with terms such as rings or ring
systems. The term two-connected is used here in conjunction
with ring-superatoms for a more precise description. For example,
biphenyl may be viewed as a single ring system or two rings depending
on the chemical context. In this work, however, biphenyl consists of
two ring-superatoms (two phenyl rings) linked by a single bond.
.⊃;
The initial formation of superatom partitions is linked
intimately with chemistry unlike several of the subsequent operations
within the cyclic structure generator. Thus, there are several well-known
concepts which are used during partitioning, dealing primarily with
valence and unsaturation.
.<<really doesn't have anything to do with chemistry- only valence>>
The first step is to replace the hydrogen count with the degree
of unsaturation. The number of unsaturations (rings plus double bonds)
is determined from the empirical formula in the normal way, as given
in equation 1.
.BEGIN NOFILL
U = 1/2 (2+↑n&↓[i=1]&∃(i-2)a↓i)→(1)
U = unsaturation
i = valence
n = maximum valence in composition
a↓i= number of atoms with valence i
.END
If the unsaturation count is zero, the formula is passed immediately
to the acyclic generator. Specifying the unsaturations as U's, the
example C↓6H↓8 becomes C↓6U↓3 (hydrogen atoms are omitted by convention).
There are several rules which are used during the partitioning
scheme, as follows:
.BEGIN INDENT 0,5
I. The resulting formula is stripped of other univalent atoms
(e.g., chlorine) as such atoms cannot be part of two-connected
ring-superatoms. These univalent atoms are assigned to the category of
remaining atoms.
II. The remaining atoms in a given partition (those not allocated
to superatomparts) can contain NO unsaturations. Thus ALL rings and/or
double bonds will be generated from the superatomparts.
III. It follows from I and II that every superatompart in the partition
must contain at least two atoms of valence two or higher plus at least one
unsaturation. If there are no unsaturations the atoms belong in the
remaining atoms category of another partition. In addition, an
unsaturation cannot be placed on a single atom. This rule defines
the minimum number of atoms and unsaturations in a superatompart.
IV. The maximum number of unsaturations in a superatompart is
given by Equation 2. Superatoms must possess at least one free
valence (12), so that superatomparts with no free valences,
e.g., O↓2U↓1 or C↓2U↓3, are not allowed.
.BEGIN NOFILL
U↓[max]= 1/2 (↑n&↓[i=3]&∃(i-2)a↓i)→(2)
U↓[max]= maximum unsaturation of a superatompart
i = valence
n = maximum valence in composition
a↓i = number of atoms with valence i
.END
V. The maximum number of superatomparts for a given formula
is defined by equation 3.
.BEGIN NOFILL
S↓[max]= 1/2↑n&↓[i=2]&∃a↓i→(3)
n = maximum valence in composition
S↓[max]= maximum number of superatomparts in a superatom partition
note: the summation is over all atoms of valence >2;
univalents are not considered.
.END
.BEGIN INDENT 0,0
Rules I-V define the allowed partitions of a group of atoms into
superatomparts. These rules do not, however, prevent generation of
equivalent partitions, which would eventually result in duplicate
structures. Defining a canonical ordering scheme to govern partitioning
prevents equivalent partitions. (There may be alternative schemes):
.<<is this necessary? the partitioning is nice, but this isn't complete>>
.<<anyway, and gets into too much detail>>
.END
VI. Canonical Ordering:
.BEGIN INDENT 6,6
a. Partition in order of increasing number of superatomparts.
b. For each entry in each part of (a), partition in order
of decreasing size of superatompart by allocation of atoms one at a
time to remaining atoms.
c. Each individual partition containing two or more
superatomparts must be in order of equal or decreasing size of the
superatompart. In other words, the number of atoms and unsaturations
in superatompart n+1 must be equal to or less than the number in
superatompart n. The program notes the equality of superatomparts in
a partition to avoid repetition.
.END
.END
The application of rules I-VI is best illustrated through
reference to the example of C↓6U↓3. The maximum number of superatomparts
for this example is three (Equation 3). There is one way to partition
*C6U3 into one superatompart with no remaining atoms, partition 1,
Table II. Subsequent assignment of carbon atoms to the category of
remaining atoms results in partitions 2-4, Table II. The next
partition following the sequence 1-4 would be C↓2U↓3 with C↓4 assigned
to remaining atoms. This partition is forbidden as C↓2U↓3 has no free
valences. The three ways to partition C↓6U↓3 into two superatomparts
are indicated along with the corresponding partitions following
assignment of atoms to remaining atoms, as partitions 5-10, Table II.
Note that the next logical steps are partitions C↓3U↓1/C↓3U↓2,
C↓2U↓2/C↓4U↓1, and so forth, which are not allowed by the canonical
ordering rule VIc, above. There is only one unique way of
partitioning C↓6U↓3 into three superatomparts, partition 11, Table II.
.BEGIN TABLE II,Allowed Partitions of C↓6U↓3 Into Superatomparts and Remaining Atoms.
Superatom
Partition Number of Superatompart Number Remaining
Number Superatomparts 1 2 3 Atoms
1 1 C↓6U↓3 - - -
2 1 C↓5U↓3 - - C↓1
3 1 C↓4U↓3 - - C↓2
4 1 C↓3U↓3 - - C↓3
5 2 C↓4U↓2 C↓2U↓1 - -
6 2 C↓3U↓2 C↓2U↓1 - C↓1
7 2 C↓2U↓2 C↓2U↓1 - C↓2
8 2 C↓4U↓1 C↓2U↓2 - -
9 2 C↓3U↓1 C↓2U↓2 - C↓1
10 2 C↓3U↓2 C↓3U↓1 - -
11 3 C↓2U↓1 C↓2U↓1 C↓2U↓1 -
.END
PART B. Ring-superatom Construction.
Each partition (Table II) must now be treated in turn. The
complete set of ring-superatoms for each superatompart in a given
partition must be constructed. The major steps in the procedure are
outlined in Figure 2.
∪Valence ∪List. The first step in part B is to strip the
superatompart of atom names, while retaining the valence of each atom.
The numbers of each type of atom are saved for later labelling of the
skeleton. A valence list may then be specified, giving in order the
number of bi-, tri- , tetra- and n-valent nodes which will be
incorporated in the superatom. Thus the superatompart C↓6U↓3 is
transformed into the valence list 0 bivalents, 0 trivalents, 6
tetravalents (0, 0, 6), and C↓4U↓2 becomes (0, 0, 4) (Figure 2).
↓_Calculation of Free Valence_↓. From the valence list and the associated
unsaturation count the number of free valences of each superatompart
is determined using equation 4.
.BEGIN NOFILL
FV = (2 +↑n&↓[i=3]&∃(i-2)a↓i)-2U→(4)
U = unsaturation of superatompart
i = valence
n = maximum valence in composition
a↓i= number of atoms with valence i
FV = free valence
.END
↓_Partitioning of Free Valence_↓. The free valences are then partitioned
among the nodes in the valence list in all possible, unique ways. As
in the earlier partitioning problem, there are some simple rules which
must be followed. Because ring-superatoms are two-connected
structures two valences of each atom of a superatompart must be used
to connect the atom to the ring-superatom. Thus no free valences can
be assigned to bivalent nodes in the valence list, a maximum of one to
each trivalent, a maximum of two to each tetravalent, and so forth.
The example (Fig. 2) is further simplified in that there are only
tetravalent nodes in the valence list. Inclusion of trivalent nodes
(e.g., nitrogen atoms) merely extends the number of possible
partitions. The free valences are partitioned among the tetravalent
nodes in all ways, as illustrated in Figure 2. It is important to
note that removal of atom names makes all n-valent (n=2 or 3 or ...)
nodes in the valence list equivalent at this stage. Thus the
partitions (of eight free valences among six tetravalent nodes)
222200, 222020, 222002, ......, 002222 are all equivalent. Only one
of these partitions is considered to avoid eventual duplication of
structures.
.BREAK
↓_Degree List_↓. Each partition of free valences alters the effective
valence of the nodes in the original valence list with respect to the
ring-superatom. In the example, assignment of one or two free valences to
a tetravalent node transforms this node into a tri- or bivalent
node respectively. As the ring-superatom is constructed, those
tetravalent nodes which have been assigned, say, two free valences,
have then only two valences remaining for attachment to the ring-superatom.
These nodes are then of degree (18) two and may be termed secondary
nodes (18). Thus the partition of free valences 2,2,2,2,0,0
.SEND FOOT ⊂
18) The term "valence with respect to the ring-superatom" is confusing
as valence has a clear chemical meaning. Rather, we speak of the
degree of a node in the chemical graph (see Overview). Standard use of
the terms primary, secondary, ... when speaking of degree refers to
bonding to atoms which are not hydrogen atoms. Use of degree in this
report refers to the number of bonds other than free valences. A
free valence may or may not eventually be attached to a hydrogen atom
in the final structure.
.⊃;
on six tetravalent nodes yields the degree list (4,0,2) (Fig.
2) as four of the tetravalent nodes receive two free valences each
yielding four nodes of degree two (secondary) and leaving two
nodes of degree four (quaternary). The program keeps track of the
number of free valences assigned to all nodes for use in a subsequent
step.
∪Loops. As will be clarified in the subsequent discussion, there are
several general types of ring-superatoms which cannot be constructed
from the underlying graphs available in the CATALOG (described below).
These are all cases of multiple extended unsaturations either in the
form of double bonds or rings. Examples are the following:
.BEGIN NOFILL
1) bi-, tri-, ... n-cyclics with exocyclic double bonds;
2) some types of SPIRO ring systems;
3) allenes extended by additional double bonds, e.g.,
c=c=c=c .
.END
The concept of a loop, consisting of a single unsaturation and at
least one bivalent node must be specified for these cases. Examples
of loops containing one, two and three bivalent nodes are shown in
Scheme 3. Note that the two remaining "ends" of the unsaturation
will yield a "looped structure" when attached to a node in a graph
(shown as X, Scheme 1).
.BEGIN NOFILL
---------------------
Scheme 3
bivalents = 1 2 3
.GROUP SKIP 4
---------------------
.END
The method for specification of loops is discussed in Appendix C.
This procedure yields the reduced degree list which contains none of
the secondary nodes originally present in the degree list (see
Appendix C).
↓_Underlying Graphs_↓. The reduced degree lists are used to specify a
set of underlying graphs for the eventual ring-superatoms. All two-
connected structures can be described by their underlying graphs, which
are, for most structures, regular trivalent graphs. This concept has
been described in detail by Lederberg (11), who has also presented a
generation and classification scheme for such graphs. Given a set of
all underlying graphs, the set of all ring-superatoms may be specified.
The underlying graphs are maintained by the program in the CATALOG.
Catalog entries for regular trivalent graphs possessing two, four and
six nodes are presented in Table III. This list must be supplemented
by additional underlying graphs to cover several special cases.
Several of these additional graphs, which are sufficient for
generation of all structures in the example, are also presented in
Table III.
.BEGIN TABLE III,Partial Listing of Underlying Graphs in the CATALOG
.APART
Reduced
Structure Name* Degree List Remarks
2A (k,2,0,0) Regular trivalent graph of 2 nodes
4AA (k,4,0,0...) Regular trivalent graph of 4 nodes
4BB (k,4,0,0...) Regular trivalent graph of 4 nodes
6AAA (k,6,0,0...) Regular trivalent graph of 6 nodes
6ABB (k,6,0,0...) Regular trivalent graph of 6 nodes
6ACA (k,6,0,0...) Regular trivalent graph of 6 nodes
6BCB (k,6,0,0...) Regular trivalent graph of 6 nodes
6CCC (k,6,0,0...) Regular trivalent graph of 6 nodes
"Singlering e.g.
M" (4,0,0...) A single ring composed of M
secondary nodes.
"Daisy" (k,0,1,0...) A single 2n-degree node (n=2,3...)
e.g. $1AA
"T03" (k,0,3,0...) Graph composed of three quaternary
nodes
$3BCB (k,2,1,0...) Graph composed of two tertiary and
one quaternary node
"T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
"T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes.
"T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
"T22" (k,2,2,0...) Graph composed of two tertiary and
two quaternary nodes
$5CECC (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$BCCB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$B:5AECA (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$5ABCB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
$5ACDB (k,4,1,0...) Graph composed of four tertiary
and one quaternary nodes
e.g.(k,0,2,0,...) Two nodes of equal valence (>3)
* The graphs are named according to Lederberg (11).
------------------------------------------------------------------
.END
With the reduced degree list of a superatompart, the program
requests the appropriate CATALOG entries. In the example (Fig. 2),
the degree list (4,0,2) specifies underlying graphs containing two
quaternary nodes. The degree list (2,4,0) specifies regular
trivalent graphs of four nodes, of which there are two, 4AA and 4BB
(Table T2). Note that secondary nodes in the degree list are
ignored since underlying graphs have only trivalent or higher nodes.
The exception is when only secondary nodes are present in the
degree list, when the underlying graph "Singlering" (Table III) is utilized.
∪Interlude. Up to this point in the method, the program has
effectively decomposed the problem into a series of subproblems,
working down from the empirical formula through a set of partitions
and subpartitions to the set of possible underlying graphs.
Subsequent steps now work back up, in a sense, as the underlying
graphs are expanded in a series of steps to the final structures
employing the results of the previous partitions.
↓_Labelling Edges of Underlying Graphs with Special Secondary Nodes_↓.
The first step in building up structures is to attach the secondary
nodes in the reduced degree list to the underlying graphs. It
will be recalled that these secondary nodes are those that will
.<<ONLY POSSIBLY, WHEN THERE WERE LOOP PARTITIONS>>
have loops attached, thus the term "special secondary nodes" is
used. Because a graph structure is involved, the procedure by which
the possible attachments of the nodes to the graph are specified
may be viewed as a "labelling" of the graph. This is the first of
six such graph labelling steps required by the program. The six
steps are listed in Table IV. All steps involve
the same labelling problem, that of associating a set of n not
necessarily distinct labels with a set of n not necessarily distinct
objects. This problem is solved by a single algorithm, the "labelling
algorithm", for each of the six labelling steps (Table III). This
algorithm is described in some detail in the subsequent publication
(11). A description of the underlying mathematics and proof of
thoroughness and irredundancy will appear separately (12).
.BEGIN TABLE IV,The Six Graph Labelling Steps Performed by the Labelling Algorithm
Labelling Step Function
1 Label Edges of Underlying Graphs with
Special Secondary Nodes.
2 Label Edges of Resulting Graphs with
Non-Looped Secondary Nodes.
3 Label Loops of Resulting Graphs with
Looped Secondary Nodes.
4 Label Nodes of Skeletons with Free
Valences.
5 Label Nodes of Skeletons with Atom Names.
6 Label Free Valences of Superatoms with
Radicals (see Appendix D).
.END
It is important in the context of this discussion, however, to
discuss some aspects of the first labelling step to indicate how
equivalent labellings (which would eventually yield duplicate
structures) may be avoided prospectively. This discussion
is valid for each of the subsequent labelling steps.
Equivalent labellings are prospectively avoided by recognition of the
symmetry properties of, in the first labelling, the underlying graph.
These symmetry properties are expressed in terms of the permutation
group (see Appendix A and refs. 13 and 14), in the first labelling, on
the edges of the underlying graph. This permutation group, which
defines the equivalence of the edges, may be specified in the CATALOG
or, alternatively, calculated as needed by a separate part of the
cyclic structure generator. As subsequent labelling steps are
executed, a new permutation group (e.g., on the nodes for labelling
step four) is calculated, again as necessary. Thus, only labellings
.<<I want it to be clear that the group is not recalculated from scratch>>
which result in unique expansions of the structure are permitted.
In the simple example, (Fig. 2), the symmetries of the
underlying graphs and subsequent skeletons can be discerned easily by
eye. In the general case this is not true, however. For
example, all edges of the underlying graph containing two tetravalent
nodes are equivalent, as are the edges of the regular trivalent graphs
2A and also 4BB. The $3BCB graph (Table II, Fig. 2) has four
equivalent edges and one other edge, and so forth. In the general
case, however, the symmetries of the underlying graphs and subsequent
expansions thereof are not always obvious.
.<< there is something essentially different from the program (but not >>
.<< really wrong) here - that is, the loop-secondary nodes are partitioned>>
.<< among the loops before hand; and it is the loops WITH NODES which are>>
.<< labelled on at first >>
With distinguishability specified by the group on the edges,
labelling of the underlying graphs with special secondary nodes
is carried out. The results of this procedure for partitions
containing loops are indeicated in Figure 2.
↓_Partition of and Labelling With Non-Loop Secondary Nodes_↓.
The secondary nodes which were not assigned to loops
("non-looped secondary nodes") are partitioned among the edges
of the graphs which arise from the previous labelling step.
As in previous partitionings, the edges are initially
assumed to be indistinguishable. In addition, loops are not
.<<what does "... indistinguishable" mean???>>
counted as edges. There are, for example, five ways to
partition four non-loop secondary nodes among the edges
of the underlying graph possessing two quaternary nodes (Fig. 2).
Labelling of the graphs with non-loop secondary nodes
is then carried out. Each of the five partitions mentioned
immediately above results in a single labelling, as all edges
of the graph are equivalent. When edges are distinguishable
there may be several ways to label a graph with a single partition.
There are, for example, for the $3BCB graph, two ways to label
with the partition 3,0,0,0,0, four ways with the partition
2,1,0,0,0 and three ways with the partition 1,1,1,0,0 (Fig. 2).
↓_Partitioning of and Labelling with Loop Seconday Nodes_↓.
There remain unassigned to the graphs at this point only
secondary nodes which were assigned to loops. These nodes
are first partitioned among the loops assuming indistinguish-
ability and remembering that each loop must receive at least one
secondary node. For example, following the path from the degree
list (4,0,2) through the specification of two loops (Fig. 2),
there are two ways of labelling the two equivalent loops with
four secondary nodes. There is one way to label the two loops
of the adjacent graph with three secondary nodes and one way of
labelling the two loops of each of the two remaining graphs in
this section of Figure 2 with two secondary nodes. In this
example (*C6U3) the loops in every case are equivalent or there
is only one loop to be labelled. In the general case loops may
not be equivalent, resulting in a greater number of ways to label
loops with a given partition of secondary nodes.
↓_Unciliated Skeletons_↓. The previous labelling steps specified the
number of secondary nodes on each edge of and loop attached to the
underlying graphs. All atoms in the original superatompart are thus
accounted for. A representation of the result is the unciliated
skeleton, where nodes and their connections to one another are
specified. These skeletons begin to resemble the final structures.
↓_Free Valence Labelling_↓. Each of the nodes in an unciliated
skeleton is then labelled with free valences. This labelling is
trivial in the example, as all atoms are of the same valence (four).
This labelling is illustrated in some cases in Figure 2, yielding
the indicated ciliated skeletons. This labelling is performed
independent of the IDENTITIES of the atoms, but with knowledge of how
many atoms of each valence type were present in the original
superatompart.
A simple example illustrates the potential complexity of this labelling
problem when atoms of more than one valence type are present. If
there were, say, n nitrogen atoms in the superatompart, in addition to
m carbon atoms, then n nodes in the ciliated skeleton must be
trivalent, and m must be tetravalent, in all unique ways. The free
valence labelling would be done accordingly, yielding bivalent nodes
with one or two free valences and trivalent nodes with zero or one
free valences. In the general case there may be several ways to
perform this labelling on a single unciliated skeleton, whereas in the
C↓6U↓3 example there is only one way.
↓_Atom Name Labelling_↓. The nodes of a ciliated skeleton are then
labelled with atom names to yield the ring-superatom(s). Again this
labelling is trivial in the example, as only one type of atom is
present (carbon), yielding in each case only a single superatom (Fig.
2). If there is more than one type of atom with the same valence
(e.g., silicon and carbon), the labelling problem is more complex.
Each node of appropriate valence may be labelled with either type of
atom. Duplicate structures are avoided by calculating the group on
the set of nodes of equal valence and labelling accordingly.
.HD PART C. ACYCLIC GENERATOR.
The superatom partition expanded in the example had no atoms
assigned to acyclic chains (remaining atoms). The set of superatoms
on completion of Part B, above, thus yields the set of 36 structures
on placement of a hydrogen atom on each free valence (Fig. 2). If the
superatom partition (partitions 2-11, Table II) contained more than
one superatompart or remaining atoms, the acyclic generator must be
used to connect the segments of the structure in all ways. This
procedure is described in detail in Appendix D.
.TITL DISCUSSION
↓_Completion of C↓6U↓3_↓. The example (Fig. 2) has considered only
expansion of a single superatom partition. The total number of
isomers of C↓6U↓3 and their structures considering all partitions has been
determined by the program to be 159. It may be instructive for the
reader to attempt to generate, by hand, structures from another
partition, following the method outlined above and in Figure 2. The
initial superatom partitions yield some indication of the types of
structures which will result from each partition. For example,
partition 4 (Table II), C↓3U↓3 in a single superatompart, plus three
carbons in the remaining atoms, should yield all structures
containing a three-membered ring possessing two double bonds or a
triple bond. As there are only two free valences, the remaining atoms
can be in a single chain (as a propyl or iso-propyl radical) or as
a methyl and an ethyl group, but not as three methyl groups.
↓_Completeness and Irredundancy_↓. Although a mathematical proof of the
completeness and irredundancy of the method exists (14 15), there is
no guarantee that the implementation of the algorithm in a computer
program maintains these desired characteristics. Confidence in the
thoroughness and irredundancy of the algorithm can be engendered in
the following ways:
.BEGIN INDENT 6
1) Duplication of the program's performance by another,
completely independent approach. If such an approach exists (to our
knowledge it does not), it must be proven mathematically. An
independent algorithm has been developed which is capable of
enumeration, but not construction, of all isomers of a composition
containing C,H,N, and O. Implementation of that algorithm is limited
at the present time, unfortunately, to compositions containing only 5
or fewer carbon atoms and various numbers of hydrogen atoms. The
limitation is due to computer memory constraints. For these cases,
however, the numbers of isomers obtained by both programs agree.
2) Testing by manual generation of structures. Several
chemists, all without knowledge of the algorithm described above, have
been given several test cases, including C↓6U↓3, from which structures
were generated by hand. Familiarity with chemistry is no guarantee of
success, as evidenced by the performance of three chemists for the
superficially simple case of C↓6U↓3 (C↓6H↓8, Table V).
This example indicates that for more than very trivial cases, it is
extremely difficult to avoid duplicates (tricyclics, for example, are
difficult to visualize when testing for duplicates) and omissions.
Omissions appear to result from both being careless and forgetting
ring systems that are implausible or unfamiliar. The program seems
better at testing the chemist than vice versa.
In every instance of manual structure generation, no one has
been able to construct a legal structure that the program failed to
construct. No one has been able to detect an instance of duplication
by the program. This performance builds some confidence, but manual
verification of more complicated cases is extremely tedious and
difficult. Isomers for many empirical formulae have been generated,
and some results are tabulated in Table VI.
The choice of examples has been motivated by a desire to test
all parts of the program where errors may exist while keeping the
number of isomers small enough to allow verification. In
this manner all obvious sources of error have been checked, for
example, construction of loops on loops, multiple types of atoms of
the same valence (e.g. Cl, Br, I) and partitions containing atoms of
several different valences including penta- and hexavalent atoms.
3) Varying the order of generation. The structure of the
program permits additional tests by doing some operations in a
different order. For example, one variation allowed is to leave
hydrogens associated with the atoms in each partition rather than
to strip them away initially and place them on the remaining free
valences in the last step. Each such test has resulted in the same
set of isomers.
.END
In summary, the verification procedures utilized have all
indicated absence of errors in the computer implementation of the
algorithm. Also, there is no clear reason why generation of larger
sets of isomers should not also proceed correctly. The final verdict
however, must await development of new mathematical tools for
verification by enumeration (see above) or an alternative algorithm.
.BEGIN TABLE V,|Performance of Three↑* Chemists in Manual Generation
of Isomers of C↓6H↓8 (C↓6U↓3). There are 159 Isomers|
Number Generated Type of Error
Chemist 1 161 4 duplicates; 4 omissions;
2 with 7 carbon atoms.
Chemist 2 168 16 duplicates; 7 omissions
Chemist 3 160 2 duplicates; 1 omission
* One PhD and two graduate students.
-----------------
.END
.BEGIN TABLE VI,The Number of Isomers for Several Empirical Formulae
Empirical Example Number of Isomers Manually Verified?
Formula Compound
C↓6H↓6 benzene 217 yes
C↓6H↓8 1,3-cyclohexadiene 159 yes
C↓6H↓1↓0 cyclohexene 77 yes
C↓6H↓1↓2 cyclohexane 25 yes
C↓6H↓1↓4 hexane 5 yes
C↓6H↓6O phenol 2237
C↓6H↓1↓0O cyclohexanone 747 no
C↓6H↓1↓2O 2-hexanone 211 yes
C↓3H↓4N↓2 pyrazole 155 no
C↓3H↓6N↓2 2-pyrazoline 136 yes
C↓3H↓8N↓2 tetrahydropyrazole 62 no
C↓3H↓1↓0N↓2 propylenediamine 14 yes
C↓4H↓9P↓1 (pentavalent P) 110 no
-----------------
.END
∪Constraints. The structure generator is designed to produce a list of
all possible connectivity isomers (Appendix B). This list contains
many structures whose existence seems unlikely based on present
chemical knowledge. In addition, the program may be called on to
generate possible structures for an unknown in the presence of a body
of data on the unknown which specify various features, (e.g.,
functional groups) of the molecule. In such instances mechanisms are
required for constraining the generator to produce only structures
conforming to specified rules. The implementation of the acyclic
generator possessed such a mechanism in the form of GOODLIST (desired
features) and BADLIST (unwanted features) (3) which could be utilized
during the course of structure generation.
The cyclic generator is less tractable. As in prospective
avoidance of duplicate structures, it is important that unwanted
structures, or portions thereof, be filtered out as early in the
generation process as possible. It is relatively easy to specify
certain general types of constraints in chemical terms, for example, the
number of each of various types of rings or ring systems in the final
structure, ring fusions, functional groups, sub-structures and so
forth. It is not always so easy to devise an efficient scheme for
utilizing a constraint in the algorithm, however. As seen in the
above example (Fig. 2) the expanded superatom partition results in
what would be viewed by the chemist as several very different ring
systems.
The design of the program facilitates some types of constraints.
For example, the program may be entered at the level of combining
superatoms to generate structures from a set of known sub-structures.
If additional atoms are present in an unknown configuration, they can
be treated as a separate generation problem, the results of which are
finally combined in all ways with the known superatoms. This
approach will not form additional two-connected structures, however.
Constraints which disallow an entire partition may be easily included.
For example, it is possible to generate only open chain isomers by
"turning off" the appropriate initial superatom partitions.
Much additional work remains, however, before a reasonably
complete set of constraints can be included. The implementation of
each type of constraint must be examined and tested in detail to
ensure that the generator remains thorough and irredundant.
.HD CONCLUSIONS
The boundaries, scope and limitations of chemical structure can
now be specified.
.HD ACKNOWLEDGEMENTS
.NEXT PAGE
Appendix A. Equivalence Classes and Finite Permutation Groups.
Two members of a set of possible isomers may be defined to be
equivalent if a specified transformation of one member causes it to
be superimposable upon another member of the set. For example, there
are fifteen possible ways of attaching two chlorine and four hydrogen
atoms to a benzene ring (Scheme 4).
.BEGIN NOFILL
-----------------------------------------------------------------
Scheme 4
.GROUP SKIP 8
-----------------------------------------------------------------
.END
If rotations by multiples of 60 degrees are specified as allowed
transformations, the fifteen structures fall logically into three
classes, termed "equivalence classes" (Scheme Sa). Within each
equivalence class structures may be made superimposable by the
rotational transformation. If one element (in this case a molecular
structure) is chosen from each equivalence class, the complete set of
possible structures is determined, without duplication. It is the
task of the labelling algorithm to produce one and only one graph
labelling corresponding to one member of each equivalence class.
The set of transformations which define an equivalence class is
termed a "finite permutation group." This permutation group may be
calculated based on the symmetry properties of a graph (or chemical
structure in the example of Scheme Sa). This calculation provides
the mechanism for prospective avoidance of duplication. These
procedures are described more fully in the accompanying paper (13).
.NEXT PAGE
Appendix B. Isomerism and Symmetry.
Appendix A introduced the concept of equivalence classes and
finite permutation groups. The selection of transformation (Appendix A)
directs the calculation of the permutation group and thus defines the
equivalence classes. Different types of transformation may be
allowed depending on the symmetry properties of the class of isomers
considered. This Appendix discusses several of the possible types of
isomerism, most of which are familiar to chemists. The reader seeking
a more thorough discussion of some types of isomerism discussed below is
referred to an exposition of molecular symmetry in the context of
chemistry and mathematics (19).
.SEND FOOT ⊂
19) I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie,
Angew. Chem. internat. Edit., 9, 703 (1970).
.⊃;
Isomers are most often defined as chemical structures possessing
the same empirical formula. Different concepts of symmetry give rise
to different classes of isomers, some of which are described below.
∪Permutational ∪Isomers. Permutational isomers are isomers which have
in common the same skeleton and set of ligands. They differ in the
distribution of ligands about the skeleton. Gillespie et al. (20)
.SEND FOOT ⊂
20) P. Gillespie, P. Hoffman, H. Klusacek, D. Marquarding, S.
Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, Angew. Chem.
internat. Edit., 10, 687 (1971).
.⊃;
and Klemperer (21)
.SEND FOOT ⊂
.BEGIN NOFILL
21) a) W. G. Klemperer, J. Amer. Chem. Soc., 94, 6940 (1972);
b) W. G. Klemperer, ibid, p. 8360.
.END
.⊃;
have used the concept of permutational isomers to
probe into unimolecular rearrangement or isomerization reactions.
∪Stereoisomers. Ugi et al. (19) have defined the "chemical constitution" of
an atom to be its bonds and bonded neighbors. Those permutational
isomers which differ only by permutations of ligands at constitutionally
equivalent positions form the class of stereoisomers.
↓_Isomers Under Rigid Molecular Symmetry_↓. If one conceives of molecular
structures as having rigid skeletons, the physical rotational (three dimensional)
symmetries and transformations may be readily defined. Each transformation
causes each atom (and bond) to occupy the position of another or same
atom (and bond) so that the rotated structure can physically occupy its
former position and at the same time be indistinguishable from it in any
way. This is the most familiar form of symmetry. Under this type of symmetry
conformers are distinguishable and belong in distinct equivalence
classes. Every transformation is orthogonal and preserves bond angles
and bond lengths as well as maintaining true chirality.
If one allows other orthogonal transformations that alter chiral
properties of structures, equivalence classes result that treat both
the left-handed and right-handed forms of chiral molecules to be the
"same". Thus a "mirror image" transformation when suitably defined permits
the left-handed form to exactly superimpose the right-handed form and
vice-versa.
↓_Isomers Under Total Molecular Symmetry_↓. If in addition to the above
mentioned rigid molecular transformations one recognizes the flexional
movements of a nonrigid skeleton, a dynamic symmetry group may be
defined. Under this definition, different conformers now are grouped
together. Thus the "chair" and "boat" conformations of cyclohexane
belong to the same equivalence class under dynamic symmetry. The
permutation group of skeletal flexibility is computable separately and
independently of rigid molecular symmetry. One can then view total
molecular symmetry as the product of the two finite permutation
groups.
↓_Isomers Under Connectivity Symmetry_↓. The concept of connectivity
symmetry was introduced previously (METHOD section). Every
permutation of atoms and bonds onto themselves is a symmetry
transformation for connectivity symmetry if,
.BEGIN INDENT 6,6
a) each atom is mapped into another of like species, e.g., N to N,
C to C, O to O, and
b) for every pair of atoms, the connectivity (none, single,
double , triple, ...) is preserved in the mapping, i.e. the
the connectivity of the two atoms is identical to the
connectivity of the atoms they are mapped into.
.END
One can readily recognize that transformations as defined
automatically preserve the valence and bond distribution of every
atom. It is very probable that readers accustomed to three
dimensional rotational and reflectional symmetries will tend to equate
them with the symmetries of connectivity. It is emphasized again that
connectivity symmetry does not consider bond lengths or bond angles,
and it includes certain transformations that are conceivable but have
no physical interpretation save that of permuting the atoms and bonds.
.NEXT PAGE
Appendix C. Calculation of Loops.
There are several rules which must be followed in consideration
of loop assignment to ring-superatoms. The minimum (MINLOOPS) and
maximum (MAXLOOPS) numbers of loops for a given valence list are
designated by equations 5 and 6.
.BEGIN NOFILL
MINLOOPS = max{0,a↓2+1/2(2j↓[max]-↑n&↓[i=2]&∃ja↓j)}→(5)
MAXLOOPS = min{a↓2,↑n&↓[j=4]&∃((i-2)/2)a↓j}→(6)
MINLOOPS = minimum number of loops
MAXLOOPS = maximum number of loops
a↓2 = number of secondary nodes in degree list
j↓[max] = degree of highest degree item in degree list
j = degree
n = highest degree in list
a↓j = number of nodes with degree j.
.END
The form of the equations results from the following considerations:
.BEGIN INDENT 6,6
1) Only secondary nodes may be assigned to loops. Nodes of
higher degree will always be in the non-loop portion
of the ring-superatom.
2) A loop, by definition, must be attached by two bonds to a
single node in
the resulting ring-superatom. The loop cannot be attached through the free
valences. Thus the degree list must possess a
sufficient number of quaternary or higher degree nodes to support the
loops(s).
3) Each loop must have at least one secondary node, which is the
reason MAXLOOPS is restricted to at most the maximum number of secondary
nodes in the degree list (Equation 6).
4) There must be available one unsaturation for each loop (this
is implicit in the calculation of MINLOOPS and MAXLOOPS) as each
loop effectively forms a new ring.
.END
↓_Partitioning of Secondary Nodes_↓. For each of the possible numbers of
loops (0,1, ...) the secondary nodes are removed from the degree list
and partitioned among the loops, remembering that the loops are at
present indistinguishable and each loop must receive at least one
secondary node. In the example (Fig. 2), starting with the degree
list (4,0,2), there are three ways of partitioning the four secondary
nodes among two loops and the remaining non-looped portion. Removal
of the four secondary nodes from the degree list and assignment of
two, three or four of them to two loops results in the list specified
in Figure 2 as the "reduced degree list". Specification of two loops
transforms the two quaternary nodes in the degree list into two
secondary nodes. This results from the fact that two valences of a
quaternary or higher degree node must be used to support each loop.
These are "special" secondary nodes, however, as these particular
nodes are the ones which will be connected to loops as the structure
is built up. Thus, in the example, any secondary nodes which are
found in the reduced degree list will have a loop attached in a
subsequent step. The degree list (4,0,2) thus becomes the reduced
degree list (2,0,0) in the partition specifying two loops (Fig. 2).
Similarly, the partition of one loop for the degree list (3,2,1)
results in a reduced degree list of (1,2,0) with the three original
secondary nodes partitioned among loop and non-loop portions (Figure 1).
If, after the first, second, ... nth loop partition, there
remain one or more quaternary or higher degree nodes in the reduced
degree list, the list must be tested again for the possibility of
additional loops. Each loop partition will generate a new set of
.<<the word is level>>
structures. The second pass will yield those structures possessing
loops on loops, and so forth. One such superatom which would be
generated in this manner from a composition of (at least) C↓6U↓5 is 15.
.BEGIN NOFILL
c=c=c=c=c=c
15
.END
The partition of (4,0,2) including one loop results in each case in
a reduced degree list (1,0,1). This list is disallowed in the
.<<that it is generated is more of a bug in the program (harmless bug) >>
.<<than a reflection on the algorithm - since the formal def of the algo->>
.<<rithm specifies to check as part of the loop partitioner for invalid >>
.<<things >>
subsequent step, as the underlying graph (vide infra) for one
quaternary node is a daisy (Table II), which requires a minimum of
two secondary nodes with which to label the daisy loops (a minimum of
one secondary node in the reduced degree list for each loop of the daisy).
Appendix D
A method of construction of structures similar to the generation
of acyclic molecules is utilized to join multiple ring-superatoms and
remaining atoms. The DENDRAL algorithm for construction
of acyclic molecules (3,24)
.SEND FOOT ⊂
24) A more complete description of the algorithm is available; see
B. G. Buchanan, A. M. Duffield, and A. V. Robertson, in
"Mass spectrometry, Techniques and Applications," G. W. A.
Milne, ed., John Wiley and Sons, Inc., 1971, p. 121.
.⊃;
relied on the existence of a unique central atom
(or bond) to every molecule. The present acyclic generator uses the
same idea. The present algorithm, though simpler in not having to
to treat interconnection of atoms or ring-superatoms through multiple bonds,
is more complex because of the necessity to deal with the symmetries
of the ring-superatoms.
C1. METHOD FOR THE CASE WITH EVEN NUMBER OF TOTAL ATOMS.
The superatom partition C↓2U↓2/C↓2U↓1/-/C↓2 (partition 7, Table II
and Figure 2) will be used here to illustrate this procedure. The
superatomparts C↓2U↓2 and C↓2U↓1 have exactly one possible ring-superatom for
each (see Table VII).
.BEGIN TABLE VII
Superatompart Superatom
C↓2U↓2 -C≡≡C-
C↓2U↓1 >C==C<
----------------------------------------
.END
Thus acyclic structures are to be built with -C≡≡C- , >C==C< and
two C's.
There are an even number atoms and ring-superatoms. The structures
to be generated fall into two categories:(a) those with a central bond;
(b) those with a central atom.
Category A. CENTRAL BOND (see Fig. 3).
Step 1. Partition into Two Parts.
The atoms and ring-superatoms are partitioned into two parts, with
each part having exactly half the total number of items. Each atom or
ring-superatom is a single item. Each part has to satisfy equation 7,
called the Restriction on Univalents.
.BEGIN NOFILL
Restriction on Univalents:
1+↑n&↓[i=1]&∃(i-2)a↓i≥ 0→(7)
i = valence
a↓i = number of atoms or superatoms of valence i
n = maximum valence in composition
.END
There are two ways of partitioning the four items into two
parts (Fig. 3). The restriction on univalents is satisfied in
each case. The restriction will
disallow certain partitions that have "too many" univalents other
than hydrogens and therefore is essential only in partitioning
compositions that contain any number of non-hydrogen univalents.
Step 2. Generate Radicals from Each Part.
Using a procedure described in Section C3, radicals are
constructed from each part in each partition. Table VIII shows
the result of applying this procedure to the example.
.BEGIN TABLE VIII
Part Radicals
---------------------+------------------------------
-C≡≡C- , >C==C< -> -C≡≡C-CH=CH2
-CH=CH-C≡≡CH
-C-C≡CH
"
CH2
C2 -> -CH2-CH3
-C≡≡C- , C -> -C≡≡C-CH3
-CH2-C≡≡CH
>C==C< , C -> -CH==CH-CH3
-C-CH3
"
CH2
-CH2-CH==CH2
====================================================
.END
Step 3. Form Molecules From Radicals.
The radicals are combined in unique pairs, within each initial
partition. Each pair gives rise to a unique molecule, for each of
which is the center is a bond. There are nine such molecules that have
a central bond, for the example chosen (Fig. 3).
Category B. CENTRAL ATOM (see Fig. 4).
Step 1. Selection of Central Atom.
One must consider every unique atom or ring-superatom that has
a free valence of three or higher as a central atom. In the
example, of three candidates available: -C≡≡C- , >C==C< and C,
the first is not chosen for it has a free valence of only two.
Step 2. Partition the Rest of the Atoms.
The atom or ring-superatom chosen for the center is removed from the set
and the rest are partitioned into a number of parts less than or
equal to the valence of the central atom. Each part must have less than
half the total number of items being partitioned (again a
ring-superatom is a single item). Each part must satisfy the
restriction on univalents (equation 7).
Thus, for the case where a carbon is the center, from one to four
partitions are generated with the condition that each part has at most
(4-1)/2 or 1 atom. No valid partitions are possible for
one, two or four parts. There is exactly one partition for three
parts, i.e., one atom in each. The partitions are shown in Figure 4.
Step 3. Generate Radicals.
Once again, using the procedure described in Section C3,
radicals are generated for each part in each partition. For example,
the partition -C≡≡C- gives rise to exactly one possible radical -C≡≡CH (Fig. 4).
Step 4. Combine Radicals.
Although in the example shown every part generates only one
radical, in the general case there will be many radicals for each
part. If so, the radicals must be combined to give all unique
combinations of radicals within each partition.
Step 5. Form Molecules from Central Atom and Radicals.
If the central atom is not a ring-superatom but is a simple atom, then
each combination of radicals derived in Step 4 defines a single
molecule that is unique. Thus for example when C is chosen as the
center, step 4 gives one combination of radicals which determines
a single molecule when connected to the central C (see Figure 4).
If the central atom is a ring-superatom and the valences of the
ring-superatom
are not identical then different ways of distributing the radicals
around the center may yield different molecules. Labelling of the
free valences of the central ring-superatom with radicals treated as labels
(supplemented with adequate number of hydrogens to make up the total
free valence of the ring-superatom) generates a complete and irredundant
list of molecules. Thus >C==C< is labelled with the label set:
.BEGIN NOFILL
one of -C≡≡CH , two of -CH3 , and one of -H.
.END
There are two unique labellings as shown in Figure 4.
C2. METHOD FOR ODD NUMBER OF TOTAL ATOMS
With an odd number of total atoms, no structures can be generated
with a central bond. Only the case of a central atom is to be
considered. However, it is possible for structures to be built with
a bivalent atom at the center. Thus the procedure outlined in Category
B above is followed, in this case also allowing a bivalent atom to
be the central atom.
C3. GENERATION OF RADICALS
The goal of this procedure is to generate all radicals from
a list of atoms and ring-superatoms. A radical is defined to be an atom
or superatom with a single free valence. When a composition of
atoms and ring-superatoms is presented, from which
radicals are to be generated, two special cases are recognized.
Case 1. Onle One Atom in Composition.
When only one atom which is not a ring-superatom makes up the composition
only one radical is possible. As for example, with one C, the
radical -CH3 is the only possibility.
Case 2. Only One Item (a Ring-superatom) in Composition.
In this case, depending upon the symmetry of the ring-superatom,
several radicals are possible. This is determined by labelling the
free valences of the ring-superatom with one label of a special type, a
"radical-valence".
.BEGIN NOFILL GROUP
Example: A composition consists of one ring-superatom, 16.
C-
/"
>C< "
\"
C-
16
.END
Two radicals result from labelling with one radical valence.
.BEGIN NOFILL GROUP
CH C-
/" /"
-CH< " CH2< "
\" \"
CH CH
17 18
.END
∪General ∪Case. Radicals have uniquely defined centers as well, the
center always being an atom of valence two or higher. The steps for
generation of radicals are as follows.
Step 1. Selection of Central Atom.
Any bivalent or higher valent atom or ring-superatom is a valid
candidate to be the center of a radical. Thus, for example, for the
composition -C≡C-, >C=C< (see part 1 in Figure 3) both are
valid central atoms (Figure 5).
Step 2. Partition the Rest of the Atoms.
The atom chosen for the central atom is removed from the composition.
One of the valences of the central atom is to remain free. Therefore, the
rest of the atoms in the composition are partitioned into less than
or equal to (valence of central atom - 1) parts. Of course, each part
should satisfy the restriction on univalents (equation 7) but
for constructing radicals there is no restriction on the size of the
parts.
Step 3. Form Radicals from Each Part.
The procedure to generate radicals is freshly invoked on each
part thus generating radicals. Each part in Figure 5 gives rise
to only one radical each arising from special case 2.
Step 4. Combine Radicals in Each Part.
For the example in Figure 5, each part yields only one
radical. In a more general situation, where the rest of the composition
after selection of center, is partitioned into several parts, and
where each part yields several radicals, the radicals are combined
to determine all unique combinations of radicals.
Step 5. Label Central Atom with Radicals.
If the center is an atom (not a ring-superatom) then each unique
combination defines a single unique radical.
If the center is a ring-superatom, the radicals are determined by
labelling the center with a set of labels which includes;I) the radicals;
ii) a leading radical-valence; iii) an adequate number of hydrogens
to make up the remaining free valences of the ring-superatom. One selection of
center gives one radical and the other gives two more, to complete
a list of three radicals for the example chosen (Fig. 5).
Summary
For the example chosen to illustrate the operation of the
acyclic generator, twelve isomers are generated, nine shown in Figure
3 and three shown in Figure 4.