perm filename NIH.PRO[1,LMM] blob
sn#077060 filedate 1973-12-07 generic text, type T, neo UTF8
DRAFT
NIH PROPOSAL
Buchanan, Smith
11/19/73
PREAMBLE
I. INTRODUCTION
A. Objectives
B. Background and Rationale
C. Relationship to SUMEX and the Genetics Research Center
II. SPECIFIC AIMS
III. METHODS
IV. SIGNIFICANCE OF PROPOSED RESEARCH
V. FACILITIES & EQUIPMENT
VI. ORGANIZATIONAL FRAMEWORK
VII. BIBLIOGRAPHY
PREAMBLE
This renewal application requests funds for continued support of
resource-related research and applications in the area of chemistry and
artificial intelligence. Its previous funding resulted in a
collaborative scientific effort, which has produced significant results.
Previous efforts were also subsidized by both the Advanced Research
Projects Agency (ARPA) in areas of computer science research, and
by the National Aeronautics and Space Administration (NASA) for
instrumentation. These additional funds have been or will soon be
terminated: to continue this research, part of the burden of support
must be shifted in a way that more accurately reflects the true cost
of supporting our research. We have made every effort to reduce
the requested budget in ways which will not severely impact the
research and which reflect the most efficient use of existing
resources of talent, instrumentation and computer programs.
Termination of funding would prejudice the utilization of existing
resources and ongoing research in other areas of NIH interest (an
estimated $300,000 of mass spectrometry laboratory facilities alone).
Our project is the only systematic effort currently underway in this
country (to our knowledge) for computer assisted structure elucidation
(there is presently an intensive program underway in Japan in the
same area). This situation may be contrasted with computer assisted
organic synthesis, an area receiving considerable attention from
several research groups. Our efforts could not be begun again from
scratch without the expenditure of prohibitive amounts of money and
wasteful duplication of effort. These capabilities can be beneficially
provided to a wider community via the SUMEX resource. Research
involving the augmentation of human intellect by computer programs
may dramatically affect the ways in which chemical research is done
in the future.
The personnel associated with the project constitute a unique and
valuable resource. Over the past five years we have assembled at
Stanford an energetic team of scientists with experience in many
aspects of computer science and chemical structure elucidation,
as well as an experienced and efficient technical support staff. This
intellectual resource is an important component of this proposal.
Without such capable personnel, the proposal would not be feasible.
Without the financial support from this proposal, this line of
collaborative research will have to be abandoned and the mass
spectrometry facility will have to be closed.
I. INTRODUCTION
Significant resources of instrumentation, computer programs and
people have been assembled through the support of various granting
agencies, including the NIH in its current grant for resource-related
research. The research proposed in this renewal application will
extend the capabilities of these resources and ensure their operation
in the service of other research.
A. Objectives
In the past several years, this project has developed special facilities
and experience for molecular structure elucidation using artificial
intelligence (AI) programs and spectroscopic data derived primarily from
mass spectrometry (MS). This proposal requests support in order to:
1) Develop a combined gas chromatography/high resolution mass
spectrometry (GC/HRMS) system that is reliable enough to be used
routinely. When this system is developed, service will be available
to the Stanford community and research collaborators and, if our
resources permit, to any scientist requesting assistance.
2) Apply advanced artificial intelligence techniques to the
scientific inference problems of molecular structure elucidation and
theory formation from spectroscopic data.
3) Investigate mixtures of biologically important compounds, for
example, marine sterols, and compounds isolated from extracts of
human urine. High resolution mass spectrometry and combined gas
chromatography-high resolution mass spectrometry are excellent
structure elucidation techniques for these problems, especially in
conjunction with the artificial intelligence programs. Where
possible, additional information from other spectroscopic techniques
will also be used for structure elucidation.
-----------------------------------------------------------------------
* "High resolution" is a misnomer in the sense that the basic function
of a high resolution mass spectrometer is to provide the capability
for accurate mass determinations, so that elemental compositions can
be assigned to each ion. This capability can be achieved in some
cases even at "low" resolving powers.
----------------------------------------------------------------------
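The assignment of elemental compositions from accurate masses, as described in the note above, can be illustrated by a minimal sketch. The function name, the element set (C, H, N, O), the atom count limits and the tolerances are assumptions chosen for illustration and are not taken from the laboratory's data reduction programs. The electron mass and valence rules are ignored.

```python
from itertools import product

# Monoisotopic masses (12C = 12 exactly) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def compositions(measured_mass, tol_da, max_counts=None):
    """List (formula, mass) pairs whose exact mass lies within tol_da of measured_mass.

    max_counts is a hypothetical search limit per element, not a chemical rule.
    """
    max_counts = max_counts or {"C": 10, "H": 22, "N": 4, "O": 4}
    elements = list(MONOISOTOPIC)
    hits = []
    for counts in product(*(range(max_counts[e] + 1) for e in elements)):
        mass = sum(n * MONOISOTOPIC[e] for n, e in zip(counts, elements))
        if abs(mass - measured_mass) <= tol_da:
            formula = "".join(
                f"{e}{n if n > 1 else ''}" for e, n in zip(elements, counts) if n
            )
            hits.append((formula, round(mass, 4)))
    return hits
```

With a tight tolerance, for example `compositions(98.0732, 0.0005)`, the candidates include C6H10O. With a tolerance of 0.5 mass units, as at nominal mass only, many more compositions of nominal mass 98 are returned, which is the sense in which accurate mass measurement rather than resolution as such provides the added specificity.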
B. Background and Rationale
1. The Structure Elucidation Problem
a) The General Chemical Problem. Analysis of molecular
structure (as opposed to synthesis) is one of the major activities in
chemistry related research. For the specific task of elucidating
molecular structures, chemists utilize a mixture of information
derived from chemical procedures and spectroscopic techniques. Each
item of information, if not redundant or uninterpretable, contributes
to the solution of the problem. Chemists draw upon a tremendous body
of specific knowledge about chemistry, molecular structure,
spectroscopic techniques, etc., in order to piece together this
information and infer the structure of molecules. These features
make the problem particularly well-suited for applications of the
techniques of artificial intelligence to assist research workers
performing the task.
b) Djerassi's Laboratory. Professor Djerassi has been concerned
with structure elucidation problems since the beginning of his
chemical research. His activities at Stanford have been concerned
heavily with the application of particular spectroscopic techniques
to structural studies of biomedically important compounds. These
techniques include optical rotatory dispersion (ORD) and, more
recently, magnetic circular dichroism (MCD) (both of them supported
initially by the NIH). More recently he has been concerned with mass
spectrometry because of the power of the technique, in terms of
specificity and sensitivity, as an analytical tool for structure
elucidation.
Although the technique of mass spectrometry may not be sufficient for
all structure determination problems, it is a very powerful tool in
areas where there exists a body of knowledge about the behavior of
related molecules in the mass spectrometer. Also, when sample size is
limited, mass spectrometry may well be the only technique that can be
utilized. In both cases, the recent availability of high resolution
mass spectrometers has made HRMS the technique of choice because of the
greater specificity of empirical formulae rather than nominal masses for
each ion. On a parallel course, the technique of GC/MS, routinely
available with low resolution mass spectrometers (GC/LRMS), has
revolutionized investigations wherever complex mixtures are encountered.
All of the above considerations argue that an extension of mass
spectrometry at Stanford to provision of GC/HRMS on a routine basis
would be the next logical step toward more powerful structure
elucidation for researchers depending on this facility. This system,
applied to complex mixtures, will produce empirical formulae of all ions
in the spectra of the mixtures. It is also expected that the data from
mass spectrometry would provide the most powerful input in many cases to
the AI programs assisting in the analysis, prior to consideration of
other types of spectroscopic information.
2. Historical Background
a) Mass Spectrometry Laboratory. Prior to the existing DENDRAL
grant, the groundwork was laid for computerization of the existing
mass spectrometers, an Associated Electrical Industries MS-9 high
resolution mass spectrometer and an Atlas CH-4 low resolution mass
spectrometer. This work, supported primarily by NASA via the
Instrumentation Research Laboratory (IRL) in the Department of Genetics,
resulted in link-up to the then existing ACME computer facility via a
PDP-11 mini-computer which acted as a buffer between the spectrometers
and ACME. Initial data acquisition and reduction programs were
written for the system and utilized on a limited basis. The funding
of the DENDRAL proposal in conjunction with additional resources
provided by the IRL resulted in a major effort to upgrade these
capabilities and to link the new mass spectrometer to the system. The
fruits of these efforts are described under Section I.B.3 (below).
b) Summary of Early DENDRAL Development.
In 1964, Lederberg devised a notational algorithm for chemical
structures (termed DENDRAL) that allowed questions of molecular
structure to be framed in precise graph-theoretic terms. He also
showed how to use the DENDRAL algorithm to generate complete and
irredundant lists of structural isomers.
In 1965-66 Lederberg and Feigenbaum began exploring the idea of
using the isomer generator in an artificial intelligence program -
searching the space of possible structures for plausible solutions
to a problem much as a chess-playing program searches the space of
legal moves for the best moves. This approach guarantees that every
possible solution to a problem is considered - either implicitly, as
when whole classes of unstable structures are rejected, or
explicitly, as when a complete molecule is tested for plausibility.
In either case, an investigator easily determines the criteria for
rejection and acceptance and knows that no possibilities have been
forgotten. This approach also guarantees that structures appear in
the list only once - that symmetric representations of the same
complex molecule have not been included. In both these respects the
computer program has an advantage over manual approaches to structure
elucidation.
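The two guarantees described above, completeness and irredundancy, can be illustrated on a toy case. The sketch below is not the DENDRAL algorithm: it enumerates acyclic carbon skeletons (trees in which no atom has more than four neighbors) by adding one atom at a time and discarding duplicates through a canonical form. All function names are hypothetical.

```python
def canon_rooted(adj, root, parent=None):
    """Canonical string of the tree rooted at `root` (children sorted)."""
    parts = sorted(canon_rooted(adj, c, root) for c in adj[root] if c != parent)
    return "(" + "".join(parts) + ")"

def canon(adj):
    """Canonical form of an unrooted tree: smallest rooted form over all roots."""
    return min(canon_rooted(adj, v) for v in adj)

def grow(adj, max_degree=4):
    """Yield every tree obtained by attaching one new atom to an unsaturated atom."""
    new = len(adj)
    for v in adj:
        if len(adj[v]) < max_degree:
            child = {u: list(ns) for u, ns in adj.items()}
            child[v].append(new)
            child[new] = [v]
            yield child

def carbon_skeletons(n, max_degree=4):
    """All distinct acyclic skeletons with n atoms, each listed exactly once."""
    level = {canon({0: []}): {0: []}}
    for _ in range(n - 1):
        nxt = {}
        for adj in level.values():
            for child in grow(adj, max_degree):
                # Symmetric duplicates share a canonical form and are kept once.
                nxt.setdefault(canon(child), child)
        level = nxt
    return list(level.values())
```

Every tree with n atoms can be reached by adding a terminal atom to some tree with n-1 atoms, so nothing is missed, and the canonical form ensures that symmetric representations of the same skeleton are not listed twice. For example, `len(carbon_skeletons(6))` gives 5, the number of hexane isomers.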
c) Initial Collaboration with Djerassi. Lederberg and Feigenbaum
realized that (a) only through application to real problems could the
worth of the AI approach be evaluated, and (b) mass spectrometry
appeared to be a fruitful applications area. Mass spectrometry appeared
to be an excellent problem area because of the close relationship
between spectral fragmentation patterns and molecular structure for many
classes of molecules. Djerassi's interest and expertise led to a series
of publications describing the approach and initial results of the
programs. The success of these collaborative efforts led to the
proposal to the NIH for initial funding to extend these efforts.
d) Efforts Under NIH Funding. The initial funding by NIH
provided the opportunity to upgrade the instrumentation and computer
programs. In particular we were able to mount a concerted project
on both the analysis of mass spectra and the mathematical aspects of
molecular structure. Progress reports to the NIH describe this
research in detail. The most recent annual report appears in
Appendix ***. A series of publications directed to audiences both in
computer science and chemistry are listed in bibliography Z. The
following section (Section 3) summarizes the capabilities for
structure elucidation which, in themselves, constitute an important
result of past work.
An important side effect of the DENDRAL project is the extent to which
additional research was inspired and carried out to fill gaps in
existing knowledge. This research, not supported by the DENDRAL grant,
has been beneficial to on-going DENDRAL work, and vice-versa.
Publications which have arisen from this research are listed in
bibliography. A brief review of these publications should indicate
the need for precise specification of the knowledge elicited from
chemists and used in computer programs. As an example, consider the
description and application of an early algorithm for generation of
cyclic structural isomers (Y.M. Sheikh, et al., 1970). This paper
considered the problem of spectroscopic differentiation of isomers of
C6H10O. Unsaturated ethers are one of the classes of isomeric compounds
which must be considered, but the mass spectrometry of unsaturated
ethers had not been investigated systematically. This work was
subsequently carried out in Professor Djerassi's laboratory independently
of DENDRAL support, but of benefit to DENDRAL (Morizur and Djerassi,
1971). Other examples will be found in Bibliography.
3. Existing Capabilities
This research team has already developed unusual capabilities for
chemical structure elucidation, bringing together a high quality
HRMS system and AI programs applied to chemistry. We have demonstrated
the feasibility of our analytical approach in several problem areas,
and have developed both a mass spectrometry system and a general set
of computer programs for use in new areas.
The most outstanding capabilities are summarized below, followed by
brief discussions of each. These are available immediately, and
were developed primarily under NIH funding to this project, with
additional support supplied by ARPA and NASA in specific areas. (These
agencies have reduced funding levels for this work, however, leaving
the NIH as the source of support for future development of applications
programs in the area of artificial intelligence and chemistry.)
a. High Resolution Mass Spectrometry System and Coupled Gas
Chromatography/Low Resolution Mass Spectrometry System. We have coupled
the NIH-supported Varian-MAT 711 High Resolution Mass Spectrometer with
a Hewlett Packard Gas Chromatograph and demonstrated its utility.
Advanced data reduction techniques for this instrument exist in the
dedicated PDP 11/20 and Stanford's 370/158.
b. DENDRAL Structure Generator
The DENDRAL Structure Generator is a unique computer program capable of
exhaustive and irredundant generation of isomers, with and without
rings. This program is the "legal move generator" that guarantees
consideration of every candidate structure - either implicitly, as when
whole classes of structures are forbidden, or explicitly, as when
individual compounds in a class are specified. A labelling algorithm,
which is essential to structure generation, is capable of producing
answers to many structural questions. For example, it can list all
structures resulting from substituting a carbo-cyclic skeleton with some
numbers of different groups.
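The kind of question answered by the labelling algorithm can be sketched for the simplest case, a single unsubstituted ring. The code below is an illustrative brute-force version, not the project's algorithm: it assumes the skeleton is a plain ring of n equivalent positions whose symmetries are the rotations and reflections of a polygon, and all names are hypothetical.

```python
from itertools import permutations

def ring_symmetries(n):
    """Rotations and reflections of an n-membered ring, as position maps."""
    rotations = [tuple((i + r) % n for i in range(n)) for r in range(n)]
    reflections = [tuple((r - i) % n for i in range(n)) for r in range(n)]
    return rotations + reflections

def distinct_labellings(n, substituents, blank="H"):
    """Distinct ways to place the given substituents on an n-membered ring."""
    labels = list(substituents) + [blank] * (n - len(substituents))
    symmetries = ring_symmetries(n)
    seen = set()
    for assignment in set(permutations(labels)):
        # The smallest image under all symmetries identifies the equivalence class.
        canonical = min(tuple(assignment[p[i]] for i in range(n)) for p in symmetries)
        seen.add(canonical)
    return sorted(seen)
```

For a six-membered ring, `distinct_labellings(6, ["X", "X"])` returns three labellings, corresponding to the familiar ortho, meta and para arrangements. A practical labelling algorithm would avoid enumerating every assignment, which this sketch does only for clarity.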
c. DENDRAL Planner
We have written a set of computer programs for determining structural
features from analytic data in well-defined areas. Such planning
programs have been written for low and high resolution mass
spectrometry, interpreted proton NMR spectroscopy and 13CMR data.
d. INTSUM
INTSUM is a computer program that aids in finding interpretive rules for
mass spectrometry. The program interprets a large collection of mass
spectrometry data according to criteria specified by a chemist. Then it
summarizes the data to show which of the possible interpretations seem
most plausible.
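The summarizing step can be pictured with a small sketch. It is not the INTSUM program: the input format (one record per peak that a candidate process could explain), the two summary statistics and all names are assumptions made for illustration.

```python
from collections import defaultdict

def summarize(evidence, n_compounds):
    """Rank candidate fragmentation processes by how consistently they appear.

    evidence: iterable of (compound, process, relative_intensity) records.
    n_compounds: number of compounds in the data collection.
    """
    by_process = defaultdict(dict)
    for compound, process, intensity in evidence:
        hits = by_process[process]
        hits[compound] = hits.get(compound, 0.0) + intensity
    summary = []
    for process, hits in by_process.items():
        summary.append({
            "process": process,
            "fraction_of_compounds": len(hits) / n_compounds,
            "mean_intensity": sum(hits.values()) / len(hits),
        })
    # Processes seen in the most compounds, then the most intense, come first.
    return sorted(
        summary,
        key=lambda row: (-row["fraction_of_compounds"], -row["mean_intensity"]),
    )
```

A process that explains peaks in nearly every compound of a class, at appreciable intensity, is a more plausible candidate for an interpretive rule than one seen only occasionally.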
e. Ancillary Techniques
1. The mass spectrometry facility provides other types of experiments
in mass spectrometry, including ultra-high resolution measurements
(masses determined via peak matching), metastable ion determinations
(Barber-Elliott technique) and low ionizing voltage experiments.
These data are utilized by both chemists and programs where appropriate.
2. Additional computer programs provide added problem-solving assistance.
a. Predictor program for predicting major features of mass spectra.
b. Programs for drawing and displaying chemical structures.
c. Subroutines developed in conjunction with or existing as parts of
the Structure Generator for problems of partitioning, construction of
vertex-graphs, and constructive graph labelling. These can be applied
to answer certain questions of isomerism which do not require the
complete generator. For example, the labelling algorithm can list
all structures resulting from substituting a carbocyclic skeleton with
some numbers of different functional groups.
f. Other Spectroscopic Techniques
Available to us are the spectroscopic facilities of Professor Djerassi's
laboratory for work requiring additional spectroscopic data. Also
available on a fee for service basis are the extensive spectroscopic
facilities of the chemistry department. These would be utilized for
collecting of additional data on particular structure problems and
gathering data on known compounds (particularly in the area of
13CMR) as the AI programs become knowledgeable about other spectroscopic
information.
g. Chemical Facilities
We possess, in Professor Djerassi's chemical laboratories,
substantial synthetic capabilities and general chemical know-how.
This resource can be called upon to provide assistance in synthesis
of model or labelled compounds, derivatization of mixtures, and so
forth. As an example of how extensive use of these facilities has
been accomplished in the past, a graduate student is presently
engaged in thesis research dealing with synthesis of a new estrogen
metabolite strongly suspected to be a component of certain pregnancy
urines.
4. User Community.
We feel that the maximum economic utilization of existing facilities,
and those proposed, can be realized by sharing them with a community of
users. Without additional funds for a major service facility, this
community will emphasize the following groups, but the facilities will be
informally available to others.
A. Stanford Community
i) Stanford Chemistry Department
Djerassi - Steroids, marine sterols
(list after questionnaire return)
ii) Stanford Medical School Collaborators
(list after questionnaire return)
B. Extramural Users.
The development of the techniques of ORD, MS and MCD at Stanford has
been paralleled by extensive sharing of these resources nation- and
world-wide in collaborative research efforts, without any additional
funding. Rather than provide simple service, experience has shown that
use of some discretion in selection of problems results in better
utilization of the people and instrumentation involved. We would extend
this provision of services to include available computer programs where
appropriate along the lines of our successful collaboration with
Professor Adlercreutz, University of Helsinki.
II. SPECIFIC AIMS
1. Develop routine GC/HRMS techniques of utility to
biomedical scientists with structure elucidation problems. Prototype
GC/HRMS systems have been developed at Stanford and elsewhere, but this
type of facility (in contrast to GC/LRMS) does not seem to be routinely
available. Although we wish to give our computer programs (see Aim 2)
the flexibility to deal with other analytic data, our own efforts on
instrumentation will be centered on GC/HRMS, for reasons explained in
Section (I).
2. Develop new computer programs, and improve existing ones, for
assisting analytical chemists with structure elucidation problems and
theory formation. Computer programs have already been written for
analysis of low and high resolution mass spectra, for generation of
acyclic and cyclic molecular structures, for labelling structural
skeletons with atoms, for analyzing C13 NMR spectra of amines and for
interpretation and summary of large volumes of data gathered on model
compounds. We wish to increase the utility of these routines by
providing interactive programs that allow easier access to the programs,
by increasing their generality and power, and by supplementing them with
new reasoning programs.
3. Apply the structure elucidation techniques - both
instrumentation and computer programs - to biomedically relevant
compounds.
III. METHODS
Chemical structure elucidation requires the intelligent and patient
application of a large body of chemical knowledge to each specific
problem. Because of the importance and relative difficulty of the
problem, we believe computer programs can provide powerful assistance
to chemists in their analyses. It is unlikely that such programs will
ever replace chemists, in part because computer programs are written
to focus on rather narrow aspects of problems. But it is reasonable
to view our past research as a demonstration of the computer's ability
to assist chemists, although this was a spinoff from theoretically
oriented research. We wish to stress that our present aim is to
provide assistance for structure elucidation problems.
In order to meet the major objectives of this proposal we will focus
our attention primarily on structure elucidation through mass
spectrometry and artificial intelligence. However, many of the
computer programs can already use information from other analytical
techniques. So we want to be able to think of structure elucidation
in the context of an ensemble of analytic capabilities.
The specific aims enumerated in Section (II) will be pursued in the
highly inter-disciplinary manner that has characterized the DENDRAL
project under NIH support. The aims are not separate aims at all,
but are interactive and dependent upon each other. For example, we
feel that the power of mass spectrometry and, potentially, other
spectroscopic techniques, can be enhanced by the use of computer
programs to perform various aspects of structure elucidation. From
the standpoint of computer science, one measure of the utility of
techniques of artificial intelligence is how well they perform in
real-world applications. We have focused our interest on AI
programs for structure elucidation and the related area of theory
formation, primarily in mass spectrometry. It is necessary in the
development of these programs to have a source of data and personnel
able to criticize methods and results.
A) Development of Routine GC/HRMS Facility
We have developed a significant resource consisting of
instrumentation (the Varian MAT-711 and ancillary equipment) and
computer programs for instrument evaluation and data acquisition and
reduction. Routine reduction of high resolution mass spectra to
elemental compositions and ion abundances without human intervention
provides the capability for efficient handling of large volumes of high
resolution mass spectra. The development of the gas chromatography
and of the GC/MS combination is in the excellent hands of Ms. Wegmann,
formerly head of Hewlett-Packard's gas chromatography applications
laboratory. She is responsible for operation of the complete system.
We now have more than two years of operational experience with the
mass spectrometer, the gas chromatograph and related equipment under
a wide variety of experimental conditions.
Members of the biomedical community (see User Community) who desire access
to our facilities for structure elucidation have a variety of problems, some
of which can be solved by existing instrumentation and computer
techniques. However, many problems consist of complex mixtures of
compounds where analysis by conventional GC/LRMS does not lead to
unambiguous solutions and separation of components on a preparative
scale for other spectroscopic analysis is difficult. These problems
are amenable to attack by a system composed of a GC/HRMS combination,
the GC providing separation, coupled with the mass spectrometer
operating at high resolution to provide highly specific information.
Thus, upgrading of our current system so that GC/HRMS data can be
provided on a routine basis is a desirable, and we believe necessary,
step to solve many of these problems.
We were able to perform some preliminary experiments to evaluate the
feasibility of operating a GC/MS system at high mass spectrometer
resolving powers. These experiments were hampered somewhat by the
limitations of the computer system used to acquire the data (only
occasional, single scans were possible) and were discontinued as was
all HRMS operation on the termination of the ACME computer facility.
We do have, however, some benchmark figures on which to evaluate the
level of performance of the proposed system. We were able to obtain
good quality mass measurements over a dynamic range of 100:1 for
sample sizes of the order of 0.5-1.0 micrograms/component during 8
sec/decade in mass scans (resolving powers 5-8,000).
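These benchmark figures imply that very little time is spent on each peak. As a rough sketch, assume an exponential mass scan, m(t) = m0 * 10^(-t/T), with T the scan time per decade (this scan law is an assumption read from the "sec/decade" specification). A peak of width m/R at resolving power R is then swept in a time T / (R ln 10), independent of mass. The function name below is hypothetical.

```python
import math

def peak_sweep_time(seconds_per_decade, resolving_power):
    """Time (s) to sweep across one peak of width m/R during an exponential scan.

    For m(t) = m0 * 10**(-t/T), |dm/dt| = m * ln(10) / T, so a peak of width
    m/R is crossed in (m/R) / (m * ln(10) / T) = T / (R * ln(10)).
    """
    return seconds_per_decade / (resolving_power * math.log(10))

for R in (5000, 8000):
    print(f"R = {R}: {peak_sweep_time(8.0, R) * 1e3:.2f} ms per peak")
```

Under these assumptions each peak is crossed in well under a millisecond at 8 sec/decade, which indicates the data rates the acquisition computer must sustain and why resolving power, scan rate and sensitivity have to be optimized together.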
We propose to operate our existing GC/MS system under high resolution
conditions aiming toward optimization of resolving powers, scan rates
and GC and molecular separator operating conditions to determine the
maximum usable sensitivity of the system.
We recognize that the ultimate sensitivity will not approach that
attainable by photographic methods of recording; we feel that the
ability for real-time operation and evaluation of the operating
conditions of the mass spectrometer partially offsets the sensitivity
disadvantages. We realize that some structure elucidation problems
will not be amenable to study because of the sensitivity limitations;
we feel, however, that many problems of interest to the User
Community can be studied effectively with this performance capability.
Rather than propose a research program to increase the sensitivity of
high resolution mass spectrometers (e.g., McLafferty, et al., dynamic
rescanning of peaks; JPL - photon emission/detector arrays), we
propose to identify our limitations and, with our collaborators, use
discretion in selecting and preparing samples.
Any HRMS system requires computer support; our proposed GC/HRMS
facility requires a significant amount of support to process mass
spectral data in a reasonable length of time. There are several
options which might be pursued to obtain this support. They are
described in detail in the accompanying budget justification.
B) Computer Assisted Structure Elucidation
As mentioned in Section A above, the Planner program can be used
immediately for structure elucidation problems using mass spectrometry
data. The program has been described in detail elsewhere and is
mentioned in the section on existing capabilities. Its performance is
excellent precisely in the areas where mass spectrometry, by itself, is
capable of definitive structure analysis. Where the history of a sample
is known, so that potential classes of compounds are restricted, and
where the rules of mass spectrometric fragmentation are known in detail
for the classes, the performance of both chemists and the program is
excellent, although the program offers some advantages in its exhaustive
and rapid analysis of the data. Many structure elucidation problems of
the user community fit into this category and existing resources can
fulfill these needs.
Mass spectrometry cannot solve all structure elucidation problems,
however. In such cases, the chemist turns to other spectroscopic
techniques if sample size permits. As described in the introductory
section, the chemist pieces diverse information together to achieve a
solution. Interactive computer programs can assist the chemist in
this procedure, with the advantages of exhaustive evaluation of the
data and the molecular structures suggested by these data.
In our own work and in that work planned with collaborators we can
call upon the extensive facilities of the chemistry department for
acquisition of additional spectroscopic data to assist in the
application of the software systems to real problems. These
facilities are on a fee for service basis, such fees presumably being
paid from existing research grants of the user community. There are
sufficient literature examples of structure elucidation problems to
obviate the requirement for extensive use of these additional
facilities in development of the programs.
We propose to develop these software facilities in the following way:
1) The recently completed structure generator will be the core of our
efforts to assist chemists in structure elucidation. The structure
generator can guarantee that the correct solution is somewhere in the
list of possibilities. Additional programs, such as the Planner, allow
us to avoid exhaustive generation in practice. Some parts of this
program have not been extensively tested yet, and these tests will be
the first task to complete.
2) The SUMEX resource will provide the capability for development
of an interactive system and also provide the mechanism by which
others can gain access to the programs as they are developed.
3) The structure elucidation task as carried out by chemists is
strongly directed toward rejection of whole categories (e.g.,
compound classes) of solutions as quickly as possible by using as
much knowledge of chemical history or characteristics as is available.
Details of spectroscopic data are then used to define the molecular
framework more precisely. Each step in this procedure represents the
application of constraints on the set of possible solutions which
must be considered. Computational efficiency demands that these
constraints be applied early in the generation process when the
structure generator is utilized.
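The efficiency argument can be made concrete with a toy generator. The sketch below is not the structure generator: it builds linear chains of C, N and O atoms and applies one hypothetical constraint (no bond between two heteroatoms) either at each step of the generation or only to finished chains. Chains and their reversals are not merged, so the toy is not irredundant.

```python
def has_hetero_hetero_bond(chain):
    """Hypothetical constraint: reject any N-N, N-O, O-N or O-O neighbors."""
    return any(a != "C" and b != "C" for a, b in zip(chain, chain[1:]))

def generate(n, forbidden, prune_early, alphabet="CNO"):
    """Return (solutions, number of partial or complete chains examined)."""
    solutions = []
    examined = 0

    def extend(chain):
        nonlocal examined
        examined += 1
        # Early application: abandon a partial chain as soon as it violates
        # the constraint, so none of its extensions is ever built.
        if prune_early and forbidden(chain):
            return
        if len(chain) == n:
            if not forbidden(chain):
                solutions.append(chain)
            return
        for atom in alphabet:
            extend(chain + atom)

    extend("")
    return solutions, examined

early, early_work = generate(5, has_hetero_hetero_bond, prune_early=True)
late, late_work = generate(5, has_hetero_hetero_bond, prune_early=False)
```

Both runs return the same solutions, but the early run examines far fewer partial chains. Pruning a partial structure is sound here because a violation, once present, cannot be removed by further extension; constraints without that property can only be tested on complete structures.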
We have made some effort to examine the kinds of constraints used by
chemists engaged in structure elucidation. We have begun designing
strategies so that these constraints can be brought to bear on the
structure generator. Some of these strategies involve minor changes
to the existing program; others require significant extensions of
existing generating functions. In either case, we propose to continue
these investigations so that a reasonable variety of constraints can
be recognized and utilized effectively by a computer program. This
represents the first steps toward increasing the chemical knowledge
of a program which views chemical structures and their manipulation
as mathematical entities and transforms.
4) At present, effective use of the structure generator or its subroutines
for special problems requires a detailed knowledge of the program. We
propose to develop an interface between chemists and the program to
remove this requirement. The interface would contain elements of
structure input and display routines and a simple language for
application of constraints. Portions of these elements are available
from other workers (e.g., Richard Feldman, NIH) and we would draw on
these sources whenever possible.
5) We propose that initial efforts will be directed toward a system
where the chemist examines his own data and inputs his findings (in
terms of allowed and disallowed structural features) to the program
as constraints. The generator would then provide a list of possible
solutions which can be evaluated by the chemist, who can then iterate
on this procedure.
6) There are three feasible, but longer term, extensions of this
approach which we feel are potentially very valuable. We propose
to begin at least preliminary investigations on the following:
a) We have the capability now for automatic interpretation of
mass spectral data. Results of this interpretation can be applied
directly to a structure generator. Similar Planners could be written
for automatic analysis of data from other spectroscopic techniques, as
we have illustrated for 13CMR.
b) A program with detailed knowledge about information
obtainable from various spectroscopic techniques could examine a list
of candidate solutions and propose experiments necessary and
sufficient to distinguish among them. We have illustrated this
capability previously in the area of mass spectrometry using the
Predictor.
c) The structure generator's view of chemistry is two
dimensional and presently unconstrained by such ideas as bond lengths
and angles, steric hindrance, and so forth. Because stereochemical
considerations are extremely important in structure elucidation, we
propose to begin consideration of stereochemistry in the structure
generation process. Lederberg has previously discussed ways in which
three dimensional information can be considered in the generation and
representation of molecular structures. More recently, the work of
Wipke in connection with computer assisted organic synthesis has
provided important results which we would attempt to utilize to
avoid unnecessary duplication of effort.
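The interactive cycle proposed in item 5 can be sketched as follows. This is only an illustration in Python, not the structure generator itself: candidate structures are reduced to plain sets of named structural features, and the names `Constraints`, `satisfies`, and `filter_candidates` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Constraints:
    """Structural features the chemist requires or forbids (hypothetical format)."""
    required: set = field(default_factory=set)
    forbidden: set = field(default_factory=set)


def satisfies(features, constraints):
    """True if a candidate has every required feature and no forbidden one."""
    return constraints.required <= features and not (constraints.forbidden & features)


def filter_candidates(candidates, constraints):
    """Keep only the candidates consistent with the chemist's constraints."""
    return {name: feats for name, feats in candidates.items()
            if satisfies(feats, constraints)}


# Toy candidate list (three C4H8O isomers, feature labels chosen for illustration);
# a real generator would enumerate candidates from the molecular formula.
candidates = {
    "2-butanone": {"ketone", "methyl", "ethyl"},
    "butanal": {"aldehyde", "n-propyl"},
    "tetrahydrofuran": {"ether", "ring"},
}

# First pass: the chemist's data rule out a ring.
first = filter_candidates(candidates, Constraints(forbidden={"ring"}))
# Second pass: the chemist adds a new finding and iterates on the shorter list.
second = filter_candidates(first, Constraints(required={"ketone"}, forbidden={"ring"}))
print(sorted(first))
print(sorted(second))
```

Each pass shortens the list the chemist must evaluate, and the constraints themselves remain an explicit record of why a structure was excluded.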
C. Theory Formation
One important aim of this work is to improve the existing theory
formation capabilities and thus provide more assistance to scientists
investigating regularities within classes of compounds. This is a
theory formation task at a very pragmatic level. The mass spectrometry
theory that the program attempts to find is of the same form as the one
practicing mass spectroscopists use for structure elucidation. Thus,
resulting pieces of theory are extensions to both the scientists' theory
and the computer's theory of the discipline. To improve this program we
need to complete the Plan-Generate-Test program that has been started
(as described in the appended annual report) and tune it over many test
cases. We also wish to make the programs interactive and easy to use so
that they are more readily accessible. This can be done when the
programs are transferred to the SUMEX facility.
We plan to apply the theory formation program to two different kinds
of data: (a) the data collected in the interest of understanding
the mass spectrometry of a particular class of compounds, such as
estrogenic steroids, and (b) collections of diverse data that may
provide some insight into more general fragmentation mechanisms. For
example, by studying the mass spectra of monofunctional compounds we
would hope to find rules that lead to a better understanding of more
complex compounds.
The INTSUM program mentioned in Section (I) is the planning phase of
the theory formation program. It currently runs in batch mode on
Stanford's 360/67 computer. We wish to add an interactive monitor
to INTSUM to give an investigator the ability to set up his own
conditions for interpreting the mass spectra and to control the type
of summary he wishes to see. For example, if he is interested in the
allowable hydrogen transfers associated with one specific process the
program could be instructed to produce a very specific summary. Also,
we wish to add an interactive program for answering questions about
the results. For example, an investigator should be able to find out
easily how many processes involve cleavage of a specific bond and how
strong their resulting mass spectral peaks are.
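A question of the kind just described could be answered by a simple query over the summarized results. The following is a minimal sketch in Python, assuming a hypothetical record format (`Process`) and invented values; it does not reflect the actual data structures of INTSUM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """One fragmentation process from a summary (hypothetical record format)."""
    label: str
    bonds_broken: frozenset      # e.g. frozenset({"C13-C17"})
    hydrogen_transfer: int       # net hydrogens transferred with the fragment
    mean_intensity: float        # average relative peak intensity, percent


def processes_cleaving(summary, bond):
    """Return the processes that break the given bond, strongest first."""
    hits = [p for p in summary if bond in p.bonds_broken]
    return sorted(hits, key=lambda p: p.mean_intensity, reverse=True)


# Invented illustrative values, not measured data.
summary = [
    Process("A", frozenset({"C13-C17", "C14-C15"}), 0, 42.0),
    Process("B", frozenset({"C9-C10"}), -1, 15.5),
    Process("C", frozenset({"C13-C17"}), 1, 63.0),
]

hits = processes_cleaving(summary, "C13-C17")
print(len(hits), [(p.label, p.mean_intensity) for p in hits])
```

The same record could be filtered on `hydrogen_transfer` to produce the more specific summaries mentioned above.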
The INTSUM program is now used routinely by chemists engaged in
investigations of the mass spectrometric fragmentation of various
classes of organic compounds, primarily steroids. A manuscript is now
in preparation (Hammerum and Djerassi, 1973) describing the
fragmentation of progesterone and related compounds. The program was
used extensively in this work. We are now beginning a detailed
examination of the fragmentation of steroids related to the androstane
skeleton, particularly testosterones. We propose to continue to use
the INTSUM program in its present form and as it is improved in
support of these studies.
The generator of rules that we now have does a credible job of
explaining the regularities summarized by INTSUM. It has found, for
example, the well-known alpha-cleavage fragmentation process and beta
cleavage followed by rearrangement in the low resolution data for
fifteen aliphatic amines. The program will be extended in two important
ways to increase its utility: (i) the program needs to be able to work
with an increased number of descriptive predicates in the generation of
rules, and (ii) it needs to be given a more flexible representation of
complex fragmentation mechanisms so that it can find rules involving
more than two bonds.
It will also be desirable to provide interactive programs for the
investigator to query the rule generation program. For example, many
questions now arise about the program's inference steps to the rules
it suggests as explanations of the regularities. Why, for example,
was some particular rule not considered plausible?
The test phase of the theory formation program remains to be written.
It will verify the rules by testing them against new data - preferably
against results of carefully selected new experiments. It will
modify or delete rules on the basis of counter examples. It will
also have to design so-called "crucial experiments" that allow
differentiation among competing rules.
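One way the proposed test phase might treat counterexamples and select crucial experiments is sketched below. This is an illustration under stated assumptions, not the planned implementation: a rule is taken to be a hypothetical predicate that predicts whether a given peak should appear, observations are (structural features, peak observed) pairs, and all rules and data shown are invented.

```python
def count_counterexamples(rule, observations):
    """Number of observations where the rule's prediction disagrees with the data."""
    return sum(1 for features, peak_seen in observations
               if rule(features) != peak_seen)


def screen_rules(rules, observations, tolerance=0):
    """Split rules into retained and rejected according to new observations."""
    retained, rejected = {}, {}
    for name, rule in rules.items():
        misses = count_counterexamples(rule, observations)
        (retained if misses <= tolerance else rejected)[name] = misses
    return retained, rejected


def distinguishing_cases(rule_a, rule_b, candidate_cases):
    """Cases on which two competing rules disagree (candidate crucial experiments)."""
    return [case for case in candidate_cases if rule_a(case) != rule_b(case)]


# Hypothetical competing rules: each predicts whether a given peak should appear.
rules = {
    "any_amine": lambda f: "amine" in f,
    "acyclic_amine_only": lambda f: "amine" in f and "ring" not in f,
}

# Invented observations: (structural features, was the peak observed?)
observations = [
    (frozenset({"amine"}), True),
    (frozenset({"amine", "ring"}), True),
    (frozenset({"ether"}), False),
]

retained, rejected = screen_rules(rules, observations)
crucial = distinguishing_cases(
    rules["any_amine"], rules["acyclic_amine_only"],
    [frozenset({"amine"}), frozenset({"amine", "ring"})],
)
print(retained, rejected, crucial)
```

Here a single counterexample rejects a rule outright; a nonzero `tolerance`, or modification of the rule rather than deletion, would be equally plausible policies.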
D. Applications to Biomedically Relevant Compounds
We can immediately offer to the user community the Planner, for
analysis of high resolution mass spectra in terms of molecular
structure. The program is, of course, insensitive to the source of
the mass spectral data, and we foresee significant use of the program
for analysis of spectra from the GC/HRMS facility without additional
programming effort.
We foresee that the GC/HRMS facility will be used in studies of the
following nature:
1) Djerassi - marine sterols - isolation and characterization of
mixtures of sterols from marine organisms. This research is supported
by the NIH. This work is presently carried out by GC/LRMS techniques
and isolation of milligram quantities of individual sterols by TLC or
GC for further characterization. Although high resolution mass
spectra alone may not be sufficient for structural characterization,
the extra information may be crucial, particularly for minor
components where isolation of larger quantities of material is
difficult or impossible. If larger amounts of material are available,
the proposed computer program development (part B, above) will also
be of assistance in analysis of additional spectroscopic data. These
arguments, of course, hold true for other areas of interest outlined
below.
2) Djerassi - hormonal steroids - we plan on continued collaboration
with Professor Adlercreutz on analysis of estrogen mixtures; GC/HRMS
might be very effective in future collaboration with Adlercreutz in a
variety of related areas. Present work on improvement of mass
spectrometry theory for other classes of steroids (to be used in the
Planner) is being carried out by postdoctoral fellows who provide
their own financial support.
3) Genetics Center - the screening activities of the Genetics Center
research use exclusively GC/LRMS techniques. Past experience has
shown the difficulties in identification of unknown components in
extracts of human urine by LRMS data alone. We plan, through
collaborative efforts, to provide necessary GC/HRMS data where
required, and to use our computer programs to assist in these studies.
4) and following - questionnaire results
***
IV. SIGNIFICANCE OF PROPOSED RESEARCH
Structure elucidation is an important and difficult problem for
scientists in the biomedical community. This research aims at providing
more powerful techniques for determining molecular structures than are
now routinely available. In particular, we have proposed (a) developing
routine GC/HRMS instrumentation as a means of collecting powerful
analytic data for scientists; (b) developing (and extending)
sophisticated computer programs to assist with the interpretation of the
data from mass spectrometry and elsewhere; (c) developing (and
extending) novel computer programs to assist with formulation of the
rules of interpretation; and (d) applying these state-of-the-art
techniques to problems of biomedical relevance. No other research group
can claim such a broad-based attack on the problems of structure
elucidation.
The proposed research not only holds promise for significant long-term
advances, it can have immediate benefits as well. Many members of the
biomedical community at Stanford have called upon the mass spectrometry
laboratory for assistance in the past and will continue to do so in the
future. HRMS is an important source of data for these problems, and
GC/HRMS is still more important. Previous investment by the NIH in the
Varian MAT-711 HRMS system at Stanford can be utilized now and built
upon for the future. Continued operation of the mass spectrometer will
give the Stanford community access to state-of-the-art spectroscopic
techniques and to professional mass spectroscopists who can help with
ongoing problems.
The computer programs themselves constitute a unique resource for
assisting with the structure determination. The previous NIH grant
supported development of the programs. In part, we are requesting
funds to exploit these programs.
One of the most significant aspects of this work is its
interdisciplinary view of solving chemical structure problems by
searching the space of chemical graph structures. As a result of posing
the structure determination problem in this framework, we have been able
to further the systematization of chemistry in at least three ways.
First, the knowledge of chemistry used by analytic chemists has been
made more precise for use in a computer program. Second, codifying such
knowledge for the computer has led to the discovery of new research
areas to extend our existing knowledge of chemistry. Several
publications listed in the bibliography (Refs. 42 and following) are
reports of exactly this kind of research. Finally, the computer's
search through the space of possible structures gives the practicing
scientist the confidence that no structures were merely overlooked -
many whole classes may not have been explicitly enumerated, but that is
because the computer rejected those classes using precise criteria that
are themselves explicit. For this last reason, the computer program has
the potential to augment a chemist's reasoning power in a way that has
never before been possible.
V. FACILITIES & EQUIPMENT
The Stanford Mass Spectrometry Laboratory will provide GC/HRMS on
the Varian MAT-711 mass spectrometer coupled with a Hewlett-Packard
gas chromatograph. As service instruments for more routine mass
spectral analyses, the laboratory has MS-9 and CH-4 mass
spectrometers. A PDP 11/20 computer with one disk drive currently
provides the only dedicated data-reduction capability in the
laboratory.
Data reduction beyond the minimal capability of the PDP 11/20 can be
provided on Stanford's IBM 370/158 computer. (The PDP 11/20 presently has
only the capability for buffering peak profile data between the mass
spectrometer and the IBM 370/158 computer at the Stanford Computer
Center.) An alternative to buying time on the 370/158 is proposed and
discussed in the budget justification.
The artificial intelligence programs will be run on the NIH-sponsored
SUMEX computer facility (a PDP-10 computer with the TENEX operating
system, 192K words of memory, and more than adequate peripherals).
Running these programs on SUMEX will incur no charge.
DENDRAL PUBLICATIONS
(1) J. Lederberg, "DENDRAL-64 - A System for Computer
Construction, Enumeration and Notation of Organic Molecules as
Tree Structures and Cyclic Graphs", (technical reports to NASA,
also available from the author and summarized in (12)).
(1a) Part I. Notational algorithm for tree
structures (1964) CR.57029
(1b) Part II. Topology of cyclic graphs (1965) CR.68898
(1c) Part III. Complete chemical graphs;
embedding rings in trees (1969)
(2) J. Lederberg, "Computation of Molecular Formulas for Mass
Spectrometry", Holden-Day, Inc. (1964).
(3) J. Lederberg, "Topological Mapping of Organic Molecules",
Proc. Nat. Acad. Sci., 53:1, January 1965, pp. 134-139.
(4) J. Lederberg, "Systematics of organic molecules, graph
topology and Hamilton circuits. A general outline of the DENDRAL
system." NASA CR-48899 (1965)
(5) J. Lederberg, "Hamilton Circuits of Convex Trivalent
Polyhedra (up to 18 vertices)", Am. Math. Monthly, May 1967.
(6) G. L. Sutherland, "DENDRAL - A Computer Program for
Generating and Filtering Chemical Structures", Stanford Artificial
Intelligence Project Memo No. 49, February 1967.
(7) J. Lederberg and E. A. Feigenbaum, "Mechanization of
Inductive Inference in Organic Chemistry", in B. Kleinmuntz (ed)
Formal Representations for Human Judgment, (Wiley, 1968) (also
Stanford Artificial Intelligence Project Memo No. 54, August
1967).
(8) J. Lederberg, "Online computation of molecular formulas from
mass number." NASA CR-94977 (1968)
(9) E. A. Feigenbaum and B. G. Buchanan, "Heuristic DENDRAL: A Program
for Generating Explanatory Hypotheses in Organic Chemistry", in
Proceedings, Hawaii International Conference on System Sciences,
B. K. Kinariwala and F. F. Kuo (eds), University of Hawaii Press,
1968.
(10) B. G. Buchanan, G. L. Sutherland, and E. A. Feigenbaum,
"Heuristic DENDRAL: A Program for Generating Explanatory
Hypotheses in Organic Chemistry". In Machine Intelligence 4 (B.
Meltzer and D. Michie, eds) Edinburgh University Press (1969),
(also Stanford Artificial Intelligence Project Memo No. 62, July
1968).
(11) E. A. Feigenbaum, "Artificial Intelligence: Themes in the
Second Decade". In Final Supplement to Proceedings of the IFIP68
International Congress, Edinburgh, August 1968 (also Stanford
Artificial Intelligence Project Memo No. 67, August 1968).
(12) J. Lederberg, "Topology of Molecules", in The Mathematical
Sciences - A Collection of Essays, (ed.) Committee on Support of
Research in the Mathematical Sciences (COSRIMS), National Academy
of Sciences - National Research Council, M.I.T. Press, (1969), pp.
37-51.
(13) G. Sutherland, "Heuristic DENDRAL: A Family of LISP
Programs", to appear in D. Bobrow (ed), LISP Applications (also
Stanford Artificial Intelligence Project Memo No. 80, March 1969).
(14) J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A.
Feigenbaum, A. V. Robertson, A. M. Duffield, and C. Djerassi,
"Applications of Artificial Intelligence for Chemical Inference I.
The Number of Possible Organic Compounds: Acyclic Structures
Containing C, H, O and N". Journal of the American Chemical
Society, 91:11 (May 21, 1969).
(15) A. M. Duffield, A. V. Robertson, C. Djerassi, B. G.
Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg,
"Application of Artificial Intelligence for Chemical Inference II.
Interpretation of Low Resolution Mass Spectra of Ketones".
Journal of the American Chemical Society, 91:11 (May 21, 1969).
(16) B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, "Toward
an Understanding of Information Processes of Scientific Inference
in the Context of Organic Chemistry", in Machine Intelligence 5,
(B. Meltzer and D. Michie, eds) Edinburgh University Press
(1970), (also Stanford Artificial Intelligence Project Memo No.
99, September 1969).
(17) J. Lederberg, G. L. Sutherland, B. G. Buchanan, and E. A.
Feigenbaum, "A Heuristic Program for Solving a Scientific
Inference Problem: Summary of Motivation and Implementation",
Stanford Artificial Intelligence Project Memo No. 104, November
1969.
(18) C. W. Churchman and B. G. Buchanan, "On the Design of
Inductive Systems: Some Philosophical Problems". British Journal
for the Philosophy of Science, 20 (1969), pp. 311-323.
(19) G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G.
L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Application
of Artificial Intelligence for Chemical Inference III. Aliphatic
Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR
Data". Journal of the American Chemical Society, 91:26 (December
17, 1969).
(20) A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B.
Delfino, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and
J. Lederberg, "Applications of Artificial Intelligence For
Chemical Inference. IV. Saturated Amines Diagnosed by Their Low
Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra",
Journal of the American Chemical Society, 92, 6831 (1970).
(21) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M.
Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A.
Feigenbaum and J. Lederberg, "Applications of Artificial
Intelligence for Chemical Inference V. An Approach to the
Computer Generation of Cyclic Structures. Differentiation
Between All the Possible Isomeric Ketones of Composition
C6H10O", Organic Mass Spectrometry, 4, 493 (1970).
(22) A. Buchs, A.B. Delfino, A.M. Duffield, C. Djerassi,
B.G. Buchanan, E.A. Feigenbaum and J. Lederberg, "Applications
of Artificial Intelligence for Chemical Inference VI. Approach
to a General Method of Interpreting Low Resolution Mass Spectra
with a Computer", Helv. Chim. Acta, 53, 1394 (1970).
(23) E.A. Feigenbaum, B.G. Buchanan, and J. Lederberg, "On Generality
and Problem Solving: A Case Study Using the DENDRAL Program". In
Machine Intelligence 6 (B. Meltzer and D. Michie, eds.) Edinburgh
University Press (1971). (Also Stanford Artificial Intelligence
Project Memo No. 131.)
(24) A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan,
E.A. Feigenbaum, J. Lederberg, G. Schroll, and G.L. Sutherland, "The
Application of Artificial Intelligence in the Interpretation of Low-
Resolution Mass Spectra", Advances in Mass Spectrometry, 5, 314.
(25) B.G. Buchanan and J. Lederberg, "The Heuristic DENDRAL Program
for Explaining Empirical Data". In proceedings of the IFIP Congress 71,
Ljubljana, Yugoslavia (1971). (Also Stanford Artificial Intelligence
Project Memo No. 141.)
(26) B.G. Buchanan, E.A. Feigenbaum, and J. Lederberg, "A Heuristic
Programming Study of Theory Formation in Science." In proceedings of
the Second International Joint Conference on Artificial Intelligence,
Imperial College, London (September, 1971). (Also Stanford Artificial
Intelligence Project Memo No. 145.)
(27) Buchanan, B. G., Duffield, A.M., Robertson, A.V., "An Application
of Artificial Intelligence to the Interpretation of Mass Spectra",
Mass Spectrometry: Techniques and Applications, Edited by George
W. A. Milne, John Wiley & Sons, Inc., 1971, p. 121-77.
(28) D.H. Smith, B.G. Buchanan, R.S. Engelmore, A.M. Duffield, A. Yeo,
E.A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of
Artificial Intelligence for Chemical Inference VIII. An approach to
the Computer Interpretation of the High Resolution Mass Spectra of
Complex Molecules. Structure Elucidation of Estrogenic Steroids",
Journal of the American Chemical Society, 94, 5962-5971 (1972).
(29) B.G. Buchanan, E.A. Feigenbaum, and N.S. Sridharan, "Heuristic
Theory Formation: Data Interpretation and Rule Formation". In
Machine Intelligence 7, Edinburgh University Press (1972).
(30) Lederberg, J., "Rapid Calculation of Molecular Formulas from
Mass Values". Jnl. of Chemical Education, 49, 613 (1972).
(31) Brown, H., Masinter L., Hjelmeland, L., "Constructive Graph
Labeling Using Double Cosets". Discrete Mathematics (in press).
(Also Computer Science Memo 318, 1972).
(32) B. G. Buchanan, Review of Hubert Dreyfus' "What Computers Can't
Do: A Critique of Artificial Reason", Computing Reviews (January,
1973). (Also Stanford Artificial Intelligence Project Memo No. 181)
(33) D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and
C. Djerassi, "Applications of Artificial Intelligence for Chemical
Inference IX. Analysis of Mixtures Without Prior Separation as
Illustrated for Estrogens". Journal of the American Chemical Society
95, 6078 (1973).
(34) D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum,
C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence
for Chemical Inference X. Intsum. A Data Interpretation Program as
Applied to the Collected Mass Spectra of Estrogenic Steroids".
Tetrahedron, 29, 3117 (1973).
(35) B. G. Buchanan and N. S. Sridharan, "Rule Formation on
Non-Homogeneous Classes of Objects". In proceedings of the Third
International Joint Conference on Artificial Intelligence (Stanford,
California, August, 1973). (Also Stanford Artificial Intelligence
Project Memo No. 215.)
(36) D. Michie and B.G. Buchanan, "Current Status of the Heuristic
DENDRAL Program for Applying Artificial Intelligence to the
Interpretation of Mass Spectra". August, 1973.
(37) H. Brown and L. Masinter, "An Algorithm for the Construction
of the Graphs of Organic Molecules", Discrete Mathematics (in press).
(Also Stanford Computer Science Department Memo STAN-CS-73-361,
May, 1973)
(38) D.H. Smith, L.M. Masinter and N.S. Sridharan, "Heuristic
DENDRAL: Analysis of Molecular Structure," Proceedings of the
NATO/CNA Advanced Study Institute on Computer Representation and
Manipulation of Chemical Information, in press.
(39) R. Carhart and C. Djerassi, "Applications of Artificial
Intelligence for Chemical Inference XI: The Analysis of C13 NMR Data
for Structure Elucidation of Acyclic Amines", J. Chem. Soc. (Perkin II),
1753 (1973).
(40) L. Masinter, N. Sridharan, H. Brown and D.H. Smith, "Applications
of Artificial Intelligence for Chemical Inference XII: Exhaustive
Generation of Cyclic and Acyclic Isomers.", submitted to Journal of
the American Chemical Society.
(41) L. Masinter, N. Sridharan, H. Brown and D.H. Smith, "Applications
of Artificial Intelligence for Chemical Inference XIII: An Algorithm
for Labelling Chemical Graphs", submitted to Journal of the American
Chemical Society.
Publications Describing DENDRAL-Related Research But Not Funded By
This Grant
Mass Spectrometry in Structural and Stereochemical Problems CLXXXIII.
A Study of the Electron Impact Induced Fragmentation of Aliphatic
Aldehydes. J. Amer. Chem. Soc., 91, 6814 (1969). By R.J. Liedtke and
C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems - CXCVII.
Electron-Impact Induced Functional Group Interaction in
4-Benzyloxycyclohexyl Trimethylsilyl Ether. Org. Mass Spectrom.,
4, 257 (1970). By Paul D. Woodgate, Robin T. Gray and Carl Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems - CXCVIII.
A study of the Fragmentation Processes of Some a,B-Unsaturated
Aliphatic Ketones. Org. Mass Spectrom., 4, 273 (1970). By
Younus M. Sheikh, A.M. Duffield and Carl Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCII.
Interaction of Remote Functional Groups in Acyclic Systems upon
Electron Impact. J. Org. Chem., 36, 1796 (1971). By M. Sheehan,
R.J. Spangler, M. Ikeda and C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCVII.
Fragmentation of Unsaturated Ethers. Org. Mass Spectrom., 5, 895 (1971).
By J. P. Morizur and C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCVIII.
The Effect of Double Bonds Upon the McLafferty Rearrangement of
Carbonyl Compounds. J. Amer. Chem. Soc., 94, 473 (1972). By
J.R. Dias, Y.M. Sheikh and C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCXV.
Behavior of Phenyl-Substituted a,B-Unsaturated Ketones Upon Electron
Impact. Promotion of Hydrogen Rearrangement Processes. J. Org.
Chem., 37, 776 (1972). By R.J. Liedtke, A.F. Gerrard, J. Diekman and
C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCXXII.
Delineation of Competing Fragmentation Pathways of Complex Molecules
from a Study of Metastable Ion Transitions of Deuterated Derivatives.
Org. Mass Spectrom., 7, 367 (1973). By D.H. Smith, A.M. Duffield and
C. Djerassi.
The Carbon-13 Magnetic Resonance Spectra of Acyclic Aliphatic Amines.
J. Amer. Chem. Soc., 95, 3710 (1973). By H. Eggert and C. Djerassi.
The Carbon-13 Nuclear Magnetic Resonance Spectra of Keto Steroids.
J. Org. Chem., 38, 3788 (1973). By H. Eggert and C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCXXXVIII.
The Effect of Heteroatoms Upon the Mass Spectrometric Fragmentation
of Cyclohexanones. J. Org. Chem., in press. By J.H. Block, D.H. Smith,
and C. Djerassi.
Mass Spectrometry in Structural and Stereochemical Problems CCXLII.
Applications of DADI, a Technique for Study of Metastable Ions, to
Mixture Analysis. J. Amer. Chem. Soc., submitted for publication.
By D.H. Smith, C. Djerassi, K.H. Maurer, and U. Rapp.
**********************************************************************
Our past work in the area of mass spectrometric instrumentation has
led to detailed knowledge of the performance capabilities of the mass
spectrometer, implementation of some elements of computer control of
the instrument and development of sophisticated programs to evaluate
the performance of the spectrometer and to acquire and reduce mass
spectra. The instrument has received heavy use, with greatest emphasis
being placed on high resolution mass spectral data* and evaluation of
the GC/MS system. We wish to upgrade these capabilities to provide
a routine GC/HRMS system.
Our past work on applications of artificial intelligence to the
interpretation of mass spectra has given us a firm foundation on which
to base broader explorations of molecular structure elucidation. We
intend to integrate state of the art spectroscopic data collection,
especially GC/HRMS, with artificial intelligence techniques.
We wish also to explore additional techniques that would complement
these in solving structure determination problems.
Our recent work on finding mass spectrometry interpretation rules
(theory formation) can provide additional unique capabilities for
assisting with the problem solving. We wish to continue this
research because it offers hope for a solution to the problem of
furnishing real-world knowledge to computer programs -- in particular
to the computer programs that assist with structure elucidation.
This is a pressing problem in current AI research. High performance
programs, of which DENDRAL is most often cited, derive their power
from large stores of knowledge. Yet there are no routine methods for
infusing such systems with knowledge of the task domain. We believe
our research in theory formation holds a key to the solution of this
problem.
We believe that much of our previous work can be immediately useful
to scientists elsewhere. We have frequently provided assistance to
collaborators in the past, often uncovering interesting research
questions in the process. We hope to make the instrumentation and
computer programs broadly available on a routine basis. As a first
step, we wish to make available the most useful aspects of our
current system to the community of scientists using the NIH-sponsored
SUMEX computer facility. (See Section IC for a brief discussion of
SUMEX.)