perm filename CONS4.RNO[NEW,AIL] blob sn#408188
filedate 1979-01-08 generic text, type T, neo UTF8
.lm 10 .rm 70
Locations by Constraints
Computer Science Department
University of Rochester
Rochester, New York 14620
.lm 20 .rm 60
Relationships between objects in the real world are
constrained by many physical and functional considerations.
This paper presents a
formalism which allows such constraints to be represented and
used to make inferences about object locations in images.
The formalism is used in a system which accepts information about geometric
constraints between structures in images and then uses these constraints
to guide the search for those structures.
The system has been used successfully to infer rib positions in a
chest X-ray, and to locate aeration tanks and new construction sites in
aerial images.
.lm 10 .rm 70
.title Modeling by Constraints
Consider the following situation: You are flying at 3000 feet somewhere close to home, and
you wish to find your neighbor's house; you might find yourself examining the
scene below in the following fashion: "Well, I know I live close to
the river and upstream from the end of my street is a park, so if I find the park I can
find my house. And since my neighbor lives north of my house I can just look a little
north of my house..." and so on.
We can view this impromptu strategy for neighbor's-house-finding as
applying a successive
series of constraints to the aerial image you are viewing,
thus removing areas of the scene from consideration. "My house is close to
a river and down from the park" and "My neighbor's house is close to mine" are both
facts which we can use to limit the
search space for a feature to a much smaller area.
The first constrains the sub-image to be scrutinized to that part of the total
image which is both close to the river and at a particular orientation
to the river and the park, while the second implies a search area
which is close to your house.
This process of successive reduction of the search space by repeated
limitation would be extremely useful in approaching
a computer vision task.
Currently, many systems simply use the technology approach
to vision; i.e. apply an operator over the whole of an image, and
then use the results from
this massive "search" for further analysis. If instead we could limit -
or, as we will say, constrain -
our focus of attention
to a much smaller area by the intelligent use of facts
we already know, we can then make considerable savings in the analysis of any scene.
Our goal, then, is to maximize the use of facts we already know
about a given
scene to tell us where to look for objects we are interested in.
Further, we would also like to be able to use any information that
can be inferred during the analysis as soon as
the inference can be made.
To attain these goals, we introduce a formalism which we have
found useful in limiting
our search of an image for a particular instance of an object.
↑&II. Constraint Networks\&
Constraint networks were originally outlined in [Ballard, et al.].
In this paper, I have extended and refined that notion. Here, constraint
networks are essentially
networks which model the expected location of a
generic instance of a particular feature embedded in an image.
A constraint network utilizes some beginning facts about a scene, and based on
its model of the physical and geometric constraints likely to be
imposed on the feature being searched
for, infers a region in the image where it would expect to find it.
That is, a constraint network (CN) models a real world object's expected
location in the image by
describing its relationships to other objects already known in the image.
Each of these descriptions is actually a ↑&constraint\&
on the object's location in the image. A dockyard, for instance, is usually
found adjacent to the water's edge and in or near a harbor.
This statement tells us two characteristics of real
dockyards: 1) dockyards are adjacent to the coastline; and 2) dockyards are in
or near harbors.
Both statements constrain the dockyard's possible location
by specifying where it would be with respect to other features
in the scene.
This is analogous to the approach used by constraint nets to
locate features in an image.
A CN is composed of operations which express these relationships
between objects. Each relationship is encoded as a node in the CN and
serves to determine the object location more specifically within the image.
However, when modeling real world constraints,
we are not limited simply to relating features within an image.
We can also utilize additional information about the domain we are working in.
For example, we might also know that docks have a normalized albedo at some determined value
for ultraviolet aerial photographs.
Knowing this fact would immediately
reduce our searching for docks to those areas of the scene which
have a reflectivity greater than the value specified.
In this case,
by adding the capability of image domain recognition,
we have extended the description of
a feature to be both a model of the real world characteristics of the feature ↑&and\&
a model of the image-specific representation of the feature.
CN's utilize both geometric and physical types of knowledge about a feature,
and, by doing so, represent both a model of the real world and
of the image representation of the feature.
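As a concrete illustration of such an image-domain constraint, the albedo test described above can be sketched as a simple filter over pixel data. This is a modern Python sketch; the pixel representation, the function name and the 0.7 threshold are assumptions made for illustration, not part of the original system.

```python
# Hypothetical sketch: restrict a dock search to high-reflectivity pixels.
# The image is modeled as a mapping from (row, col) pixels to normalized
# albedo values; the 0.7 threshold is purely illustrative.

def high_reflectivity_region(image, threshold):
    """Return the set of pixels whose albedo exceeds the threshold."""
    return {pixel for pixel, albedo in image.items() if albedo > threshold}

# A toy 2x2 "ultraviolet aerial image":
image = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.75, (1, 1): 0.1}
candidates = high_reflectivity_region(image, 0.7)
```

Only the two bright pixels survive, so any subsequent dock operator need examine just that reduced area.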
A CN is a data structure which we use to represent the constraints
known about an object's location in an image in relation to the other
features in the scene.
Normally, a CN serves as a data source giving the feature's location.
However, since the relationships are explicitly encoded by the nodes of the CN,
if the location is not known when the CN is interrogated, then
an evaluator can use the CN to compute the feature's location.
CN's offer an inexpensive way to preprocess an image
in order to reduce computing time spent in expensive special-purpose operators.
Constraint networks model the generic version of a feature or, at most,
if a feature has more than one common form, only its most common forms.
The constraints embedded in the CN (currently) model only form and not function.
This is a limitation on the robustness of a CN model, and can lead to a false positive
indication of a feature instantiation.
CN's, though, are only a portion of a
larger vision understanding
system (in this case, the Rochester Vision Understanding System;
see [Lantz]), and
serve to reduce the search space for a particular feature
and provide a high level model of the form of a feature.
A CN has the ability to maintain partial results on all of
its computations and on processing done elsewhere in the image-understanding
system. This ability follows from the fashion in which constraint networks are evaluated
and retain their results, as will be shown below.
Once a result or a partial result is computed in the course of an evaluation, the information
is kept around to be of use in a further computation if additional information is
obtained or if a different CN wants the same information. This facility
results in tremendous savings in processing time if the same CN is recomputed;
moreover, since the semantics of the result are kept around (by default, since the result is attached
to the node which encodes it), we can also use this partial result elsewhere,
either within this same CN or by any other CN which would
also like to know this particular partial result to reduce ↑&its\&
time spent in computation [Burstall, et al.].
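The partial-result facility can be illustrated with a small sketch in the style of memo functions. This is a modern Python rendering under assumed class and function names; the actual system's representation differs.

```python
class DataNode:
    """Terminal node supplying initial image data."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

class MemoNode:
    """A CN node that keeps its result once computed, so that later
    interrogations (by this CN or any other sharing the node) become
    simple lookups rather than recomputations."""
    def __init__(self, op, *sons):
        self.op = op
        self.sons = sons
        self.result = None        # the retained partial result

    def evaluate(self):
        if self.result is None:
            self.result = self.op(*(son.evaluate() for son in self.sons))
        return self.result

calls = []
def counted_union(a, b):
    calls.append(1)               # count how often the operation runs
    return a | b

shared = MemoNode(counted_union, DataNode({1, 2}), DataNode({2, 3}))
first = shared.evaluate()         # computes the union
second = shared.evaluate()        # pure lookup; counted_union is not re-run
```

A second CN holding a pointer to `shared` would likewise get the answer by lookup alone.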
.test page 6
↑&III. Constraint Types\&
Constraints in the CN world are geometric operations on
data. These operations are either primitive operations on
the features of the image (such as union, reflection or translation),
or a CN network of primitive operations.
Thus, an entire CN can be thought of as a single constraint on the image giving
the specified object's expected location in the scene.
For example, a CN modeling the location of the heart in a chest X-ray is made up
of operations on the primitive feature elements in the image.
This CN can be thought of as a single constraint for "location of heart".
This CN, though, is composed of further constraints, such as those
giving the rib locations. The rib locations may in turn be specified by constraints on their
locations with respect to other available landmarks.
However, at the lowest level of detail, constraints are formed from
the primitive geometric constraints available to the CN builder.
The primitive geometric constraints are
simply a set of basic operations such as reflection, translation, scale,
and shape specifiers.
Since CN's are describing areas for search, it is also useful to have
operations such as close-at-a-distance, area union and area intersection.
The function of the operations in the
primitive set is to provide the CN builder
with enough tools to easily describe an area in relationship to other features in the image.
Although the number of potential operations is quite large,
we have found that a small number
of primitives (about a dozen) suffice for most
of our descriptive tasks.
In our system, the primitive set is made up of four different types of operations:
↑&Directional\& operations specify where to focus attention.
Operations such as LEFT, REFLECT, NORTH, UP and DOWN all
constrain the sub-image to be in a particular orientation to another feature.
↑&Area\& descriptions specify a particular area in the scene that
restricts a feature location. For example, CLOSE-TO, IN-QUADRILATERAL,
and IN-CIRCLE define areas of interest at some location in an image.
↑&Set\& operations permit areas to be handled as point sets of pixels.
These operations, such as UNION, DIFFERENCE and INTERSECTION make
very complex image areas far easier to describe.
↑&Predicates\& on areas allow features to be filtered out of consideration
by measuring some characteristic of the data. For example,
a predicate testing WIDTH, LENGTH or AREA against some
value would restrict the size of features in consideration
to be only those within a permissible range.
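The four types might be sketched as follows, with image areas represented as Python sets of (row, col) pixels. The operator names echo the paper's (CLOSE-TO, NORTH, INTERSECTION, AREA), but these particular implementations are assumptions made for this sketch.

```python
def close_to(area, distance):
    """Area description: all pixels within Chebyshev `distance` of `area`."""
    return {(r + dr, c + dc)
            for (r, c) in area
            for dr in range(-distance, distance + 1)
            for dc in range(-distance, distance + 1)}

def north(area, landmark):
    """Directional: the part of `area` strictly north of `landmark`
    (smaller row numbers, with row 0 at the top of the image)."""
    top = min(r for (r, _) in landmark)
    return {(r, c) for (r, c) in area if r < top}

def intersection(a, b):
    """Set operation: pixels common to both areas."""
    return a & b

def area_at_most(area, max_pixels):
    """Predicate: admit the area only if its size is permissible."""
    return area if len(area) <= max_pixels else set()

region = close_to({(2, 2)}, 1)    # the 3x3 block of pixels around (2, 2)
```

Composing these few functions already describes fairly complex areas, which is why a primitive set of about a dozen operations suffices.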
.test page 6
↑&IV. Describing an Object Location by Constraints\&
By using a combination of geometric constraints we can
describe the expected location of most features that we would
like to find in a scene. We can use the location of the
aeration tank in a sewage treatment plant for an example.
The aeration tank is a rectangular tank flanked
at either end by circular sludge and sedimentation tanks.
As a general rule, sewage flows from the sedimentation tanks
to the aeration tanks and finally
to the sludge tanks. This design permits us to extract and
use the following types of constraints on the location of the aeration tanks.
.lm 20 .rm 60
↑&Constraint 1:\& "Aeration tanks are located
somewhere close to both the sludge tanks and the sedimentation tanks."
↑&Constraint 2:\& "Aeration tanks must not be too close to either the
sludge or sedimentation tanks."
.lm 10 .rm 70
These constraints seem reasonable because we
understand that in any obvious design
of a sewage plant, the various components of the
treatment process will
be close together and since the aeration phase occurs between sludge treatment and
sedimentation collection, we can expect to find the aeration tanks
between the sludge and sedimentation tanks.
Aeration tanks will rarely be found distant from the major processing
portion of the plant if for no other reason than that it would be expensive
(as well as unproductive) to locate them so far away.
We would encode the two constraints given above in three separate
but similar networks: one for each constraint and one combining them.
The first constraint
explicitly relates the
possible location of the aeration tank to the locations of the
sludge and sedimentation tanks. This relationship would be encoded in a CN as:
When this CN is evaluated, the top node would have data expressing that
part of the image which satisfies Constraint 1;
that is, that portion of the image which is within some distance X
of both the sludge tanks and the sedimentation tanks.
(X could be found empirically. It could also be the result
of another CN which might consider the tank's diameter or
the camera angle of the image.)
Similarly, Constraint 2 would be encoded as shown in Figure 2.1 below.
The entire network describing the probable location of the aeration tanks
must satisfy both of these constraints. Constraint 1 determines
an area which is close to both groupings of tanks and Constraint 2
says that we should disregard a portion of that area.
If we think of the image as a point set (that is, a set of pixels in some
order which makes up the image), the function to perform would be
a differencing operation to remove the area given by the second constraint
from that specified by Constraint 1. Figure 2.2 shows the final CN which
incorporates both constraints.
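This network can be reconstructed in a hedged Python sketch over point sets of (row, col) pixels: Constraint 1 is the intersection of close-to regions around the two tank groups, Constraint 2 is the union of too-close regions, and the final node is their point-set difference. The tank positions and the distances here are invented for illustration.

```python
def close_to(area, d):
    """All pixels within Chebyshev distance d of the given area."""
    return {(r + dr, c + dc) for (r, c) in area
            for dr in range(-d, d + 1) for dc in range(-d, d + 1)}

# Toy tank positions (purely illustrative):
sludge = {(0, 0)}
sedimentation = {(0, 4)}

# Constraint 1: close to both groups of tanks.
constraint1 = close_to(sludge, 4) & close_to(sedimentation, 4)

# Constraint 2: but not too close to either.
too_close = close_to(sludge, 1) | close_to(sedimentation, 1)

# Final node: the differencing operation of Figure 2.2.
expected_location = constraint1 - too_close
```

Pixels such as (0, 2), midway between the two tank groups, survive both constraints; pixels on the tanks themselves are removed.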
Of course, there could be places where the aeration tanks might be
located very far away, might
be round or perhaps violate some other constraint.
It is important to note that the constraints are only the most likely limitations on an object.
They can work very well for stereotyped scenes, but might fail in novel situations.
The cause or ontology of a constraint
is unimportant here (see Section IX on Future Work);
what matters in this context is simply that such constraints exist
and can be utilized by an image understanding system.
We believe that geometric constraints are adequate
for the kind of features we may wish to locate in an image
because complex things usually exhibit distinguishable parts related and
interconnected by their functions or by their existence in a common milieu.
Because of this, these relationships often can be expressed in terms of their
form, or put another way, by geometric descriptions which relate parts
of the whole.
There are, of course, domains where such geometric constraints do not hold. This can occur, for example,
when attempting to analyze microphotographs of a slide smear, where the
objects have purposely randomized locations on the slide
to make them easily observable.
As we have noted, however, the connections of functionality, shared properties
and cooperation affect most commonplace things, and therefore, constraining
search in this fashion is often a useful and powerful tool.
↑&V. Constraint Networks: Structure and Function\&
As we have established, a CN describes the constraints necessary to locate an object in
a particular image.
This knowledge is embedded in the form of a
connected network of nodes of three types:
↑&Feature nodes\& -
are the handle
by which the constraint network is accessed by some larger image understanding system.
A feature node is the embodiment of the knowledge which describes a particular
feature in an image.
A feature node has attached to it (as sons) the various strategies
that encode the possible locations of the feature in the
image. These strategies are CN's themselves.
Yet, a CN is not completely a procedural mechanism for representing
knowledge, for associated with each node in a CN
is the result of the evaluation of the CN below that node.
This also holds true with the feature node;
if the entire CN has been evaluated below the feature
node, then the feature node will contain
the result of that evaluation and does not need to compute it.
When this occurs, the CN (which normally must be evaluated in order to produce a
result) will become a simple lookup.
In other words, a CN is a "compute on demand" structure which will minimize the amount
of processing that it must perform.
This is very similar to the functioning of "memo functions" as suggested by [Burstall, et al.].
↑&Operation nodes\& -
are the nodes encoding the various constraints which are placed on the item being evaluated.
An operation node gets input from all of its sons and then applies the operation it
represents on that data.
Operation nodes represent the geometric relationships between features, and
their operations are chosen from the primitive set.
↑&Data nodes\& - are the terminal nodes of the network. That is, they have no sons and always
evaluate to data. When a network is initialized, and no partial results
are available, data nodes supply the network with initial data in the
image to operate on.
The nodes of a CN can all potentially hold data.
This capability i
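The three node types might be sketched as follows. This is a modern Python rendering; the class names and the union-of-strategies combination rule in the feature node are assumptions made for illustration.

```python
class DataNode:
    """Terminal node: always evaluates to its initial image data."""
    def __init__(self, data):
        self.result = data

    def evaluate(self):
        return self.result

class OperationNode:
    """Applies a primitive operation to the results of all its sons,
    retaining the result so re-evaluation becomes a simple lookup."""
    def __init__(self, op, sons):
        self.op = op
        self.sons = sons
        self.result = None

    def evaluate(self):
        if self.result is None:
            self.result = self.op(*[son.evaluate() for son in self.sons])
        return self.result

class FeatureNode(OperationNode):
    """The handle by which a larger system accesses the network; its sons
    are the alternative location strategies (themselves CNs). Combining
    strategies by set union is an assumption made for this sketch."""
    def __init__(self, name, strategies):
        super().__init__(lambda *areas: set().union(*areas), strategies)
        self.name = name

ribs = DataNode({(1, 1), (2, 1)})
spine = DataNode({(1, 5)})
strategy = OperationNode(lambda a, b: a | b, [ribs, spine])
heart = FeatureNode("heart", [strategy])
location = heart.evaluate()       # demand-driven evaluation
```

After the first evaluation the feature node holds its result, so any later interrogation is a lookup rather than a recomputation.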