perm filename CONS.RNO[NEW,AIL] blob sn#408187 filedate 1979-01-08 generic text, type T, neo UTF8
.figure 13
.lm 10 .rm 70
.center 80
Modeling Object
.center 80
Locations by Constraints
.s 2
.nf .nj
.center 80
Daniel Russell
.center 80
Computer Science Department
.center 80
University of Rochester
.center 80
Rochester, New York 14620
.center 80
.lm 20 .rm 60
.f .j
Relationships between  objects in the real world are 
constrained by many physical and functional considerations.
This paper presents a 
formalism which allows such constraints to be represented and 
used to make inferences about object locations in images.
The formalism is used in a system which  accepts information about geometric
constraints between structures in images and then uses these constraints
to guide the search for those structures.
The system has been used successfully to infer rib positions in a 
chest X-ray, and to locate aeration tanks and new construction sites in 
aerial images.
.lm 10 .rm 70
.title  Modeling by Constraints
.f .j
.subtitle  **DRAFT**
.lm 10
.number 1
↑&I. Introduction\&
Consider the following situation: You are flying at 3000 feet somewhere close to home, and
 you wish to find your neighbor's house;  you might find yourself examining the 
scene below  in  the following fashion: "Well, I know I live close to 
the river and upstream from the end of my street is  a park, so if I find the park I can 
find my house.  And since my neighbor lives north of my house I can just look a little
north of my house..." and so on. 
We can view this impromptu strategy for neighbor's-house-finding as
applying a successive
series of constraints to the aerial image you are viewing,
thus removing areas of the scene from consideration.  "My house is close to 
a river and down from the park" and "My neighbor's house is close to mine" are both 
facts  which we can use to limit the
search space for a feature to a 
reasonable size. 
The first constrains the sub-image to be scrutinized to that part of the total
image which is both close to the river and at a particular orientation
to the river and park, while the second implies a 
constrained sub-image 
which is  close to your house.
This process of successive reduction of the search space by repeated 
limitation of the space would be extremely useful in approaching
a computer vision task.
Currently, many systems simply use the technology approach
to vision; i.e. apply an operator over the whole of an image, and
then use the results from 
this massive "search" for further analysis.  If instead we could limit -
or, as we will say, constrain -
our focus of attention
 to a much smaller area by the intelligent use of some  facts
 and inferences 
from them, we could then make considerable savings in the analysis of any scene.
Our goal then, is to maximize the use of  facts we already know 
about a given 
scene to tell us where to look for objects we are interested in.
Further, we would also like to be able to use any information that
can be inferred during the analysis as soon as 
the inference can be made.
To attain these goals, we introduce a formalism which we have 
found useful in limiting 
our search of an image for a particular instance of an object.  
↑&II. Constraint Networks\&
Constraint networks were originally outlined in [Ballard, et al]. 
In this paper, I have extended and refined that notion.  Here, a constraint 
network is essentially a 
network which models the expected location of a
generic instance of a particular feature embedded in an image.
A constraint network utilizes some beginning facts about a scene, and based on 
its model of the physical and geometric constraints likely to be
imposed on the feature being searched 
for, infers a region in the image where it would expect to find it.
That is, a constraint network (CN) models a  real world object's expected
location in the image by 
describing its relationships to other objects already known in the image.
Each of these descriptions is actually a ↑&constraint\& 
on the object's location in the image.  A dockyard, for instance, is usually
found adjacent to the water's edge and in or near a harbor. 
This statement tells us two characteristics of real 
dockyards: 1) dockyards are adjacent to the coastline; and 2) dockyards are in 
or near harbors.
Both statements constrain the dockyard's possible location 
by specifying where it would be with respect to other features
in the scene.  
This is analogous to the approach used by constraint nets to 
locate features in an image.  
A CN is composed of operations which express these relationships
between objects.  Each relationship is encoded as a node in the CN and 
serves to determine the object location more specifically within the 
image.
However, when modeling real world constraints,
we are not limited  simply to relating features within an image.
We can also utilize additional information about the domain we are working in.
We might 
also know that docks have a normalized albedo at some determined value
for ultraviolet aerial images.
Knowing this fact would immediately
reduce our searching for docks to those areas of the scene which
have a reflectivity greater than the value specified.
In this case,
by adding the capability of image domain recognition,
we have extended the description of 
a feature to be both a model of the real world characteristics of the feature ↑&and\&
a model of the image-specific representation of the feature.  
CN's utilize both geometric and physical types of knowledge about a feature,
and, by doing so, represent both a model of the real world and  
of the image representation of the feature.
A CN is a data structure which we use to represent the constraints
known about an object's location in an image in relation to the other
features in the scene.
Normally, a CN serves as a data source giving the feature's location. 
However, since the relationships are explicitly encoded by the nodes of the CN,
if the location is not known when the CN is interrogated,  then
an  evaluator can use the CN to compute the feature's location.
CN's offer an inexpensive way to preprocess an image
in order to reduce computing time spent in expensive special purpose 
operators.

Constraint networks model the generic version of a feature, or, 
if a feature has more than one common form, at most its most common forms.  
The constraints embedded in the CN (currently) model only form and not function.
This is a limitation on the robustness of a CN model, and can lead to a false positive
indication of a feature instantiation.  
CN's, though, are only a portion of a
larger vision understanding 
system (in this case, the Rochester Vision Understanding System;
see  [Lantz]), and
serve to reduce the search space for a particular feature
and provide a high level model of the form of a feature.
A CN has the ability to maintain partial results on all of 
its computations and on processing done elsewhere in the image-understanding 
system. This ability follows from  the fashion in which constraint networks are evaluated
to obtain
results, as will be shown below.
Once a result or a partial result is computed in the course of an evaluation, the information 
is retained for use in further computation if additional information is 
obtained or if a different CN wants the same information.  This facility
yields a tremendous savings in processing time if the same CN is recomputed.
Moreover, since the semantics of the result are retained (by default, since the result is attached 
to the node which encodes it), we can also use this partial result elsewhere,
either within this same CN or in any other CN which would
like to know this particular partial result to reduce ↑&its\&
time spent in computation.  [Burstall, et al]
.test page 6
↑&III. Constraint Types\&
Constraints in  the CN world are geometric operations on 
data.  These operations are either primitive operations on 
the features of the image (such as union, reflection or translation),
or a network composed of primitive operations.
Thus, an entire CN can be thought of as a single constraint on the image giving
the specified object's expected location in the scene.
For example,
a CN modeling the location of   the heart in a chest X-ray is made up
of operations on the  primitive feature elements in the image.
This CN can be  thought of as a single constraint for "location of heart".
This CN, though, is composed of further constraints; the heart's location 
may be given relative to the ribs, and the rib locations may in turn be specified 
by constraints on their locations with respect to other available landmarks.
However, at the lowest level of detail, constraints are formed from 
the primitive geometric constraints available to the CN builder.
The primitive geometric constraints are 
simply a set of basic operations such as reflection, translation, scale, 
and shape specifiers.
Since CN's describe areas for search, it is also useful to have
operations such as close-at-a-distance, area union and area intersection.
The function of the operations in the
primitive set is to provide the CN builder 
with enough tools to easily describe an  area in relationship 
to others.  
Although the number of potential operations is quite large,
we have found that a small number
of primitives (about a dozen) suffice for most 
of our descriptive tasks.
In our system, the primitive set is made up of four different types 
of operations.
↑&Directional\& operations specify where to focus attention.
Operations such as LEFT, REFLECT, NORTH, UP and DOWN all 
constrain the sub-image to be in a particular orientation to another feature.
↑&Area\& descriptions specify a particular area in the scene that 
restricts a feature location.  For example, CLOSE-TO, IN-QUADRILATERAL,
and IN-CIRCLE define areas at some location in an image of interest.
↑&Set\& operations permit areas to be handled as point sets of pixels.
These operations, such as UNION, DIFFERENCE and INTERSECTION, make 
very complex image areas far easier to describe.  
↑&Predicates\& on areas allow features to be filtered out of consideration
by measuring some characteristic of the data.  For example, 
a predicate testing WIDTH,  LENGTH or AREA against some 
value would restrict the size of features in consideration
to be only those within a permissible range.
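The four types of primitive operations above can be sketched as functions on regions represented as point sets of pixels.  This is an illustrative reconstruction, not the original system's code; the function names (close_to, area_pred) and the toy "river" region are assumptions made here for the example.

```python
# Illustrative sketch: regions as point sets of pixel (x, y) coordinates,
# with one example of each operation type described in the text.
# (Names and data are hypothetical, not from the original system.)

def union(a, b):
    """Set operation: pixels in either region."""
    return a | b

def intersection(a, b):
    """Set operation: pixels in both regions."""
    return a & b

def difference(a, b):
    """Set operation: pixels in a but not in b."""
    return a - b

def translate(region, dx, dy):
    """Directional operation: shift a region by (dx, dy)."""
    return {(x + dx, y + dy) for (x, y) in region}

def close_to(region, dist):
    """Area operation: all pixels within chessboard distance `dist`
    of some pixel of `region` (a crude dilation)."""
    return {(x + dx, y + dy)
            for (x, y) in region
            for dx in range(-dist, dist + 1)
            for dy in range(-dist, dist + 1)}

def area_pred(regions, min_area, max_area):
    """Predicate: keep only regions whose pixel count is in range."""
    return [r for r in regions if min_area <= len(r) <= max_area]

river = {(x, 0) for x in range(10)}   # a thin horizontal strip
near_river = close_to(river, 2)       # the band within distance 2 of it
```

In this representation a constraint such as "close to the river" is simply the region returned by close_to, and further constraints narrow it with intersection and difference.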
.test page 6
↑&IV. Describing an Object Location by Constraints\&
By using a combination of geometric constraints we can 
describe the expected location of most features  that we would 
like to find in a scene. We can use the location of the
aeration tank in a sewage treatment plant for an example.  
The aeration tank is a rectangular tank flanked 
on either end by circular sludge and sedimentation tanks. 
.figure 29
.s 27
.center 80
↑&Figure 1\&
As a general rule, sewage flows from the sedimentation tanks
to the aeration tanks and finally through
to the sludge tanks.  This design permits us to extract and 
use the following types of constraints on the location of the aeration tanks.
.lm 20 .rm 60
.indent -5
↑&Constraint 1:\& "Aeration tanks are located 
somewhere close to both  the sludge tanks and the sedimentation tanks."
.indent -5
↑&Constraint 2:\& "Aeration tanks must not be too close to either the 
sludge or sedimentation tanks."
.lm 10 .rm 70
These constraints seem reasonable  because we 
understand that in any obvious design
of a sewage plant,  the various components of the 
treatment process will
be close together and since the aeration phase occurs between sludge treatment and 
sedimentation collection, we can expect to find the aeration tanks 
between the sludge and sedimentation tanks. 
Aeration tanks will rarely be found distant from the major processing 
portion of the plant if for no other reason than that it would be expensive
(as well as unproductive) to locate them so far away. 
We would encode the two constraints given above in three separate, 
but similar, ways.  The first constraint
explicitly relates the
possible location of the aeration tank with the location of the 
sludge tanks.  This relationship would be encoded in a CN as:
.figure 16
.s 15
.center 80
↑&Figure 2\&
.s 2
When this CN is evaluated, the top node would have data expressing that 
part of the image which satisfies Constraint 1; 
that is, that portion of the image which is within some distance X
of both the sludge tanks and the sedimentation tanks.
(X could be found empirically.  It could also be the result
of another CN which might consider the tank's diameter or 
the camera angle of the image.)
Similarly, Constraint 2 would be encoded as shown in Figure 2.1 below.
.figure 20
.s 18
.center 80
Figure 2.1
The entire network describing the probable location of the aeration tanks
must satisfy both of these constraints.  Constraint 1 determines 
an area which is close to both groupings of tanks and Constraint 2 
says that we should disregard a portion of that area.  
If we think of the image as a point set (that is, a set of pixels in some
order which makes up the image), the function to perform would be
a differencing operation to remove the area given by the second constraint
from that specified by  Constraint 1.  Figure 2.2 shows the final CN which
incorporates both constraints.  
.figure 20
.center 80
Figure 2.2
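The network of Figure 2.2 can be sketched as set operations on pixel regions.  The coordinates and the distance thresholds FAR and NEAR below are invented for illustration only; in the system they would come from empirical measurement or from another CN, as noted above.

```python
# A hedged sketch of the Figure 2.2 network: the aeration-tank search
# area must be close to both tank groups (Constraint 1) minus the area
# too close to either group (Constraint 2).  All values are toy data.

def close_to(region, dist):
    # all pixels within chessboard distance `dist` of the region
    return {(x + dx, y + dy)
            for (x, y) in region
            for dx in range(-dist, dist + 1)
            for dy in range(-dist, dist + 1)}

FAR, NEAR = 3, 1                 # hypothetical distance thresholds
sludge = {(0, 0)}                # toy single-pixel tank locations
sedimentation = {(4, 0)}

# Constraint 1: within FAR of both groupings of tanks
c1 = close_to(sludge, FAR) & close_to(sedimentation, FAR)

# Constraint 2: exclude anything within NEAR of either grouping
c2 = close_to(sludge, NEAR) | close_to(sedimentation, NEAR)

# The top DIFFERENCE node of Figure 2.2
aeration_search_area = c1 - c2
```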
Of course, there could be places where the aeration tanks are
located very far away, are
round, or perhaps violate some other constraint.
 It is important to note that the constraints are only the most likely limitations on an object.
They can work very well for stereotyped scenes, and might fail to perform on novel situations.
The cause or ontology of the constraint
is unimportant here.  (See Section IX on Future Work.) 
Simply that such constraints exist and can be utilized by an image understanding
system is important in this context.  
We believe that geometric constraints  are adequate
for  the kind of features we may wish to locate in an image
because things of complexity usually exhibit differentiable parts related and 
interconnected by their functions or by their existence in a common milieu. 
Because of this, these relationships often can be expressed in terms of their
form, or put another way, by geometric descriptions which relate parts 
of the whole.
Exceptions do occur; for example,
when analyzing microphotographs of a slide smear, the 
objects have purposely randomized locations on the slide 
to make them easily observable, so geometric constraints carry little information.
As we have noted, however, the connections of functionality, shared properties 
and cooperation affect most commonplace things, and therefore, constraining 
the search in this way is often a useful and powerful tool.
↑&V. Constraint Networks: Structure and Function\&
As we have established, a CN describes the constraints necessary to locate an object in
a particular image.
This knowledge is embedded in the form of a 
connected network of nodes of three types:
↑&Feature nodes\& -
are the  handle 
by which the constraint network is accessed by some larger image understanding system.   
A feature node is the embodiment of the knowledge which describes a particular
feature in an image.  
A Feature Node has attached to it (as sons) the various strategies
that encode the possible locations of the feature in the 
image.  These strategies are CN's themselves. 
Yet, a CN is not completely a procedural mechanism for representing 
knowledge, for associated with each node in a CN
 is the result of the evaluation of the CN below that node.
This also holds true  with the feature node; 
if the entire CN has been evaluated below the feature
node, then the feature node will contain 
the result of that evaluation and does not need to compute it.
When this occurs, the  CN  (which normally  must be evaluated in order to produce a 
result) will become a simple lookup.
In other words, a CN is a "compute on demand" structure which will minimize the amount 
of processing that it must perform.
This is very similar to the functioning of "memo functions" as suggested by [Michie].
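A minimal sketch of this compute-on-demand behaviour follows.  The Node class and its fields are hypothetical, not the original data structure; the point is only that each node caches the result of evaluating the subnetwork below it, so a second interrogation becomes a simple lookup.

```python
# Compute-on-demand sketch: a node evaluates its sons and applies its
# operation only if no result is attached; otherwise it just looks the
# result up.  (Illustrative only; the class is not the original system.)

class Node:
    def __init__(self, op=None, sons=(), data=None):
        self.op = op          # operation to apply (None for data nodes)
        self.sons = sons      # child nodes supplying input
        self.result = data    # cached result; None means not yet known
        self.evals = 0        # evaluation count, to show the caching

    def evaluate(self):
        if self.result is None:                       # must compute
            self.evals += 1
            self.result = self.op(*(s.evaluate() for s in self.sons))
        return self.result                            # simple lookup

a = Node(data={1, 2, 3})                  # data nodes: terminal inputs
b = Node(data={2, 3, 4})
top = Node(op=lambda x, y: x & y, sons=(a, b))

first = top.evaluate()    # computes the intersection
second = top.evaluate()   # pure lookup; no recomputation occurs
```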
↑&Operation nodes\& -
are the nodes encoding the various constraints which are placed on the item being evaluated.
An operation node gets  input from all of its sons and then applies the operation it 
represents on that data.  
Operation nodes represent the geometric relationships between features;
the operations are chosen from the primitive set.
↑&Data nodes\& - are the terminal nodes of the network.  That is, they have no sons and always 
evaluate   to data.  When a network is initialized, and no partial results 
are available, data nodes supply the network with initial data in the 
image  to operate on.
The nodes of a CN can all potentially hold data.  
This capability is used to store the partial results found during an evaluation of 
the CN. 
As a result, all nodes are always in one of four states.  A  node 
is UP-TO-DATE if the data attached to it is a valid instance of the feature in the image;
it is OUT-OF-DATE if no data is attached to the node (i.e. it is not known if this primitive feature
exists in the image); the node can be NONE-FOUND if it is known that no primitive feature 
of this type exists in the image, or finally the node can contain information which is 
HYPOTHESIZED (there is a result, but it is the result of the evaluation 
of a CN and may not truly exist in the image).
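The four states might be represented as follows.  This is a hedged sketch of the behaviour described here, with UNIVERSE standing for "anywhere in the image"; the state names follow the text, while the toy image size and the helper function are assumptions.

```python
# Sketch of the four node states and the region an interrogating node
# receives for each.  (Illustrative only; names beyond the four states
# are hypothetical.)

from enum import Enum

class Status(Enum):
    UP_TO_DATE = 1    # data is a verified instance of the feature
    OUT_OF_DATE = 2   # no data attached; location entirely unknown
    NONE_FOUND = 3    # feature known to be absent from the image
    HYPOTHESIZED = 4  # data produced by a CN, not yet verified

# toy 8x8 image as a point set
UNIVERSE = {(x, y) for x in range(8) for y in range(8)}

def returned_region(status, data):
    """What an evaluating call receives for each node status."""
    if status is Status.OUT_OF_DATE:
        return UNIVERSE        # the answer may be anywhere in the image
    if status is Status.NONE_FOUND:
        return set()           # empty region: no such feature exists
    return data                # UP_TO_DATE or HYPOTHESIZED: actual data
```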
When evaluated, each different status will affect the result returned, and the
way  that data is handled by the node which made the 
evaluating call.  A node that is OUT-OF-DATE returns a value which 
indicates that the answer may be anywhere in the UNIVERSE of the image.
An UP-TO-DATE node explicitly points to the feature in the image.  
A node which is HYPOTHESIZED  contains real data in the image, but
the data may or may not locate the specified feature
since it is the result of the evaluation of 
another CN.  A HYPOTHESIZED result is used as a place holder for the 
most likely location of the feature until the validit