High level features carry information about an image in an abstracted or propositional form, . The use of high level features is appropriate in model based computer vision, as illustrated in figure 1, where the matching is applied to attributed relational graphs which are widely used as a flexible high level feature.
This paper describes a method of extracting such graphs from images of 3-D objects.
There are few accounts of practical methods for extracting any sort of high level feature from images and this is an important gap in the computer vision and image processing literature.
Examining figure 1, real world data is processed so as to create an attributed relational graph describing the image of an object. The reference representation was computed off-line from a geometric or ray-tracing model. This predicts the appearance of a 2-D projection of the real world as a graph with the same conventions as that created from the real world data.
A graph matching procedure is used to associate fragments of the two graphs and obtain a cost. A control strategy selects candidate classes and presents fragments of reference graphs for matching. It also monitors system performance and allocates system resources to more promising candidates, . This paper is limited to describing the extraction of the relational graphs.
The following describes the method that is in use for constructing relational graphs from colour and range image data. The relational graphs are built from nodes, representing the contours of regions, connected by arcs representing the relative positions of the contours in the image's 2-D coordinate system. This basic graph structure has attributes attached to it creating a flexible and extensible propositional representation.
The input image(s) are processed to identify and label connected regions. The standard definition of a connected region: '...set of pixels that are connected under some definition and share some common property...' is employed. 4-connectedness is required, and is taken as an indication of adequate sampling, .
The examples presented here employ coloured images registered (to within 1 pixel) with range images, forming a 4 component vector valued pixel image.
In the present implementation, the colour image has the dominant role in defining regions. The 'common property' is a uniformity of colour ratio, r:g:b, while the total intensity remains above a given threshold. Regions are extracted by subjecting the 3-component colour images to systematic averaging in small patches to reduce variance without crossing the 'walls' of high variance between regions.
Figure 2. shows a colour image of an object.
Figure 3. shows a depth image that is registered with the image in figure 2.
The images have been segmented and figure 4 shows the regions identified in the images which are used as masks to control the extraction of data to complete the graph.
The next stage is the tracing of the contours of the regions such as those of figure 4.
The output is a list of coordinates of the 8-connected contour pixels. The contour trace defines pixels in the image which are used to calculate a number of numerical and logical attributes for each contour.
The attributes extracted in the present implementation are:
1. Centroid, cg(x,y);
2. Area, aa(area);
3. Moments, mm(m1,m2,m3,m4);
4. Mean colour values, cr(r,g,b);
5. Plane equation coefficients, pl(a,b,c,d);
6. Shape classification,
7. External/internal contour flag.
The image is a projection onto 2-D of the original 3-D object via the optics of the camera and the quantisations implicit in the image capture device. The attributes are calculated using the image coordinate system. The known problems of camera imperfections, , are not dealt with in the present implementation.
The centroid of the contour is calculated. This is expressed by the proposition cg(xbar,ybar). The centroid is used as the 'handle' by which to describe the position of the contour and region.
The area enclosed by each external contour, excluding areas enclosed by any internal contours, is computed. The area is the pixel count in the image. The proposition is aa(area).
The phenomenon of rotation and scale invariant combinations of body centred moments is well known. The first 4 such combinations are calculated from the coordinates of the contour pixels and are expressed as the proposition mm(m1,m2,m3,m4).
The values are scaled and truncated to become large integers for ease of transmission and processing. The use of the rotation and scale invariant combinations rather that the raw moments does not discard size and orientation information which is available from the area and centroid attributes. It is convenient to use the moment attribute purely to represent shape.
The mean colour values found under the track of the contour trace are computed yielding values for red, green and blue. The regions were defined based on a uniformity of colour ratio. The proposition is cr(r,g,b).
If the contour belongs to a region that is approximately planar on the 3-D object, then the equation of the most closely fitting plane, in the form ax+by+cz+d=0 is useful and can be computed from the range image. The calculation chooses (a,b,c,d) to minimise the sum of absolute distances between the plane and the points on the contour. If no acceptable fit is possible the contour is assumed to be non-planar and no attribute is given otherwise the propositional form is pl(a,b,c,d).
The a,b,c,d values are based on a coordinate system after perspective projection because of the viewing conditions. Work must be done to recover a plane equation valid in the real world coordinate system.
The 4 rotation and scale invariant body centred moment combinations are used to form a minimum distance classification of the contour against the three references of a circle, square and triangle. This crude classification can save time in further feature processing. The propositions are sqr or cir or tri.
The discovery of contours in the image allows a determination of whether they are internal or external. This property is indicated with the proposition ext or int.
The arcs of the relational graph are labelled with codes that show the geometrical relation between the contours.
The existence of an arc between two nodes implies that they are within a certain threshold distance (between centroids) of each other. Without this filtering, the graph would be totally connected.
In the present implementation, the label on each arc is one of 16 direction codes. These correspond to compass headings North, North by North East, North East, North East by East etc..
An arc can also carry one of the labels 'outside' or 'inside'. These, if present, show that there is a significant intrusion of one contour into the bounding rectangle of the other.
Attributes on the arcs indicate the importance of including the arcs in the matching. They are used for scheduling matching attempts, , as part of a technique to partition the matching and monitor its progress, .
The block with a cylinder mounted on it, figures 2 and 3, will continue to be used as a running example.
The creation of a relational graph as a reference can begin either with a geometric model such as a CAD/CAM model, or can employ ray tracing, .
The use of a CAD/CAM model requires issues of surface visibility and colour to be handled explicitly.
In the examples here, a z-buffer algorithm is used to determine visibility and colour values are appended after all geometric processing has been completed.
When ray tracing is employed, these issues are handled implicitly after the 'studio definition' has been given.
Whatever the synthesis, the resulting image is processed by the same
feature extraction software that is applied to real image data. The
sequence of steps in creating the attributed relational graph is
(1). capture or synthesise an image, including a depth image;
(2), Grow regions in the colour image and extract region masks,
(3), Construct the relational graph using contours in the region mask to provide the graph's nodes.
Figure 5, below, gives pseudo code for growing regions with uniform colour ratios and intensities above a threshold.
___________________________________________________________________ Input: 3 component image, R:G:B ratio variance tolerance, Tol. Intensity threshold Thr Iteration count Itcnt Neighbourhood size 2*n+1 **2 n Output: Regions labelled on a pixel grid. For each pixel position Begin If the total brightness exceeds Thr Then Begin Rescale the pixels in the 3 colour planes so that the brightest is Imax; End Else Zeroise all 3 pixels at this position; End; For i:= 1 To Itcnt Do Begin For each pixel position in the image that can serve as a patch centre Begin If variance over patch size 2n+3 **2 <= Tol * variance over patch size 2n+1 **2 Then Begin Average pixels in each colour plane over the patch of size 2n+1 ** 2; End; End End For each pixel position in the image that can be the centre of a patch of size 2n+1 **2, compute a new image pixel whose value is the sum of the variances of the red, green and blue image patches. Threshold the variance image with Thr, invert it and use the result as a visibility mask for the colour planes; Label the 4-connected regions with unique identifiers. Output the labelled image. Figure 5. Pseudocode for growing regions. _________________________________________________________________
Figure 6, below, gives pseudocode for building a relational graph from the image data and the colour segmentation.
______________________________________________________________________________ Input: 3 colour planes, Region mask, depth image all registered. Output: Relational graph descibing the image. Scan the region mask and allocate unique labels to 4-connected regions. Open a tabular data structure with the unique region labels as the key For each region in the region mask, Begin Trace the contour, compute the following and enter the result in the table: - the
image coordinates of the centroid; - the mean colour values around the track of the contour; - the first four invariant combinations of body centred moments; - the area enclosed by the contour; - the equation of the best fitting plane Ax+By+Cz+D=0 to the range image data masked by the current region; End; For each record, R1, in the table Begin For each other record, R2, in the table Begin If the distance between the centroids is less than Thr Then Begin Compute the direction code from the centroid in R1 to the centroid in R2; Examine the values in the table for priority cues; Output the relation between R1 and R2 in the format of figure xx; End; End; End; [Notes: The parameter Thr controls the extent to which the graph is connected, Thr >= root(2) * SIZE guarantees that the graph is fully connected, Thr = 0 will suppress all arcs. The algoritm above creates a relation from node p to node q and an equal relation from node q to node p. An asymmetry may be exploited in later work.] Figure 6. Pseudocode to create a relational graph from region data. _____________________________________________________________________________
Figure 7 shows a fragment of the machine readable relational graph obtained from the example object. Only one relation is shown and the values of the attributes can be seen. The nodes are related by a code number, 5 in this example, that indicates the relative positions of the centroids of the node contours.
The relation has a 'strength' parameter, @rep=1 in this case. This value will be used in future to indicate the reliability of the estimate of the relation. It has been used previously to represent the length of a line segment when a line drawing is being represented in this graph format.
__________________________________________________________________________ Graph components Notes [A[@id=crot3.77;ext:sqr:aa(1659):mm(19815,148,0,0): <- Leading node cg(62,86):cr(255,153,153):pl(-204,574,792,10353)]] with attributes [[@rep= 1 ]] <- Relation and strength [B[@id=crot3.77;ext:sqr:aa(199):mm(17759,244,1,0): <- Trailing node cg(98,77):cr(153,153,255):pl(966,2,258,-79198)]]$ with attributes Figure 7. Fragment of relational graph showing attributes. __________________________________________________________________________
One of the main motives for studying structural features is to automate the building of a control strategy, shown in figure 1, for the matching, , .
Structural features seem to provide a robust and concise description of objects.
The use of the same feature extraction methods on real and synthetic images requires the image synthesis to be realistic.
The use of structural features in matching seems to be extremely flexible from the point of view of software development and the ability to merge data from the multi-sensor sources, illustrated here by colour and range images.
 F. van der Heijden, Image Based Measurement Systems, Wiley,
 R. E. Blake, Development of an Incremental Graph Matching Device, Pattern Recognition Theory and Applications, NATO ASI Series, Vol 30, pp355-366, Springer, Berlin, 1987.
 T. Pavlidis, Algorithms for Graphics and Image Processing, Computer Science Press, 1982.
 N. Thune, Stereo Vision: An Integrated Approach, dr.ing. thesis, University of Trondheim, 1991.
 R. E. Blake, Partitioning Graph Matching with Constraints, Pattern Recognition, Vol 27, No.3, pp 439-446, March 1994.
 J. Foley, A. van Dam, S. Feiner and J. Hughes, Computer Graphics: Principles and Practice, Second Edition, Addison-Wesley, Reading, MA., 1990.
 R. E. Blake and P. Boros, The extraction of structural features for use in computer vision. Proceedings of the Second Asian Conference on Computer Vision, Singapore, December 1995.
 P. Boros and R. E. Blake, The calculation of the aspect graph from a CAD model. Proceedings of the Second Asian Conference on Computer Vision, Singapore, December 1995.