3D Object Recognition Using Multiple Views, Affine Moment
Invariants and Multilayered Perceptron Network
M. K. OSMAN , M. Y. MASHOR, M. R. ARSHAD
Control and Electronic Intelligent System (CELIS) Research Group,
School of Electrical & Electronic Eng., Univ. Sains Malaysia, Eng. Campus,
14300 Nibong Tebal, Seb. Perai Selatan, Pulau Pinang, Malaysia.
Abstract: - This paper addresses a performance analysis of affine moment invariants for 3D object
recognition. Affine moment invariants are commonly used as shape features for 2D object or pattern
recognition. The current study shows that, with some adaptation to the multiple views technique, affine
moments are sufficient to model 3D objects. In addition, the simplicity of the moment calculation reduces the
processing time for feature extraction, hence increasing the system efficiency. In the recognition stage, we
propose to use a multilayered perceptron (MLP) network trained by the Levenberg-Marquardt algorithm for
matching and classification. The proposed method has been tested using two groups of objects: polyhedral and
free-form. The experimental results show that affine moment invariants combined with an MLP network
perform well in recognizing both polyhedral and free-form objects.
Key-Words: - Computer vision, multiple views technique, moment invariants, 3D object recognition, neural
networks
1 Introduction
In computer vision, the process of recognition
typically involves some sort of sensor, a model
database that contains all the information about the
object representations, and a decision making or
categorization step. A sensor, e.g. a charge-coupled
device (CCD) camera, is used to gather images and
information from a scene of interest. The digitized
image is then processed so that it is represented in
the same way as the models in the database. Finally,
a recognition algorithm is applied to find the model
to which the object best matches. This approach is
known as model-based object recognition, the most
common approach to shape or object recognition.
Most model based 3D object recognition systems
use information from a single view of an object
[1][2][3]. However, a single view may not contain
sufficient features to recognize an object
unambiguously, because the image of a 3D object
depends on factors such as the camera viewpoint and
the viewing geometry. A single view-based approach
may not be applicable to 3D object recognition,
since only one side of an object can be seen from
any given viewpoint [4]. One solution to this
problem is to use information from several views of
the object. There has been considerable research on
active object recognition systems [5][6], where the
camera is moved around the object to gather
additional views until there is enough evidence to
reach a sufficient level of confidence in an object
hypothesis. However, such systems require a
complicated and expensive setup that is difficult to
achieve [6]. To overcome these problems, some
works have proposed methods that combine the
evidence from static cameras [7][8].
In this paper, we propose a system that uses a
multiple views technique with static cameras.
Generally, objects can be recognized by their shape,
by visual cues such as color, texture and
characteristic motion, by their location relative to
other objects in the scene, and by context
information and expectation. Our work focuses on
the recognition of isolated objects using shape
information. The proposed system is not limited to
polyhedral objects but also considers objects with
free-form shapes.
Moment functions of the two-dimensional image
intensity distribution or 2D moments have been
widely used over the years in a variety of
applications as descriptors of shape. Hu [9] first
introduced moment invariants based on methods of
algebraic invariants. Later, various methods of
computing 2D moment invariants have been
proposed [10][11]. The Hu moments are invariant
under changes in translation, rotation and scale, but
not under general 2D affine transformations. Affine
moment invariants were introduced in [12] to
address this problem.
Most of the works that have been carried out to
date using 2D moments are concerned with the
identification and recognition of 2D objects
[13][14][15]. For 3D object recognition, 3D
moments [16] have been used for shape
representation. However, the computational
complexity of 3D moments increases the processing
time, hence reducing system efficiency.
In this paper, we used affine moment invariants
[12] for 3D shape representation. Although affine
moment invariants are commonly applied to 2D
objects, adapting them to the multiple views
technique enables their use for 3D object
recognition. The simplicity of the 2D moment
calculation also reduces the processing time, making
the method suitable for real-time computer vision
systems.
Recently, many researchers have focused on
applying neural networks to 3D object recognition.
Compared to conventional 3D object recognition
methods, neural networks provide a more general
and parallel implementation paradigm [17]. In this
work, we propose to use a multilayered perceptron
(MLP) network to perform the recognition task.
Fig. 1: Image acquisition set-up
2 Image Acquisition
In this section, the proposed camera-object setup
is described. Each object to be recognized is placed
in a stable position at the centre of the turntable. The
turntable is a circular horizontal platform that can be
rotated through 360°. A, B and C denote the camera
positions around the turntable. Points A and B lie on
the same horizontal plane but are 90° apart. Point C
is directly above the turntable. Fig. 1 shows the
locations of the points and the object. Since all
points are at the same distance from the centre of
the turntable, all cameras must have the same focal
length. The cameras at points A and B are fixed at
45° from the perpendicular view, and camera C is at
the top of the object. Fig. 2 shows how the cameras
are positioned.
After an object of interest is placed at the centre
of the turntable, the object's images are acquired.
The object is then rotated by 5° and the acquisition
is repeated, and so on until a full 360° rotation is
completed. Hence, for each object we obtain 72
image sets. These images are divided into two
groups: 36 image sets for training and 36 image sets
for testing. For the training data we used the images
captured at 0°, 10°, 20°, …, 350°, and the remaining
images (captured at 5°, 15°, 25°, …, 355°) were used
for testing. The training data set is used to build the
3D object model in the recognition stage.
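The 5° capture schedule and the alternating training/testing split described above can be sketched as follows (an illustrative snippet, not part of the original system):

```python
# Capture angles: one image set every 5 degrees over a full rotation.
angles = list(range(0, 360, 5))                    # 72 capture angles
train_angles = [a for a in angles if a % 10 == 0]  # 0, 10, ..., 350
test_angles = [a for a in angles if a % 10 != 0]   # 5, 15, ..., 355
```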
Fig. 2: Camera positions for points A, B and C
3 Image Processing and Feature
Extraction
Captured images are digitized and sent to the
image processing and feature extraction stage. In the
image processing stage, images are thresholded
automatically using the iterative thresholding
method [18]. This method gives a good separation
between object and background in several
applications [19].
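The iterative selection method of [18] can be sketched as follows, assuming a grey-level image stored as a NumPy array; the stopping tolerance `tol` is our own choice, not taken from the original paper:

```python
import numpy as np

def iterative_threshold(image, tol=0.5):
    """Ridler-Calvard iterative selection thresholding.

    Starts from the mean intensity, then repeatedly sets the
    threshold to the midpoint of the mean foreground and mean
    background intensities until it stabilises.
    """
    t = image.mean()
    while True:
        fg = image[image > t]
        bg = image[image <= t]
        # Guard against an empty class on degenerate images
        if fg.size == 0 or bg.size == 0:
            return t
        t_new = 0.5 * (fg.mean() + bg.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

# Binarise a toy image: object pixels become 1, background 0
img = np.array([[10, 12, 11], [200, 210, 205], [11, 198, 12]], dtype=float)
t = iterative_threshold(img)
binary = (img > t).astype(np.uint8)
```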
In the feature extraction stage, we chose affine
moment invariants as features for 3D object
modeling. The six affine moment invariants used are
defined below:
I1 = (1/µ00^4) (µ20 µ02 − µ11^2)                                      (1)

I2 = (1/µ00^10) (µ30^2 µ03^2 − 6 µ30 µ21 µ12 µ03 + 4 µ30 µ12^3
     + 4 µ03 µ21^3 − 3 µ21^2 µ12^2)                                   (2)

I3 = (1/µ00^7) (µ20 (µ21 µ03 − µ12^2) − µ11 (µ30 µ03 − µ21 µ12)
     + µ02 (µ30 µ12 − µ21^2))                                         (3)

I4 = (1/µ00^11) (µ20^3 µ03^2 − 6 µ20^2 µ11 µ12 µ03 − 6 µ20^2 µ02 µ21 µ03
     + 9 µ20^2 µ02 µ12^2 + 12 µ20 µ11^2 µ03 µ21 + 6 µ20 µ11 µ02 µ30 µ03
     − 18 µ20 µ11 µ02 µ21 µ12 − 8 µ11^3 µ30 µ03 − 6 µ20 µ02^2 µ30 µ12
     + 9 µ20 µ02^2 µ21^2 + 12 µ11^2 µ02 µ30 µ12 − 6 µ11 µ02^2 µ30 µ21
     + µ02^3 µ30^2)                                                   (4)
I5 = (1/µ00^6) (µ40 µ04 − 4 µ31 µ13 + 3 µ22^2)                        (5)

I6 = (1/µ00^9) (µ40 µ04 µ22 + 2 µ31 µ22 µ13 − µ40 µ13^2 − µ04 µ31^2
     − µ22^3)                                                         (6)

where µpq is defined by:

µpq = ∫∫_object f(x, y) (x − xt)^p (y − yt)^q dx dy                   (7)

with f(x, y) being the pixel value of the digital image and (xt, yt)
the centre of mass of the object. The affine moment invariants
calculated from the images were then fed into the recognition stage.
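As an illustrative implementation (not the authors' code) of eq. (7) and the first invariant (1) on a thresholded image, the double integral becomes a discrete sum over pixels:

```python
import numpy as np

def central_moment(img, p, q):
    """Discrete version of eq. (7): sum of f(x,y) (x-xt)^p (y-yt)^q."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xt = (xs * img).sum() / m00   # centre of mass, x
    yt = (ys * img).sum() / m00   # centre of mass, y
    return ((xs - xt) ** p * (ys - yt) ** q * img).sum()

def affine_invariant_I1(img):
    """First affine moment invariant, eq. (1)."""
    mu = lambda p, q: central_moment(img, p, q)
    return (mu(2, 0) * mu(0, 2) - mu(1, 1) ** 2) / mu(0, 0) ** 4
```

Because the moments are central, the invariants are unchanged when the object is translated within the image; the remaining invariants (2)-(6) follow the same pattern.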
4 Recognition and Classification
In the recognition stage, we propose to use a
multilayered perceptron (MLP) network trained by
the Levenberg-Marquardt algorithm [20] for
recognition and classification. Fig. 3 shows an
example of an MLP network with 3 input nodes, x1,
x2, x3, 4 hidden nodes and 2 output nodes, f1, f2.
The output of the k-th node in the output layer, fk,
can be expressed as:

fk = Σ_{j=1..nh} w^2_jk F( Σ_{i=1..ni} w^1_ij xi + b^1_j )            (8)

where w^1_ij denotes the weights that connect the
input xi and the hidden layer; w^2_jk denotes the
weights that connect the hidden layer and the output
layer; ni and nh are the numbers of input and hidden
nodes; and b^1_j is the threshold in the hidden layer.
F(·) is an activation function, normally selected as
the sigmoid function. For our recognition purpose,
the number of inputs equals the number of cameras
times the number of combined moments, while the
number of outputs equals the number of objects to
be recognized. In other words, each output of the
network represents one type of object. The inputs to
the networks were assigned as in Table 1 and Table
2 respectively. In the recognition step, the output
node with the largest value is set to 1, and the other
nodes are set to 0.

Fig. 3: MLP network with 3 inputs, 4 hidden
nodes and 2 outputs
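Equation (8) and the winner-take-all output rule can be sketched as follows (a minimal forward pass with hypothetical random weights; the Levenberg-Marquardt training of [20] is omitted):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2):
    """Eq. (8): f_k = sum_j w2_jk * F(sum_i w1_ij * x_i + b1_j),
    with F the logistic sigmoid activation."""
    hidden = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))  # hidden layer, F(.)
    return W2 @ hidden                             # linear output layer

def classify(outputs):
    """Winner-take-all: the largest output node becomes 1, the rest 0."""
    onehot = np.zeros_like(outputs)
    onehot[np.argmax(outputs)] = 1.0
    return onehot

# Example: 3 inputs, 4 hidden nodes, 2 outputs, as in Fig. 3
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2 = rng.standard_normal((2, 4))
f = mlp_forward(np.array([0.1, -0.2, 0.3]), W1, b1, W2)
decision = classify(f)
```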
5 Result and Discussion
We chose two types of objects in order to analyze
our system’s performance. Each type consists of
eleven 3D objects. The first type, Type 1, contains
simple and polyhedral 3D shape like cylinder, box,
trapezoid, sphere etc. The second type, Type 2
contains free-form objects. Fig.4 and 5 show these
types of objects.
The MLP network has been trained using 25
hidden nodes and 11 output nodes to represent 11
objects from both types. Table 1 shows the
recognition performance using simple object (Type
1) and Table 2 for free-form objects (Type 2) after
200 iterations.
From the results presented in Tables 1 and 2, we
can see that each affine moment invariant produced
a different recognition rate for the same type of
objects. For example, for Type 1 objects, I1 gave the
highest accuracy while I2 gave the lowest. For Type
2 objects, I4 gave the highest accuracy while I2 gave
the lowest.
Better recognition rates were achieved by
combining the affine moments. For Type 1 objects,
the maximum recognition rate of 100% was reached
when all six affine moments were combined. For
Type 2 objects, however, the use of three affine
moments produced a better result than the other
combinations. Adding more than three affine
moments confused the network and did not improve
the recognition rates, as indicated in Table 2.
In our analyses, slightly better recognition rates
were obtained for Type 1 objects than for Type 2
objects. Free-form objects (Type 2) have complex
and asymmetrical shapes compared to polyhedral
objects. This produces rapid changes in the moment
values and reduces the stability of the features when
the objects are rotated. However, the recognition
performance for Type 2 objects was also
satisfactory, since an accuracy of up to 98% was
achieved.
6 Conclusion
In this paper, we have presented a new approach
for the recognition and classification of 3D objects
using affine moment invariants combined with a
multiple views technique and an MLP network.
Although the approach uses only 2D moments, it
recognizes 3D objects well when adapted to the
multiple views technique. Since the calculation of
2D moments is straightforward compared to 3D
moments, the method is suitable for real-time 3D
object recognition systems. Furthermore, the
recognition process is not limited to polyhedral
objects but can be applied to arbitrary 3D objects.
The results indicate that the proposed method
successfully classified the 3D objects with an
accuracy of up to 100%.
References:
[1] I. Weiss & M. Ray, Model-Based Recognition
Of 3D Objects From Single Images, IEEE
Trans. on Pattern Analysis and Machine Intell.,
Vol. 23, No. 2, 2001, pp. 116-128.
[2] I. K. Park, K. M. Lee & S. U. Lee, Recognition
and Reconstruction of 3D Objects Using Model
Based Perceptual Grouping, Proc. 15th Int.
Conf. on Pattern Recognition 2000, Vol. 1,
2000, pp. 720-724.
[3] I. Han, I. D. Yun, & S. U. Lee, Model-based
Object Recognition Using the Hausdorff
Distance with Explicit Pairing, Int. Conf. on
Image Processing, ICIP 99, Vol. 4, 1999, pp.
83-87.
[4] U. Büker & G. Hartmann, Knowledge-Based
View Control of a Neural 3D Object Recognition
System, Proc. of Int. Conf. on Pattern
Recognition, Vol. D, 1996, pp. 24-29.
[5] S. D. Roy, S. Chaudhury & S. Banerjee, Active
Recognition Through Next View Planning: A
Survey, Pattern Recognition, 2003, (Accepted
for Publication).
[6] B. Schiele & J. L. Crowley, Transinformation
for Active Object Recognition. Proc. of the 6th
Int. Conf. Computer Vision, 1998, pp. 249-254.
[7] M. F. S. Farias & J. M. de Carvalho, Multi-view
Technique For 3D Polyhedral Object
Recognition Using Surface Representation,
Revista Controle & Automacao, Vol. 10, No. 2,
1999, pp. 107-117.
[8] J. Mao, P. J. Flynn & A. K. Jain, Integration of
Multiple Feature Groups and Multiple Views
into a 3D Object Recognition System,
Computer Vision and Image Understanding,
Vol. 62, No. 3, 1995, pp.309-325.
[9] M. K. Hu, Visual Pattern Recognition by
Moment Invariants, IRE Trans. on Info. Theory,
Vol. IT-8, 1962, pp. 179-187.
[10] M. R. Teague, Image Analysis via the General
Theory of Moments, J. of Optical Soc. of
America. Vol. 70, 1980, pp. 920-930.
[11] Y. S. Abu-Mostafa & D. Psaltis, Recognitive
Aspects of Moment Invariants, IEEE Trans. on
Pattern Anal. & Machine Intell., Vol. PAMI-6,
1984, pp. 698-706.
[12] J. Flusser & T. Suk, Pattern Recognition by
Affine Moment Invariants, Pattern Recognition,
Vol. 26, 1993, pp. 167-174.
[13] S. M. Abdallah, Object Recognition via
Invariance, PhD Thesis, Univ. of Sydney, 2000.
[14] R. R. Bailey & M. Srinath, Orthogonal Moment
Features for Use With Parametric and
Non-Parametric Classifiers, IEEE Trans. on
Pattern Anal. & Machine Intell., Vol. 17, 1996,
pp. 75-116.
[15] A. Khotanzad & J. H. Lu, Object Recognition
Using a Neural Network & Invariant Zernike
Features, Proc. IEEE Comp. Soc. Conf. on
Comp. Vision & Pattern Recognition, 1989, pp.
200-205.
[16] C. H. Lo, & H. S. Don, Pattern Recognition
Using 3D Moments, Proc.10th Int. Conf. on
Pattern Recognition. Vol. 1, 1990, pp. 540-544.
[17] Y. K. Ham & R.-H. Park, 3D Object
Recognition In Range Images Using Hidden
Markov Models And Neural Networks, Pattern
Recognition, Vol. 32, 1999, pp. 729-742.
[18] T. W. Ridler & S. Calvard, Picture
Thresholding Using an Iterative Selection
Method, IEEE Trans. on Systems, Man and
Cybernetics, Vol. 8, 1978, pp. 630-632.
[19] R. Klette, & P. Zamperoni, Handbook of Image
Processing Operators. John Wiley & Sons,
1996.
[20] M. T. Hagan & M. Menhaj, Training
Feedforward Networks with the Marquardt
Algorithm, IEEE Trans. on Neural Networks,
Vol. 5, No. 6, 1994, pp. 989-993.
Fig. 4: Type 1 - simple 3D shapes

Fig. 5: Type 2 - free-form objects
Table 1: System performance for object Type 1
using affine moment invariants

Affine moment invariant | No. of input nodes | Training (%) | Testing (%)
I1                      |  3                 |  98.99       |  98.23
I2                      |  3                 |  88.64       |  77.53
I3                      |  3                 |  96.97       |  83.33
I4                      |  3                 |  90.15       |  86.36
I5                      |  3                 |  99.49       |  95.96
I6                      |  3                 |  83.33       |  81.06
I1 I2                   |  6                 |  99.75       |  96.46
I1 I2 I3                |  9                 | 100          |  98.48
I1 I2 I3 I4             | 12                 | 100          |  99.24
I1 I2 I3 I4 I5          | 15                 | 100          |  99.75
All I's                 | 18                 | 100          | 100

Table 2: System performance for object Type 2
using affine moment invariants

Affine moment invariant | No. of input nodes | Training (%) | Testing (%)
I1                      |  3                 |  95.20       |  94.19
I2                      |  3                 |  89.65       |  77.53
I3                      |  3                 |  94.70       |  92.17
I4                      |  3                 |  98.74       |  96.21
I5                      |  3                 |  81.31       |  79.04
I6                      |  3                 |  95.20       |  94.19
I1 I2                   |  6                 | 100          |  97.22
I1 I2 I3                |  9                 | 100          |  98.99
I1 I2 I3 I4             | 12                 | 100          |  97.22
I1 I2 I3 I4 I5          | 15                 | 100          |  96.97
All I's                 | 18                 | 100          |  98.48