3D Object Recognition Using Multiple Views, Affine Moment Invariants and Multilayered Perceptron Network

M. K. OSMAN, M. Y. MASHOR, M. R. ARSHAD
Control and Electronic Intelligent System (CELIS) Research Group, School of Electrical & Electronic Eng., Univ. Sains Malaysia, Eng. Campus, 14300 Nibong Tebal, Seb. Perai Selatan, Pulau Pinang, Malaysia.

Abstract: - This paper presents a performance analysis of affine moment invariants for 3D object recognition. Affine moment invariants are commonly used as shape features for 2D object or pattern recognition. The current study shows that, with an adaptation to the multiple views technique, affine moments are sufficient to model 3D objects. In addition, the simplicity of the moment calculation reduces the processing time for feature extraction and hence increases system efficiency. In the recognition stage, we propose a multilayered perceptron (MLP) network trained by the Levenberg-Marquardt algorithm for matching and classification. The proposed method has been tested on two groups of objects, polyhedral and free-form. The experimental results show that affine moment invariants combined with an MLP network achieve good recognition performance for both polyhedral and free-form objects.

Key-Words: - Computer vision, multiple views technique, moment invariants, 3D object recognition, neural networks

1 Introduction

In computer vision, the recognition process typically involves sensors, a model database containing all the information about the object representations, and a decision-making or categorization step. A sensor, e.g. a charge-coupled device (CCD) camera, is used to gather images and information from a scene of interest. The digitized image is then processed so that it is represented in the same way as the models in the database. Finally, a recognition algorithm is applied to find the model that the object best matches. This approach is known as model-based object recognition, the most common framework for shape or object recognition.

Most model-based 3D object recognition systems use information from a single view of an object [1][2][3]. However, a single view may not contain sufficient features to recognize an object unambiguously, because the image of a 3D object depends on factors such as the camera viewpoint and the viewing geometry. A single-view approach may therefore be inadequate for 3D object recognition, since only one side of an object can be seen from any given viewpoint [4]. One solution to this problem is to use information from several views of the object. There has been considerable research on active object recognition systems [5][6], in which the camera moves around the object to gather additional views until there is enough evidence to reach a sufficient level of confidence in an object hypothesis. However, such systems require a complicated and expensive setup that is difficult to achieve [6]. To overcome these problems, some works have proposed combining the evidence from static cameras [7][8]. In this paper, we propose a system that uses a multiple views technique with static cameras.

Generally, objects can be recognized by their shape, by visual cues such as color, texture and characteristic motion, by their location relative to other objects in the scene, and by context information and expectation. Our work focuses on the recognition of isolated objects using shape information.
The proposed system is not limited to polyhedral objects but also considers objects with free-form shape.

Moment functions of the two-dimensional image intensity distribution, or 2D moments, have been widely used over the years as shape descriptors in a variety of applications. Hu [9] first introduced moment invariants based on methods of algebraic invariants. Later, various methods of computing 2D moment invariants were proposed [10][11]. The Hu moments are invariant under changes in translation, rotation and scale, but not under a general 2D affine transformation. Affine moment invariants were introduced in [12] to address this problem. Most of the work carried out to date using 2D moments concerns the identification and recognition of 2D objects [13][14][15]. For 3D object recognition, 3D moments [16] have been used for shape representation. However, the computational complexity of 3D moments increases the processing time and hence reduces system efficiency. In this paper, we use the affine moment invariants of [12] for 3D shape representation. Although affine moment invariants are commonly applied to 2D objects, adapting them to a multiple views technique enables their use for 3D object recognition. The simplicity of the 2D moment calculation also reduces the processing time, making the method suitable for real-time computer vision systems.

Recently, many researchers have focused on applying neural networks to 3D object recognition. Compared with conventional 3D object recognition, neural networks provide a more general and parallel implementation paradigm [17]. In this work, we propose a neural network, the multilayered perceptron (MLP), to perform the recognition task.

2 Image Acquisition

This section describes the proposed camera-object setup. Each object to be recognized is placed in a stable pose at the centre of a turntable, a circular horizontal platform that can be rotated through 360°. A, B and C denote the camera positions around the turntable. Points A and B lie on the same horizontal plane but are 90° apart; point C is directly above the turntable, perpendicular to its plane. Fig. 1 shows the locations of the points and the object. Since all points are at the same distance from the centre of the turntable, all cameras must have the same focal length. The cameras at points A and B are fixed at 45° from the perpendicular view, and camera C views the object from the top. Fig. 2 shows how the cameras are positioned.

Fig. 1: Image acquisition set-up

Fig. 2: Camera position for points A, B and C

After an object of interest is placed at the centre of the turntable, the object's images are acquired. The object is then rotated by 5° and the same process is repeated, each rotation advancing the object by a further 5°, until 360° is completed. Hence, for each object we obtain 72 image sets. These are divided into two groups: 36 image sets for training and 36 image sets for testing. For the training data, we take the images at 0°, 10°, 20°, ..., 350°, and the remaining images (at 5°, 15°, 25°, ..., 355°) are used for testing. The training data set is used to build the 3D object model in the recognition stage.
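The view-sampling scheme above is simple enough to make concrete in a few lines of code. The following Python sketch, with function names of our own choosing (they are not from the paper), enumerates the 72 turntable angles and reproduces the 36/36 training/testing split:

```python
# A minimal sketch of the view-sampling scheme described above, assuming a
# 5-degree turntable step; the function names are our own, not the paper's.

def view_angles(step_deg=5):
    """Return the 72 turntable angles (in degrees) at which image sets
    are captured: 0, 5, 10, ..., 355."""
    return list(range(0, 360, step_deg))

def split_views(angles):
    """Split the angles as in the paper: multiples of 10 degrees
    (0, 10, ..., 350) for training, the rest (5, 15, ..., 355) for testing."""
    train = [a for a in angles if a % 10 == 0]
    test = [a for a in angles if a % 10 != 0]
    return train, test

if __name__ == "__main__":
    train, test = split_views(view_angles())
    assert len(train) == 36 and len(test) == 36  # 36 image sets each
```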
3 Image Processing and Feature Extraction

Captured images are then digitized and sent to the image processing and feature extraction stage. In the image processing stage, images are thresholded automatically using the iterative thresholding method [18], which gives a good separation between object and background in several applications [19].

In the feature extraction stage, we choose affine moment invariants as features for 3D object modeling. The six affine moment invariants used are defined below:

I_1 = \frac{1}{\mu_{00}^{4}} (\mu_{20}\mu_{02} - \mu_{11}^{2})   (1)

I_2 = \frac{1}{\mu_{00}^{10}} (\mu_{30}^{2}\mu_{03}^{2} - 6\mu_{30}\mu_{21}\mu_{12}\mu_{03} + 4\mu_{30}\mu_{12}^{3} + 4\mu_{03}\mu_{21}^{3} - 3\mu_{21}^{2}\mu_{12}^{2})   (2)

I_3 = \frac{1}{\mu_{00}^{7}} (\mu_{20}(\mu_{21}\mu_{03} - \mu_{12}^{2}) - \mu_{11}(\mu_{30}\mu_{03} - \mu_{21}\mu_{12}) + \mu_{02}(\mu_{30}\mu_{12} - \mu_{21}^{2}))   (3)

I_4 = \frac{1}{\mu_{00}^{11}} (\mu_{20}^{3}\mu_{03}^{2} - 6\mu_{20}^{2}\mu_{11}\mu_{12}\mu_{03} - 6\mu_{20}^{2}\mu_{02}\mu_{21}\mu_{03} + 9\mu_{20}^{2}\mu_{02}\mu_{12}^{2} + 12\mu_{20}\mu_{11}^{2}\mu_{21}\mu_{03} + 6\mu_{20}\mu_{11}\mu_{02}\mu_{30}\mu_{03} - 18\mu_{20}\mu_{11}\mu_{02}\mu_{21}\mu_{12} - 8\mu_{11}^{3}\mu_{30}\mu_{03} - 6\mu_{20}\mu_{02}^{2}\mu_{30}\mu_{12} + 9\mu_{20}\mu_{02}^{2}\mu_{21}^{2} + 12\mu_{11}^{2}\mu_{02}\mu_{30}\mu_{12} - 6\mu_{11}\mu_{02}^{2}\mu_{30}\mu_{21} + \mu_{02}^{3}\mu_{30}^{2})   (4)

I_5 = \frac{1}{\mu_{00}^{6}} (\mu_{40}\mu_{04} - 4\mu_{31}\mu_{13} + 3\mu_{22}^{2})   (5)

I_6 = \frac{1}{\mu_{00}^{9}} (\mu_{40}\mu_{04}\mu_{22} + 2\mu_{31}\mu_{22}\mu_{13} - \mu_{40}\mu_{13}^{2} - \mu_{04}\mu_{31}^{2} - \mu_{22}^{3})   (6)

where \mu_{pq} is defined by

\mu_{pq} = \iint_{object} f(x, y) (x - x_t)^{p} (y - y_t)^{q} \, dx \, dy   (7)

with f(x, y) being the pixel value of the digital image and (x_t, y_t) the centre of mass of the object. The affine moment invariants calculated from the images are then fed into the recognition stage.
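For readers who want to reproduce the preprocessing, a minimal sketch of the iterative selection thresholding of [18] might look as follows. It assumes a greyscale image stored as a NumPy array; the function name and stopping tolerance are our own choices, not the authors' implementation:

```python
import numpy as np

def iterative_threshold(image, eps=0.5):
    """Iterative selection thresholding in the spirit of Ridler and
    Calvard [18]: start from the global mean grey level, then repeatedly
    move the threshold to the midpoint of the two class means until it
    stabilizes. Returns the final threshold."""
    t = image.mean()
    while True:
        foreground = image[image > t]
        background = image[image <= t]
        if foreground.size == 0 or background.size == 0:
            return t  # degenerate image: one class is empty
        t_new = 0.5 * (foreground.mean() + background.mean())
        if abs(t_new - t) < eps:
            return t_new
        t = t_new
```

The binary object mask used for feature extraction is then simply `image > t`.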
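The central moments of Eq. (7) and the invariants themselves are straightforward to compute on the segmented image. The sketch below is our own illustrative code, not the authors' implementation: it evaluates \mu_{pq} as a discrete sum over the pixel grid and returns the first three invariants of Eqs. (1)-(3); I_4 to I_6 follow the same pattern from Eqs. (4)-(6).

```python
import numpy as np

def central_moment(img, p, q):
    """Discrete version of Eq. (7): central moment mu_pq of a segmented
    (binary) image, taken about the object's centre of mass (x_t, y_t)."""
    rows, cols = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xt = (cols * img).sum() / m00
    yt = (rows * img).sum() / m00
    return (((cols - xt) ** p) * ((rows - yt) ** q) * img).sum()

def affine_invariants(img):
    """First three affine moment invariants, Eqs. (1)-(3)."""
    mu = {(p, q): central_moment(img, p, q)
          for p in range(4) for q in range(4) if p + q <= 3}
    m00 = mu[(0, 0)]
    i1 = (mu[(2, 0)] * mu[(0, 2)] - mu[(1, 1)] ** 2) / m00 ** 4
    i2 = (mu[(3, 0)] ** 2 * mu[(0, 3)] ** 2
          - 6 * mu[(3, 0)] * mu[(2, 1)] * mu[(1, 2)] * mu[(0, 3)]
          + 4 * mu[(3, 0)] * mu[(1, 2)] ** 3
          + 4 * mu[(0, 3)] * mu[(2, 1)] ** 3
          - 3 * mu[(2, 1)] ** 2 * mu[(1, 2)] ** 2) / m00 ** 10
    i3 = (mu[(2, 0)] * (mu[(2, 1)] * mu[(0, 3)] - mu[(1, 2)] ** 2)
          - mu[(1, 1)] * (mu[(3, 0)] * mu[(0, 3)] - mu[(2, 1)] * mu[(1, 2)])
          + mu[(0, 2)] * (mu[(3, 0)] * mu[(1, 2)] - mu[(2, 1)] ** 2)) / m00 ** 7
    return i1, i2, i3
```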
4 Recognition and Classification

In the recognition stage, we propose a multilayered perceptron (MLP) network trained by the Levenberg-Marquardt algorithm [20] for recognition and classification. Fig. 3 shows an example of an MLP network with 3 input nodes, x_1, x_2, x_3, 4 hidden nodes and 2 output nodes, f_1, f_2. The output of the k-th node in the output layer, f_k, can be expressed as:

f_k = \sum_{j=1}^{n_h} w_{jk}^{2} \, F\left( \sum_{i=1}^{n_i} w_{ij}^{1} x_i + b_j^{1} \right)   (8)

where w_{ij}^{1} denotes the weights connecting input x_i to the hidden layer; w_{jk}^{2} denotes the weights connecting the hidden layer to the output layer; n_i and n_h are the numbers of input and hidden nodes; and b_j^{1} is the threshold of the j-th hidden node. F(\cdot) is an activation function, normally chosen as the sigmoid function.

Fig. 3: MLP network with 3 inputs, 4 hidden nodes and 2 outputs

For our recognition purpose, the number of inputs is the number of cameras times the number of combined moments (for example, three cameras and six invariants give 18 input nodes), while the number of outputs equals the number of objects to be recognized. In other words, each output of the network represents one type of object. The inputs to the networks were assigned as in Tables 1 and 2 respectively. In the recognition step, the output node with the largest value is set to 1, and all other nodes are set to 0.
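Eq. (8) and the winner-take-all rule above translate directly into code. The following sketch assumes the weights have already been trained (e.g. by the Levenberg-Marquardt algorithm of [20], which is not shown here); the names and array shapes are our own conventions:

```python
import numpy as np

def sigmoid(v):
    """The activation function F(.) of Eq. (8)."""
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, w1, b1, w2):
    """Forward pass of Eq. (8).
    x:  (n_i,)      input features (invariants from all cameras)
    w1: (n_h, n_i)  input-to-hidden weights
    b1: (n_h,)      hidden-layer thresholds
    w2: (n_o, n_h)  hidden-to-output weights, one output per object."""
    return w2 @ sigmoid(w1 @ x + b1)

def decide(outputs):
    """Winner-take-all decoding: the largest output node becomes 1,
    every other node becomes 0."""
    decision = np.zeros_like(outputs)
    decision[np.argmax(outputs)] = 1.0
    return decision
```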
5 Results and Discussion

We chose two types of objects to analyze the system's performance, each consisting of eleven 3D objects. The first type, Type 1, contains simple polyhedral and regular 3D shapes such as a cylinder, box, trapezoid and sphere. The second type, Type 2, contains free-form objects. Figs. 4 and 5 show these objects. The MLP network was trained with 25 hidden nodes and 11 output nodes to represent the 11 objects of each type. Table 1 shows the recognition performance for the simple objects (Type 1) and Table 2 for the free-form objects (Type 2) after 200 training iterations.

Fig. 4: Type 1 - simple 3D shapes

Fig. 5: Type 2 - free-form objects

Table 1: System performance for object Type 1 using affine moment invariants

Affine moment invariant | No. of input nodes | Type 1 Training (%) | Type 1 Testing (%)
I1                      | 3                  | 98.99               | 98.23
I2                      | 3                  | 89.65               | 77.53
I3                      | 3                  | 94.70               | 92.17
I4                      | 3                  | 98.74               | 96.21
I5                      | 3                  | 81.31               | 79.04
I6                      | 3                  | 83.33               | 81.06
I1 I2                   | 6                  | 99.75               | 96.46
I1 I2 I3                | 9                  | 100                 | 98.48
I1 I2 I3 I4             | 12                 | 100                 | 99.24
I1 I2 I3 I4 I5          | 15                 | 100                 | 99.75
All I's                 | 18                 | 100                 | 100

Table 2: System performance for object Type 2 using affine moment invariants

Affine moment invariant | No. of input nodes | Type 2 Training (%) | Type 2 Testing (%)
I1                      | 3                  | 95.20               | 94.19
I2                      | 3                  | 88.64               | 77.53
I3                      | 3                  | 96.97               | 83.33
I4                      | 3                  | 99.49               | 95.96
I5                      | 3                  | 95.20               | 94.19
I6                      | 3                  | 90.15               | 86.36
I1 I2                   | 6                  | 100                 | 97.22
I1 I2 I3                | 9                  | 100                 | 98.99
I1 I2 I3 I4             | 12                 | 100                 | 97.22
I1 I2 I3 I4 I5          | 15                 | 100                 | 96.97
All I's                 | 18                 | 100                 | 98.48

From the results in Tables 1 and 2, we can see that each affine moment invariant produced a different recognition rate for the same type of objects. For example, for Type 1 objects, I1 gave the highest accuracy and I2 the lowest; for Type 2 objects, I4 gave the highest accuracy and I2 the lowest. Better recognition rates were achieved by combining the affine moments. For Type 1 objects, the maximum recognition rate of 100% was obtained by combining all six affine moments. For Type 2 objects, however, the combination of three affine moments produced a better result than the other combinations; adding more than three affine moments tended to confuse the network and did not improve the recognition rates, as indicated in Table 2.

In our analyses, slightly better recognition rates were obtained for Type 1 objects than for Type 2 objects. Free-form objects (Type 2) have complex and asymmetrical shapes compared with polyhedral objects, which produces rapid changes in the moment values and reduces the stability of the features as the objects are rotated. Nevertheless, the recognition performance for Type 2 objects was also satisfactory, with an accuracy of up to 98% being achieved.

6 Conclusion

In this paper we have presented a new approach to the recognition and classification of 3D objects using affine moment invariants combined with a multiple views technique and an MLP network. Although the approach uses only 2D moments, it recognizes 3D objects well when adapted to the multiple views technique. Since the calculation of 2D moments is straightforward compared with that of 3D moments, the method is suitable for real-time 3D object recognition systems. Furthermore, the recognition process is not limited to polyhedral objects but can be applied to arbitrary 3D objects. The results indicate that the proposed method successfully classified the 3D objects with an accuracy of up to 100%.

References:
[1] I. Weiss & M. Ray, Model-Based Recognition of 3D Objects from Single Images, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23, No. 2, 2001, pp. 116-128.
[2] I. K. Park, K. M. Lee & S. U. Lee, Recognition and Reconstruction of 3D Objects Using Model-Based Perceptual Grouping, Proc. 15th Int. Conf. on Pattern Recognition, Vol. 1, 2000, pp. 720-724.
[3] I. Han, I. D. Yun & S. U. Lee, Model-Based Object Recognition Using the Hausdorff Distance with Explicit Pairing, Proc. Int. Conf. on Image Processing (ICIP 99), Vol. 4, 1999, pp. 83-87.
[4] U. Büker & G. Hartmann, Knowledge-Based View Control of a Neural 3D Object Recognition System, Proc. Int. Conf. on Pattern Recognition, Vol. D, 1996, pp. 24-29.
[5] S. D. Roy, S. Chaudhury & S. Banerjee, Active Recognition Through Next View Planning: A Survey, Pattern Recognition, 2003 (accepted for publication).
[6] B. Schiele & J. L. Crowley, Transinformation for Active Object Recognition, Proc. 6th Int. Conf. on Computer Vision, 1998, pp. 249-254.
[7] M. F. S. Farias & J. M. de Carvalho, Multiview Technique for 3D Polyhedral Object Recognition Using Surface Representation, Revista Controle & Automação, Vol. 10, No. 2, 1999, pp. 107-117.
[8] J. Mao, P. J. Flynn & A. K. Jain, Integration of Multiple Feature Groups and Multiple Views into a 3D Object Recognition System, Computer Vision and Image Understanding, Vol. 62, No. 3, 1995, pp. 309-325.
[9] M. K. Hu, Visual Pattern Recognition by Moment Invariants, IRE Trans. on Information Theory, Vol. IT-8, 1962, pp. 179-187.
[10] M. R. Teague, Image Analysis via the General Theory of Moments, J. of the Optical Society of America, Vol. 70, 1980, pp. 920-930.
[11] Y. S. Abu-Mostafa & D. Psaltis, Recognitive Aspects of Moment Invariants, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. PAMI-6, 1984, pp. 698-706.
[12] J. Flusser & T. Suk, Pattern Recognition by Affine Moment Invariants, Pattern Recognition, Vol. 26, 1993, pp. 167-174.
[13] S. M. Abdallah, Object Recognition via Invariance, PhD Thesis, University of Sydney, 2000.
[14] R. R. Bailey & M. Srinath, Orthogonal Moment Features for Use with Parametric and Non-Parametric Classifiers, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17, 1996, pp. 75-116.
[15] A. Khotanzad & J. H. Lu, Object Recognition Using Neural Networks and Invariant Zernike Features, Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 1989, pp. 200-205.
[16] C. H. Lo & H. S. Don, Pattern Recognition Using 3D Moments, Proc. 10th Int. Conf. on Pattern Recognition, Vol. 1, 1990, pp. 540-544.
[17] Y. K. Ham & R.-H. Park, 3D Object Recognition in Range Images Using Hidden Markov Models and Neural Networks, Pattern Recognition, Vol. 32, 1999, pp. 729-742.
[18] T. W. Ridler & S. Calvard, Picture Thresholding Using an Iterative Selection Method, IEEE Trans. on Systems, Man and Cybernetics, Vol. 8, 1978, pp. 630-632.
[19] R. Klette & P. Zamperoni, Handbook of Image Processing Operators, John Wiley & Sons, 1996.
[20] M. T. Hagan & M. Menhaj, Training Feedforward Networks with the Marquardt Algorithm, IEEE Trans. on Neural Networks, Vol. 5, No. 6, 1994, pp. 989-993.