“Pose Refinement for Reflective Workpieces using Deep Iterative Matching”

Authors: Ozan Kaya, Kevin Thieu, Christian Holden and Olav Egeland
Affiliation: NTNU
Reference: 2025, Vol 46, No 3, pp. 123-135.

Keywords: Pose estimation, Machine learning, Reflective metals, Rendering, Eye-to-hand

Abstract: Determination of the pose of workpieces is important for robotic applications in manufacturing, including handling, assembly, machining and welding. Established methods based on 3D sensors may fail for workpieces with highly reflective materials. In this paper, we take advantage of recent developments in machine learning to determine the pose of reflective workpieces without the use of depth data. Our proposed method is based on deep iterative matching of image data of the workpiece with a computer-aided design (CAD) model. Starting from an initial estimate of the workpiece pose, the method iteratively aligns projections of the CAD model with an image of the actual workpiece, adjusting the pose until the projected CAD model matches the image. The deep learning-based approach optimizes this alignment by updating the pose estimate at each iteration, achieving high precision even for geometrically complex or reflective surfaces. This refinement process improves accuracy in robotic applications where precise workpiece positioning is critical, such as automated welding and assembly. We use photorealistic rendering to create two datasets for pretraining the network, which reduces both training time and the need for real labeled data. After the network is trained on synthetic data, it is fine-tuned and tested on real images of reflective aluminium workpieces. We show that the proposed deep iterative matching method outperforms established methods based on the iterative closest point algorithm with two 3D scanners, since the scans contain large errors caused by reflections.
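The render-compare-update loop described in the abstract can be sketched in minimal form. This is an illustrative sketch only, not the authors' implementation: the learned refinement network is replaced by a mock predictor, the pose is simplified to a planar (x, y, theta) tuple updated additively rather than a full SE(3) transform with rendering, and the names `predict_delta` and `refine_pose` are hypothetical.

```python
import math

def predict_delta(pose, target, gain=0.5):
    """Mock of the learned refinement network: returns a pose correction.

    In the real method, a network would take the observed image and a
    rendering of the CAD model at `pose` and predict a correction; here
    we simply move a fixed fraction of the way toward a hidden ground
    truth `target` (x, y, theta in radians) to illustrate the loop.
    """
    dx = gain * (target[0] - pose[0])
    dy = gain * (target[1] - pose[1])
    dth = gain * (target[2] - pose[2])
    return (dx, dy, dth)

def refine_pose(initial_pose, target, iterations=20):
    """Iteratively apply predicted corrections until the pose converges."""
    pose = initial_pose
    for _ in range(iterations):
        dx, dy, dth = predict_delta(pose, target)
        pose = (pose[0] + dx, pose[1] + dy, pose[2] + dth)
    return pose

initial = (0.0, 0.0, 0.0)                     # rough initial estimate
true_pose = (0.10, -0.05, math.radians(15))   # hidden ground-truth pose
refined = refine_pose(initial, true_pose)
```

Because each step removes a fixed fraction of the remaining error, the estimate converges geometrically; in the actual method, convergence instead depends on the quality of the network's predicted corrections at each render-compare step.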

PDF (6072 Kb)        DOI: 10.4173/mic.2025.3.3

References:
[1] Alstad, O. and Egeland, O. (2022). Elimination of Reflections in Laser Scanning Systems with Convolutional Neural Networks, Modeling, Identification and Control. 43(1):9--20. doi:10.4173/mic.2022.1.2
[2] Bhat, D.N. and Nayar, S.K. (1998). Stereo and specular reflection, International Journal of Computer Vision. 26:91--106. doi:10.1023/A:1007940725322
[3] Blais, F. (2004). Review of 20 years of range sensor development, Journal of Electronic Imaging. 13(1):231--243. doi:10.1117/1.1631921
[4] Bradski, G., Kaehler, A., et al. (2000). OpenCV, Dr. Dobb's Journal of Software Tools. 3(2).
[5] Community, B.O. (2018). Blender - A 3D modelling and rendering package, Blender Foundation, Stichting Blender Foundation, Amsterdam. http://www.blender.org.
[6] Duda, A. and Frese, U. (2018). Accurate detection and localization of checkerboard corners for calibration, In BMVC, volume 126. pages 1--11.
[7] Fan, Y. and Zhao, B. (2015). Combined non-contact coordinate measurement system and calibration method, Optics & Laser Technology. 70:100--105. doi:10.1016/j.optlastec.2015.01.001
[8] Fischler, M.A. and Bolles, R.C. (1981). Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM. 24(6):381--395. doi:10.1145/358669.358692
[9] Flandin, G., Chaumette, F., and Marchand, E. (2000). Eye-in-hand/eye-to-hand cooperation for visual servoing, In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), volume 3. IEEE, pages 2741--2746. doi:10.1109/ROBOT.2000.846442
[10] Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., and Lepetit, V. (2011). Gradient response maps for real-time detection of textureless objects, IEEE Transactions on Pattern Analysis and Machine Intelligence. 34(5):876--888. doi:10.1109/TPAMI.2011.206
[11] Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G., Konolige, K., and Navab, N. (2013). Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes, In Computer Vision--ACCV 2012: 11th Asian Conference on Computer Vision, Daejeon, Korea, November 5-9, 2012, Revised Selected Papers, Part I 11. Springer, pages 548--562. doi:10.1007/978-3-642-37331-2_42
[12] Hodan, T., Barath, D., and Matas, J. (2020). EPOS: Estimating 6D pose of objects with symmetries, In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pages 11703--11712. doi:10.48550/arXiv.2004.00605
[13] Hodan, T., Haluza, P., Obdrzalek, S., Matas, J., Lourakis, M., and Zabulis, X. (2017). T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects, In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, pages 880--888. doi:10.1109/WACV.2017.103
[14] Hodan, T., Sundermeyer, M., Drost, B., Labbe, Y., Brachmann, E., Michel, F., Rother, C., and Matas, J. (2020). BOP Challenge 2020 on 6D object localization, In Computer Vision--ECCV 2020 Workshops: Glasgow, UK, August 23--28, 2020, Proceedings, Part II 16. Springer, pages 577--594. doi:10.1007/978-3-030-66096-3_39
[15] Jiang, Y., Huang, Z., Yang, B., and Yang, W. (2022). A review of robotic assembly strategies for the full operation procedure: planning, execution and evaluation, Robotics and Computer-Integrated Manufacturing. 78:102366. doi:10.1016/j.rcim.2022.102366
[16] Jurie, F. and Dhome, M. (2001). Real-time 3D template matching, In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1. IEEE, pages I--I. doi:10.1109/CVPR.2001.990559
[17] Kakish, J., Zhang, P.-L., and Zeid, I. (2000). Towards the design and development of a knowledge-based universal modular jigs and fixtures system, Journal of Intelligent Manufacturing. 11:381--401. doi:10.1023/A:1008978319436
[18] Kaya, O., Tağlıoğlu, G.B., and Ertuğrul, S. (2021). The series elastic gripper design, object detection, and recognition by touch, Journal of Mechanisms and Robotics. 14(1):014501. doi:10.1115/1.4051520
[19] Kehl, W., Manhardt, F., Tombari, F., Ilic, S., and Navab, N. (2017). SSD-6D: Making RGB-based 3D detection and 6D pose estimation great again, In Proceedings of the IEEE international conference on computer vision. pages 1521--1529. doi:10.1109/ICCV.2017.169
[20] Kendall, A. and Cipolla, R. (2017). Geometric loss functions for camera pose regression with deep learning, In Proceedings of the IEEE conference on computer vision and pattern recognition. pages 5974--5983. doi:10.48550/arXiv.1704.00390
[21] Keselman, L., Iselin Woodfill, J., Grunnet-Jepsen, A., and Bhowmik, A. (2017). Intel RealSense stereoscopic depth cameras, In Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pages 1--10. doi:10.1109/CVPRW.2017.167
[22] Kingma, D.P. and Ba, J. (2014). Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980. doi:10.48550/arXiv.1412.6980
[23] Klingenberg, K., Johannessen, K.V., Kaya, O., and Tingelstad, L. (2024). Industrial camera-aided trajectory planning for robotic welding on reflective surfaces, In 2024 IEEE International Conference on Real-time Computing and Robotics (RCAR). pages 182--187. doi:10.1109/RCAR61438.2024.10671320
[24] Labbe, Y., Carpentier, J., Aubry, M., and Sivic, J. (2020). CosyPose: Consistent multi-view multi-object 6D pose estimation, In Computer Vision--ECCV 2020: 16th European Conference, Glasgow, UK, August 23--28, 2020, Proceedings, Part XVII 16. Springer, pages 574--591. doi:10.1007/978-3-030-58520-4_34
[25] Li, Y., Wang, G., Ji, X., Xiang, Y., and Fox, D. (2018). DeepIM: Deep iterative matching for 6D pose estimation, In Proceedings of the European Conference on Computer Vision (ECCV). pages 683--698. doi:10.48550/arXiv.1804.00175
[26] Li, Z., Wang, G., and Ji, X. (2019). CDPN: Coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation, In Proceedings of the IEEE/CVF international conference on computer vision. pages 7678--7687. doi:10.1109/ICCV.2019.00777
[27] Litvak, Y., Biess, A., and Bar-Hillel, A. (2019). Learning pose estimation for high-precision robotic assembly using simulated depth images, In 2019 International Conference on Robotics and Automation (ICRA). IEEE, pages 3521--3527. doi:10.1109/ICRA.2019.8794226
[28] Lourenco, F. and Araujo, H. (2021). Intel RealSense SR305, D415 and L515: Experimental evaluation and comparison of depth estimation, In VISIGRAPP (4: VISAPP). pages 362--369. doi:10.5220/0010254203620369
[29] Lowe, D.G. (1999). Object recognition from local scale-invariant features, In Proceedings of the seventh IEEE international conference on computer vision, volume 2. IEEE, pages 1150--1157. doi:10.1109/ICCV.1999.790410
[30] Marco-Rider, J., Cibicik, A., and Egeland, O. (2022). Polarization image laser line extraction methods for reflective metal surfaces, IEEE Sensors Journal. 22(18):18114--18129. doi:10.1109/JSEN.2022.3194258
[31] Njaastad, E.B. and Egeland, O. (2016). Automatic touch-up of welding paths using 3D vision, IFAC-PapersOnLine. 49(31):73--78. doi:10.1016/j.ifacol.2016.12.164
[32] Park, K., Patten, T., and Vincze, M. (2019). Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation, In Proceedings of the IEEE/CVF international conference on computer vision. pages 7668--7677. doi:10.1109/ICCV.2019.00776
[33] Polyhaven. (2024). Hdri: Forest trail, https://polyhaven.com/. Accessed: 2024-11-22.
[34] Qin, W., Hu, Q., Zhuang, Z., Huang, H., Zhu, X., and Han, L. (2023). IPPE-PCR: a novel 6D pose estimation method based on point cloud repair for texture-less and occluded industrial parts, Journal of Intelligent Manufacturing. 34(6):2797--2807. doi:10.1007/s10845-022-01965-6
[35] Rad, M. and Lepetit, V. (2017). BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth, In Proceedings of the IEEE international conference on computer vision. pages 3828--3836. doi:10.1109/ICCV.2017.413
[36] Ren, G., Qu, X., and Chen, X. (2020). Performance evaluation and compensation method of trigger probes in measurement based on the Abbé principle, Sensors. 20(8). doi:10.3390/s20082413
[37] Rothganger, F., Lazebnik, S., Schmid, C., and Ponce, J. (2006). 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints, International Journal of Computer Vision. 66:231--259. doi:10.1007/s11263-005-3674-1
[38] Rusinkiewicz, S. and Levoy, M. (2001). Efficient variants of the ICP algorithm, In Proceedings third international conference on 3-D digital imaging and modeling. IEEE, pages 145--152. doi:10.1109/IM.2001.924423
[39] Schleth, G., Kuss, A., and Kraus, W. (2018). Workpiece localization methods for robotic welding -- a review, In ISR 2018; 50th International Symposium on Robotics. VDE, pages 1--6.
[40] Shen, J., Yoon, D., Shehu, D., and Chang, S.-Y. (2009). Spectral moving removal of non-isolated surface outlier clusters, Computer-Aided Design. 41(4):256--267. doi:10.1016/j.cad.2008.09.003
[41] Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., and Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images, In Proceedings of the IEEE conference on computer vision and pattern recognition. pages 2930--2937. doi:10.1109/CVPR.2013.377
[42] Simonelli, A., Bulo, S.R., Porzi, L., Lopez-Antequera, M., and Kontschieder, P. (2019). Disentangling monocular 3D object detection, In Proceedings of the IEEE/CVF International Conference on Computer Vision. pages 1991--1999. doi:10.48550/arXiv.1905.12365
[43] Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks, In International conference on machine learning. PMLR, pages 6105--6114. doi:10.48550/arXiv.1905.11946
[44] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world, In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pages 23--30. doi:10.1109/IROS.2017.8202133
[45] Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., and Xiao, J. (2015). 3D ShapeNets: A deep representation for volumetric shapes, In Proceedings of the IEEE conference on computer vision and pattern recognition. pages 1912--1920. doi:10.1109/CVPR.2015.7298801
[46] Yigit, C.B., Bayraktar, E., Kaya, O., and Boyraz, P. (2021). External force/torque estimation with only position sensors for antagonistic vsas, IEEE Transactions on Robotics. 37(2):675--682. doi:10.1109/TRO.2020.3031268
[47] Zhou, Q.-Y., Park, J., and Koltun, V. (2018). Open3D: A modern library for 3D data processing, arXiv preprint arXiv:1801.09847. doi:10.48550/arXiv.1801.09847
[48] Zhou, Y., Barnes, C., Lu, J., Yang, J., and Li, H. (2019). On the continuity of rotation representations in neural networks, In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pages 5745--5753. doi:10.1109/CVPR.2019.00589


BibTeX:
@article{MIC-2025-3-3,
  title={{Pose Refinement for Reflective Workpieces using Deep Iterative Matching}},
  author={Kaya, Ozan and Thieu, Kevin and Holden, Christian and Egeland, Olav},
  journal={Modeling, Identification and Control},
  volume={46},
  number={3},
  pages={123--135},
  year={2025},
  doi={10.4173/mic.2025.3.3},
  publisher={Norwegian Society of Automatic Control}
}