Published by Oldenbourg Wissenschaftsverlag, September 7, 2019

A comparison of shape-based matching with deep-learning-based object detection

Markus Ulrich, Patrick Follmann and Jan-Hendrik Neudeck

From the journal tm - Technisches Messen

Abstract

Matching, i.e., determining the exact 2D pose (e.g., position and orientation) of objects, is still one of the key tasks in machine vision applications such as robot navigation, measuring, or grasping an object. There are many classic approaches to matching, based either on edges or on the pure gray values of the template. In recent years, deep learning has been utilized mainly for more difficult tasks where the objects of interest come from many different categories with high intra-class variations and classic algorithms fail. In this work, we compare one of the latest deep-learning-based object detectors with classic shape-based matching. We evaluate the methods both on a matching dataset and on an object detection dataset that contains rigid objects and is thus also suitable for shape-based matching. We show that for datasets of this type, where rigid objects appear under rigid transformations, shape-based matching still outperforms recent object detectors regarding runtime, robustness, and precision if only a single template image per object is used. On the other hand, we show that for the application of object detection, the deep-learning-based approach outperforms the classic approach if annotated data is used for training. Ultimately, the choice of the best-suited approach depends on the conditions and requirements of the application.
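
To make the comparison concrete: at its core, shape-based matching scores a candidate pose by the mean normalized dot product between the gradient directions of the template's edge points and the image gradients at the corresponding locations, the occlusion-, clutter-, and illumination-invariant similarity measure introduced by Steger (2001). The following Python/NumPy sketch of this score is illustrative only; all function and variable names are assumptions, and it omits the image pyramids, angle discretization, polarity options, and subpixel refinement of a production implementation such as HALCON's find_shape_model.

    import numpy as np

    def shape_match_score(image_grad, model_points, model_dirs, row, col):
        """Score a shape model translated to (row, col); a sketch, not production code.

        image_grad  : (H, W, 2) array of image gradients (dy, dx)
        model_points: (n, 2) non-negative integer offsets of the template edge points
        model_dirs  : (n, 2) unit gradient direction at each edge point
        """
        pts = model_points + np.array([row, col])      # edge points at the candidate pose
        g = image_grad[pts[:, 0], pts[:, 1]]           # image gradients at those points
        norm = np.linalg.norm(g, axis=1)
        norm[norm == 0] = np.inf                       # flat (contrast-free) regions contribute 0
        # Cosine of the angle between model and image gradient directions;
        # occluded or cluttered points contribute ~0 instead of derailing the score.
        return float(np.mean(np.sum(model_dirs * g, axis=1) / norm))

    def find_best_translation(image_grad, model_points, model_dirs, min_score=0.5):
        """Exhaustive translation-only search; handling rotation would rotate the model points."""
        h, w, _ = image_grad.shape
        r_max, c_max = model_points.max(axis=0)
        best_pose, best_score = None, -1.0
        for r in range(h - r_max):
            for c in range(w - c_max):
                s = shape_match_score(image_grad, model_points, model_dirs, r, c)
                if s > best_score:
                    best_pose, best_score = (r, c), s
        return (best_pose, best_score) if best_score >= min_score else (None, best_score)

Because every edge point contributes at most 1/n to the score, a min_score threshold directly bounds the tolerable fraction of occluded or distorted edge points, which is one source of the robustness discussed above.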


About the authors

Markus Ulrich

Markus Ulrich studied geodesy and remote sensing at the Technische Universität München (TUM) and received his PhD degree from TUM in 2003. In 2003, he joined MVTec’s Research and Development Department as a software engineer and became head of the research team in 2008. He has authored and coauthored scientific publications in the fields of photogrammetry and machine vision. He is also a guest lecturer at TUM, where he teaches close-range photogrammetry. In 2017, he was appointed a lecturer (Privatdozent) at the Karlsruhe Institute of Technology in the field of machine vision.

Patrick Follmann

Patrick Follmann studied Mathematics in Bioscience at the Technische Universität München (TUM) and received his MSc degree in 2015. He is currently working toward his PhD degree in the research department of MVTec Software GmbH. His research interests span machine learning and computer vision, with a special focus on image classification, object detection, and instance-aware semantic segmentation.

Jan-Hendrik Neudeck

Jan-Hendrik Neudeck is a student in the master's program Robotics, Cognition, Intelligence at the Technische Universität München (TUM) and is currently writing his master's thesis at MVTec's Research Department. His studies focus in particular on the application of machine learning to computer vision problems.


Received: 2019-05-27
Accepted: 2019-08-30
Published Online: 2019-09-07
Published in Print: 2019-11-26

© 2019 Walter de Gruyter GmbH, Berlin/Boston
