Published by Oldenbourg Wissenschaftsverlag May 4, 2021

Comparing optimization methods for deep learning in image processing applications

Comparison of optimization methods for deep learning in image processing applications
Alexander Geng, Ali Moghiseh, Claudia Redenbach and Katja Schladitz
From the journal tm - Technisches Messen


Training a deep learning network requires choosing its weights such that the output minimizes a given loss function. In practice, stochastic gradient descent is frequently used to solve this optimization problem, and several variants of the approach have been suggested in the literature. We study how the choice of optimization method affects the outcome of the learning process, using two image processing applications from quite different fields as examples. The first is artistic style transfer, where the content of one image is combined with the style of another. The second is a real-world classification task from industry, namely detecting defects in images of air filters. In both cases, we observe clear differences between the results of the individual optimization methods.


During the training of a neural network, the weights of the network are determined such that the network output minimizes the loss function used. In practice, stochastic gradient descent is frequently used to solve this optimization problem. Various variants of this approach can be found in the literature. In this work, we investigate the influence of the choice of optimization method on the training of neural networks. To this end, we consider two very different applications from image processing. The first is artistic style transfer, in which the content of one image is rendered in the style of a second image. As the second application, we consider a classification problem from industrial practice, namely the detection of defects in images of air filters. In both application examples, we observe clear differences between neural networks trained with different optimization methods.
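
As a rough illustration of the optimization problem described in the abstract, the following sketch fits a single weight by stochastic gradient descent, once in its plain form and once with a momentum term as one example of a variant. It is a toy problem under stated assumptions, not the experimental setup of the paper: the function name `sgd_fit`, the data, the learning rate, and the momentum value are all hypothetical choices made for this example.

```python
import random


def sgd_fit(xs, ys, lr=0.05, momentum=0.0, epochs=100, seed=0):
    """Fit y ~ w * x by minimizing the squared error with SGD.

    Hypothetical helper for illustration only. With momentum=0.0 this is
    plain SGD; a positive momentum gives the classical momentum variant.
    """
    rng = random.Random(seed)      # fixed seed for a reproducible sample order
    w, velocity = 0.0, 0.0         # initial weight and momentum buffer
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)       # visit the samples in random order
        for i in indices:
            # Gradient of the per-sample loss (w * x - y)^2 with respect to w.
            grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w


# Toy data generated from y = 3x (an assumption made for this sketch).
xs = [0.1 * k for k in range(1, 11)]
ys = [3.0 * x for x in xs]

w_plain = sgd_fit(xs, ys)                    # plain SGD
w_momentum = sgd_fit(xs, ys, momentum=0.5)   # SGD with momentum
```

On this convex toy problem both runs should approach the same weight; the differences between optimizers reported in the abstract concern far larger, non-convex training problems.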

Funding source: Fraunhofer-Gesellschaft

Funding statement: This research was supported by the Fraunhofer FLAGSHIP PROJECT ML4P.


Received: 2021-02-15
Accepted: 2021-04-22
Published Online: 2021-05-04
Published in Print: 2021-08-27

© 2021 Walter de Gruyter GmbH, Berlin/Boston