Published by De Gruyter Oldenbourg, March 4, 2020

Optimization frameworks for machine learning: Examples and case study

Joachim Giesen, Sören Laue and Matthias Mitterreiter

Abstract

Mathematical optimization is at the algorithmic core of machine learning. Almost every known algorithm for solving mathematical optimization problems has been applied in machine learning, and the machine learning community itself actively designs and implements new algorithms for specific problems. These implementations have to be made available to machine learning practitioners, which is mostly accomplished by distributing them as standalone software. Successful, well-engineered implementations are collected in machine learning toolboxes that provide more uniform access to the different solvers. A disadvantage of the toolbox approach is its lack of flexibility: toolboxes only provide access to a fixed set of machine learning models that cannot be modified. This can be a problem for the typical machine learning workflow, which iterates the process of modeling, solving, and validating. If a model does not perform well on validation data, it needs to be modified, and in most cases these modifications require a new solver for the entailed optimization problems. Optimization frameworks that combine a modeling language for specifying optimization problems with a solver are better suited to this iterative workflow, since they can address large classes of problems. Here, we provide examples of the use of optimization frameworks in machine learning. We also illustrate the use of one such framework in a case study that follows the typical machine learning workflow.
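The framework idea sketched in the abstract can be illustrated with a minimal example: a generic solver (here SciPy's `minimize` with L-BFGS-B, as a stand-in for a framework's solver backend) is applied to two variants of a regularized least-squares model, so that changing the model in the modeling-solving-validating loop does not require writing a new solver. All names and data below are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(50)

def fit(regularizer, lam=0.1):
    """Solve a regularized least-squares problem with a generic solver.

    The model is 'data-fit term + interchangeable regularizer'; swapping
    the regularizer changes the model without changing the solver.
    """
    def objective(w):
        return 0.5 * np.sum((X @ w - y) ** 2) + lam * regularizer(w)
    return minimize(objective, np.zeros(X.shape[1]), method="L-BFGS-B").x

# First modeling iteration: ridge (squared L2) penalty.
w_ridge = fit(lambda w: np.sum(w ** 2))
# Second iteration after validation: lasso-style (L1) penalty; the
# objective is nonsmooth, which a smoothing or proximal method would
# handle more rigorously, but the same generic call still applies.
w_l1 = fit(lambda w: np.sum(np.abs(w)))
```

This is of course only a sketch of the workflow, not the GENO framework itself, which generates solvers from a declarative problem specification rather than from hand-written Python objectives.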


Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: LA-2971/1-1

Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: GI-711/5-1

Funding statement: Sören Laue acknowledges funding from DFG grant LA-2971/1-1 for work on the basic GENO framework. Joachim Giesen, Sören Laue, and Matthias Mitterreiter acknowledge funding from DFG grant GI-711/5-1 for scaling up GENO to be used within parallel and distributed computing environments.

Received: 2019-09-04
Revised: 2020-02-17
Accepted: 2020-02-19
Published Online: 2020-03-04
Published in Print: 2020-05-27

© 2020 Walter de Gruyter GmbH, Berlin/Boston