A technology for formal quantitative estimation of how well mathematical models conform to an available dataset is presented. Its main purpose is to simplify the researcher's decision-making when selecting a model. The method combines approaches from data analysis, optimization, and distributed computing: cross-validation and regularization methods, algebraic modeling and optimization methods, automatic discretization of differential and integral equations, and optimization REST services. The technology is illustrated by a demo case study. A general mathematical formulation of the method is given, followed by a description of the main aspects of its algorithmic and software implementation. The approach has a substantial list of successful applications; nevertheless, its domain of applicability and important unresolved issues are also discussed.
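As a rough illustration of how cross-validation and regularization can be combined for model selection, the following minimal sketch picks a regularization weight by k-fold cross-validation. It is not the implementation described in the paper: the ridge-penalized polynomial model, the synthetic data, and the names `fit_ridge`, `cv_error`, and `select_alpha` are all assumptions made for this example.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form solution of min_w ||X w - y||^2 + alpha * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def cv_error(X, y, alpha, n_folds=5):
    """Root-mean-square k-fold cross-validation error for one alpha."""
    idx = np.arange(len(y))
    # Interleaved folds, so each fold covers the whole range of the data.
    folds = [idx[k::n_folds] for k in range(n_folds)]
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(idx, test)
        w = fit_ridge(X[train], y[train], alpha)
        sq_err += np.sum((X[test] @ w - y[test]) ** 2)
    return np.sqrt(sq_err / len(y))


def select_alpha(X, y, alphas, n_folds=5):
    """Return the alpha with the smallest cross-validation error, plus all errors."""
    errors = [cv_error(X, y, a, n_folds) for a in alphas]
    best = int(np.argmin(errors))
    return alphas[best], errors


if __name__ == "__main__":
    # Hypothetical demo data: noisy samples of a smooth function.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 40)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
    X = np.vander(x, 10)  # polynomial features up to degree 9
    alphas = [1e-8, 1e-4, 1e-2, 1.0, 100.0]
    best_alpha, errors = select_alpha(X, y, alphas)
    for a, e in zip(alphas, errors):
        print(f"alpha={a:g}  cv_rmse={e:.4f}")
    print("selected alpha:", best_alpha)
```

A small weight lets the model follow the measurements closely at the risk of overfitting, while a large weight enforces a simpler model; the cross-validation error gives one number with which to compare the candidates.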
Open Computer Science is an open access, peer-reviewed journal. The journal publishes research results in the following fields: algorithms and complexity theory, artificial intelligence, bioinformatics, networking and security systems, programming languages, system and software engineering, and theoretical foundations of computer science.