Journal of Applied Computer Science Methods

The Journal of University of Social Science in Lodz

2 Issues per year

Open Access

Convergence Analysis of Multilayer Feedforward Networks Trained with Penalty Terms: A Review

Jian Wang / Guoling Yang / Shan Liu / Jacek M. Zurada
  • Department of Electrical and Computer Engineering, University of Louisville, Louisville, Kentucky, USA
  • Information Technology Institute, University of Social Sciences, Łódź, Poland
Published Online: 2016-10-15 | DOI: https://doi.org/10.1515/jacsm-2015-0011


The gradient descent method is one of the most popular methods for training feedforward neural networks. Batch and incremental modes are the two most common ways to implement gradient-based training in practice. Because generalization is an important property and quality criterion of a trained network, pruning algorithms that add regularization terms have been widely used as an efficient way to achieve good generalization. In this paper, we review the convergence properties and other performance aspects of recently studied training approaches based on different penalty terms. In addition, we describe smoothing approximations for the case where the penalty term is non-differentiable at the origin.
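The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single linear unit instead of a multilayer network, a squared-error loss, and one common smoothing of the absolute value, sqrt(w^2 + eps^2), as a stand-in for a penalty that is non-differentiable at the origin. All function names, the toy data, and the hyperparameters (`lr`, `lam`, `eps`) are hypothetical.

```python
import math

def smooth_abs(w, eps=1e-2):
    """Smooth surrogate for |w| (assumed form); differentiable at w = 0."""
    return math.sqrt(w * w + eps * eps)

def smooth_abs_grad(w, eps=1e-2):
    """Derivative of smooth_abs; it is 0 at w = 0 instead of undefined."""
    return w / math.sqrt(w * w + eps * eps)

def sample_grad(w, x, y):
    """Gradient of the squared error 0.5 * (w.x - y)^2 for one sample."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def batch_epoch(w, data, lr, lam):
    """Batch mode: average the gradient over all samples, then update once."""
    n = len(data)
    total = [0.0] * len(w)
    for x, y in data:
        for i, gi in enumerate(sample_grad(w, x, y)):
            total[i] += gi / n
    return [wi - lr * (gi + lam * smooth_abs_grad(wi))
            for wi, gi in zip(w, total)]

def incremental_epoch(w, data, lr, lam):
    """Incremental mode: update the weights after every single sample."""
    for x, y in data:
        g = sample_grad(w, x, y)
        w = [wi - lr * (gi + lam * smooth_abs_grad(wi))
             for wi, gi in zip(w, g)]
    return w

def train(epoch_fn, data, epochs=500, lr=0.1, lam=0.01):
    """Run the chosen update mode for a fixed number of epochs from w = 0."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        w = epoch_fn(w, data, lr, lam)
    return w

# Toy data generated by y = 2 * x1; the second input is irrelevant,
# so the penalty should keep its weight close to zero.
DATA = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0),
        ([1.0, 1.0], 2.0), ([2.0, 1.0], 4.0)]

w_batch = train(batch_epoch, DATA)
w_incr = train(incremental_epoch, DATA)
print("batch:", w_batch)
print("incremental:", w_incr)
```

In this sketch both modes minimize the same penalized objective; they differ only in when the weights are updated. The smoothed penalty keeps the gradient well defined at zero, which is the situation the smoothing approximations in the abstract are meant to handle.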

Keywords: gradient; feedforward neural networks; generalization; penalty; convergence; pruning algorithms


About the article

Published Online: 2016-10-15

Published in Print: 2015-11-01

Citation Information: Journal of Applied Computer Science Methods, Volume 7, Issue 2, Pages 89–103, ISSN (Online) 2391-8241, DOI: https://doi.org/10.1515/jacsm-2015-0011.

© 2016. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
