Dependence Modeling

Ed. by Puccetti, Giovanni

CiteScore 2018: 0.67

SCImago Journal Rank (SJR) 2018: 0.380
Source Normalized Impact per Paper (SNIP) 2018: 0.383

Open Access

Exponential inequalities for nonstationary Markov chains

Pierre Alquier / Paul Doukhan / Xiequan Fan
Published Online: 2019-06-03 | DOI: https://doi.org/10.1515/demo-2019-0007


Exponential inequalities are central tools in machine learning theory. Proving exponential inequalities for non-i.i.d. random variables makes it possible to extend many learning techniques to these variables. Indeed, much work has been done on both inequalities and learning theory for time series over the past 15 years. However, in the non-independent case, almost all results concern stationary time series. This excludes many important applications: for example, any series with periodic behaviour is nonstationary. In this paper, we extend the basic tools of [19] to nonstationary Markov chains. As an application, we provide a Bernstein-type inequality, and we deduce risk bounds for the prediction of periodic autoregressive processes with an unknown period.

Keywords: Nonstationary Markov chains; Martingales; Exponential inequalities; Time series forecasting; Statistical learning theory; Oracle inequalities; Model selection

MSC 2010: 60J05; 60E15; 62M20; 62M05; 62M10; 68Q32


  • [1] Adamczak, R. (2008). A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electron. J. Probab. 13(34), 1000–1034.

  • [2] Alquier, P. and B. Guedj (2018). Simpler PAC-Bayesian bounds for hostile data. Mach. Learn. 107(5), 887–902.

  • [3] Alquier, P., X. Li, and O. Wintenberger (2013). Prediction of time series by statistical learning: general losses and fast rates. Depend. Model. 1, 65–93.

  • [4] Alquier, P. and O. Wintenberger (2012). Model selection for weakly dependent time series forecasting. Bernoulli 18(3), 883–913.

  • [5] Bardet, J.-M. and P. Doukhan (2018). Non-parametric estimation of time varying AR(1) processes with local stationarity and periodicity. Electron. J. Stat. 12(2), 2323–2354.

  • [6] Baudry, J.-P., C. Maugis, and B. Michel (2012). Slope heuristics: overview and implementation. Stat. Comput. 22(2), 455–470.

  • [7] Bercu, B., B. Delyon, and E. Rio (2015). Concentration Inequalities for Sums and Martingales. Springer, Cham.

  • [8] Bertail, P. and G. Ciołek (2019). New Bernstein and Hoeffding type inequalities for regenerative Markov chains. ALEA Lat. Am. J. Probab. Math. Stat. 16(1), 259–277.

  • [9] Bertail, P. and S. Clémençon (2010). Sharp bounds for the tails of functionals of Markov chains. Theory Probab. Appl. 54(3), 505–515.

  • [10] Bertail, P. and F. Portier (2019). Rademacher complexity for Markov chains: Applications to kernel smoothing and Metropolis-Hasting. Bernoulli, to appear. Available at http://www.bernoulli-society.org/index.php/publications/bernoulli-journal/bernoulli-journal-papers.

  • [11] Birgé, L. and P. Massart (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138(1-2), 33–73.

  • [12] Blanchard, G. and O. Zadorozhnyi (2017). Concentration of weakly dependent Banach-valued sums and applications to kernel learning methods. Available at https://arxiv.org/abs/1712.01934v1.

  • [13] Boucheron, S., G. Lugosi, and P. Massart (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.

  • [14] Catoni, O. (2003). Laplace transform estimates and deviation inequalities. Ann. Inst. Henri Poincaré Probab. Stat. 39(1), 1–26.

  • [15] Collet, P., S. Martinez, and B. Schmitt (2002). Exponential inequalities for dynamical measures of expanding maps of the interval. Probab. Theory Related Fields 123(3), 301–322.

  • [16] Dahlhaus, R. (1996). On the Kullback-Leibler information divergence of locally stationary processes. Stochastic Process. Appl. 62(1), 139–168.

  • [17] de la Peña, V. H. (1999). A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27(1), 537–564.

  • [18] Dedecker, J., P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur (2007). Weak Dependence: With Examples and Applications. Springer, New York.

  • [19] Dedecker, J. and X. Fan (2015). Deviation inequalities for separately Lipschitz functionals of iterated random functions. Stochastic Process. Appl. 125(1), 60–90.

  • [20] Diaconis, P. and D. Freedman (1999). Iterated random functions. SIAM Rev. 41(1), 45–76.

  • [21] Doukhan, P. (2018). Stochastic Models for Time Series. Springer, Cham.

  • [22] Doukhan, P. and M. H. Neumann (2007). Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Process. Appl. 117(7), 878–903.

  • [23] Fan, J., B. Jiang, and Q. Sun (2018). Hoeffding’s lemma for Markov chains and its applications to statistical learning. Available at https://arxiv.org/abs/1802.00211.

  • [24] Hang, H. and I. Steinwart (2014). Fast learning from α-mixing observations. J. Multivariate Anal. 127, 184–199.

  • [25] Hang, H. and I. Steinwart (2017). A Bernstein-type inequality for some mixing processes and dynamical systems with an application to learning. Ann. Statist. 45(2), 708–743.

  • [26] Joulin, A. and Y. Ollivier (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38(6), 2418–2442.

  • [27] Kontorovich, L. A. and K. Ramanan (2008). Concentration inequalities for dependent random variables via the martingale method. Ann. Probab. 36(6), 2126–2158.

  • [28] Kuznetsov, V. and M. Mohri (2015). Learning theory and algorithms for forecasting non-stationary time series. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.), Advances in Neural Information Processing Systems, pp. 541–549.

  • [29] Lerasle, M. (2011). Optimal model selection for density estimation of stationary data under various mixing conditions. Ann. Statist. 39(4), 1852–1877.

  • [30] London, B., B. Huang, B. Taskar, and L. Getoor (2014). PAC-Bayesian collective stability. In S. Kaski and J. Corander (Eds.), Artificial Intelligence and Statistics, pp. 585–594.

  • [31] Massart, P. (2007). Concentration Inequalities and Model Selection. Springer, Berlin.

  • [32] McDonald, D. J., C. R. Shalizi, and M. Schervish (2017). Nonparametric risk bounds for time-series forecasting. J. Mach. Learn. Res. 18, no. 32, 40 pp.

  • [33] Meir, R. (2000). Nonparametric time series prediction through adaptive model selection. Mach. Learn. 39(1), 5–34.

  • [34] Merlevède, F., M. Peligrad, and E. Rio (2009). Bernstein inequality and moderate deviations under strong mixing conditions. In C. Houdré, V. Koltchinskii, D. M. Mason, and M. Peligrad (Eds.), High Dimensional Probability V: The Luminy Volume, pp. 273–292. Institute of Mathematical Statistics, Beachwood OH.

  • [35] Merlevède, F., M. Peligrad, and E. Rio (2011). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probab. Theory Related Fields 151(3–4), 435–474.

  • [36] Modha, D. S. and E. Masry (1996). Minimum complexity regression estimation with weakly dependent observations. IEEE Trans. Inform. Theory 42(6), 2133–2145.

  • [37] Paulin, D. (2015). Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electron. J. Probab. 20, no. 79, 32 pp.

  • [38] Rio, E. (2013a). Extensions of the Hoeffding-Azuma inequalities. Electron. Commun. Probab. 18, no. 54, 6 pp.

  • [39] Rio, E. (2013b). On McDiarmid’s concentration inequality. Electron. Commun. Probab. 18, no. 44, 11 pp.

  • [40] Rio, E. (2017). Asymptotic Theory of Weakly Dependent Random Processes. Springer, Berlin.

  • [41] Samson, P.-M. (2000). Concentration of measure inequalities for Markov chains and ϕ-mixing processes. Ann. Probab. 28(1), 416–461.

  • [42] Sanchez-Perez, A. (2015). Time series prediction via aggregation: an oracle bound including numerical cost. In A. Antoniadis, J.-M. Poggi, and X. Brossat (Eds.), Modeling and Stochastic Learning for Forecasting in High Dimensions, pp. 243–265. Springer, Cham.

  • [43] Seldin, Y., F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, and P. Auer (2012). PAC-Bayesian inequalities for martingales. IEEE Trans. Inform. Theory 58(12), 7086–7093.

  • [44] Shalizi, C. and A. Kontorovich (2013). Predictive PAC learning and process decompositions. In C. J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, pp. 1619–1627.

  • [45] Steinwart, I. and A. Christmann (2009). Fast learning from non-i.i.d. observations. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta (Eds.), Advances in Neural Information Processing Systems, pp. 1768–1776.

  • [46] Steinwart, I., D. Hush, and C. Scovel (2009). Learning from dependent observations. J. Multivariate Anal. 100(1), 175–194.

  • [47] van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23(5), 1779–1801.

  • [48] Vapnik, V. N. (1998). Statistical Learning Theory. John Wiley & Sons, New York.

  • [49] Wintenberger, O. (2010). Deviation inequalities for sums of weakly dependent time series. Electron. Commun. Probab. 15, 489–503.

  • [50] Wintenberger, O. (2017). Exponential inequalities for unbounded functions of geometrically ergodic Markov chains: applications to quantitative error bounds for regenerative Metropolis algorithms. Statistics 51(1), 222–234.

About the article

Received: 2019-01-18

Accepted: 2019-05-04

Published Online: 2019-06-03

Published in Print: 2019-01-01

Citation Information: Dependence Modeling, Volume 7, Issue 1, Pages 150–168, ISSN (Online) 2300-2298, DOI: https://doi.org/10.1515/demo-2019-0007.

© 2019 Pierre Alquier et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 Public License.
