Concentration inequalities for functionals of Markov chains and their applications to variance reduction of MCMC algorithms. Candidate of Sciences dissertation and abstract, VAK RF 00.00.00, Sergey Vladimirovich Samsonov
- Specialty (VAK RF): 00.00.00
- Number of pages: 233
Table of contents of the dissertation, Candidate of Sciences Sergey Vladimirovich Samsonov
Contents
Introduction
Notations and definitions
Chapter 1. Rosenthal and Bernstein inequalities for linear statistics of Markov chains
1.1. Introduction
1.2. Literature review
1.3. Contributions
1.4. Results for V-geometrically ergodic Markov chains
1.5. Geometrically ergodic Markov chains with respect to Kantorovich-Wasserstein semi-metric
1.6. Applications
1.7. Proofs
Chapter 2. Variance reduction for dependent sequences with applications to Stochastic Gradient MCMC
2.1. Introduction and problem statement
2.2. Control variates for dependent observations
2.3. Contributions
2.4. Empirical Spectral Variance Minimization
2.5. Applications to Markov kernels geometrically ergodic in the Kantorovich-Wasserstein distance
2.6. Applications to Langevin-based MCMC algorithms
2.7. Numerical results
2.8. Proofs
Chapter 3. Variance reduction with martingale representations
3.1. Introduction
3.2. Literature review and problem setup
3.3. Contributions
3.4. Langevin-based MCMC
3.5. Martingale representation
3.6. Gaussian noise model
3.7. Numerical experiments
3.8. Proofs
Conclusion
Bibliography
Appendix A. Russian translation of the thesis
Introduction
With the rapid development of novel methods in machine learning, there is growing interest in mathematical tools that provide a framework for understanding and evaluating the performance of algorithms when the observable sample size is finite. Various results based on the concentration of measure phenomenon [1, 2] have proved to be the right instrument for obtaining non-asymptotic guarantees for various algorithms in the fields of reinforcement learning [3], optimization [4], learning theory [5], Monte Carlo and Markov Chain Monte Carlo methods (MCMC, [6, 7]), and many others. Concentration inequalities for functionals of independent random variables or martingales are relatively well understood, as seen in [2, 8, 9]. At the same time, the situation is different for concentration inequalities for functions of dependent random variables. While a wealth of results exists for weakly dependent processes under different types of mixing conditions [10, 11], their application even to the natural setting of additive functionals of Markov chains is challenging. In particular, they are either not quantitative or not precise enough in terms of important problem characteristics, such as the variance of the additive functional in Bernstein-type inequalities (see Chapter 1 for the relevant definitions). This drawback is shared by many existing results obtained specifically for functionals of Markov chains [12-15]. However, it is Markovian stochasticity that appears in the vast majority of machine learning algorithms. Markov chains naturally arise in the non-asymptotic analysis of algorithms in the fields of stochastic approximation [16, 17] and reinforcement learning [3, 18].
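For context, the classical Bernstein inequality for sums of independent bounded random variables, whose Markov-chain counterparts are developed in Chapter 1, can be stated as follows (one standard form; see [2]):

```latex
% Classical Bernstein inequality (independent case, one standard form):
% if X_1, ..., X_n are independent, E[X_i] = 0, |X_i| <= b, and
% sigma^2 = n^{-1} \sum_{i=1}^n E[X_i^2], then for all t >= 0
\[
  \mathbb{P}\Bigl(\sum_{i=1}^{n} X_i \ge t\Bigr)
  \le \exp\Bigl(-\frac{t^2}{2 n \sigma^2 + 2 b t / 3}\Bigr).
\]
```

The key feature that the thesis seeks to preserve in the Markovian setting is the appearance of the actual variance proxy sigma^2 in the sub-Gaussian regime of the bound.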
In Chapter 1 of this thesis, we obtain new counterparts of the classical Rosenthal and Bernstein inequalities for geometrically ergodic Markov chains with explicit dependence on the mixing time of the underlying chains. We consider an additive functional
S_n = Σ_{i=0}^{n-1} {g(X_i) - π(g)},    (1)
where g is an integrable measurable function and (X_i)_{i≥0} is a Markov chain with a Markov kernel P, which admits π as its unique invariant distribution. We obtain concentration inequalities for the additive functional S_n, similar to those presented in [12, 14, 19, 20]. We refine the dependence of the new estimates on the variance of S_n and the mixing time of the underlying chain. Our proof is based on the cumulant expansion techniques outlined in [21] and the Leonov-Shiryaev formula [22] relating moments and cumulants.
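As a toy illustration of such an additive functional (an assumed example, not taken from the thesis), one can estimate π(g) along a single trajectory of a simple geometrically ergodic chain, here an AR(1) recursion:

```python
import numpy as np

# Illustrative sketch: the additive functional S_n = sum_{i<n} g(X_i)
# along a geometrically ergodic Markov chain. Here X_{k+1} = a X_k + xi_k
# with |a| < 1 and standard Gaussian noise, whose invariant distribution
# is N(0, 1/(1 - a^2)); all names and parameters are illustrative.
def ar1_chain(n, a=0.5, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        x[k] = a * x[k - 1] + rng.standard_normal()
    return x

def additive_functional_mean(g, chain):
    # S_n / n: the ergodic-average estimate of pi(g) from one trajectory
    return np.mean(g(chain))

chain = ar1_chain(100_000)
# For a = 0.5, pi(g) with g(x) = x^2 equals 1 / (1 - 0.25) = 4/3
est = additive_functional_mean(lambda x: x**2, chain)
```

Concentration inequalities of the kind derived in Chapter 1 quantify how fast such a single-trajectory estimate deviates from π(g).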
In the subsequent parts of the thesis, we apply concentration inequalities to the non-asymptotic analysis of variance reduction techniques [23, 24], and propose new variance reduction methods for sequences of dependent random variables. The primary aim of variance reduction is to reduce the stochastic error in Monte Carlo estimates. Classical contributions to this field, including those by [25] and [26], have extensively explored variance reduction techniques, with a primary focus on modeling based on sequences of independent and identically distributed (i.i.d.) random variables (see e.g. [27]). However, in many scenarios, generating i.i.d. observations is not feasible,
especially in cases of high problem dimension, and statistical inference must rely on dependent observations. These observations often form a Markov chain, as is the case for MCMC algorithms [6]. Furthermore, variance reduction techniques also extend to optimization methods and reinforcement learning; see e.g. [28-31] and references therein.
In Chapter 2, we propose a practical approach to variance reduction for additive functionals of dependent random variables. This approach extends the one introduced in [32] and is applicable to a broader class of Markov chains satisfying an ergodicity condition in the first-order Kantorovich-Wasserstein metric, and to sequences of dependent random variables satisfying the covariance stationarity assumption. The proposed method combines control variates with minimization of an empirical estimate of the corresponding asymptotic variance. We provide estimates for the rate of decrease in excess asymptotic variance as the training sample size grows. The proposed approach is applied to MCMC estimates based on Stochastic Gradient Langevin Dynamics (SGLD, [33]).
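The control-variate idea underlying this approach can be illustrated on a deliberately simplified stationary example (an i.i.d. toy sketch, not the empirical spectral variance minimization algorithm of Chapter 2 itself; all names below are illustrative): given samples g(X_i) and a zero-mean control h(X_i), the coefficient minimizing the empirical variance of g(X_i) - θ h(X_i) is the least-squares ratio Cov(g, h)/Var(h).

```python
import numpy as np

# Toy sketch of variance reduction with a single control variate.
rng = np.random.default_rng(1)
x = rng.standard_normal(50_000)

g = np.exp(x)   # target: E[exp(X)] = exp(1/2) for X ~ N(0, 1)
h = x           # control variate with known mean E[h] = 0

# theta minimizing the empirical variance of g - theta * h
theta = np.cov(g, h)[0, 1] / np.var(h)
adjusted = g - theta * h

plain_var = np.var(g)
adj_var = np.var(adjusted)  # smaller whenever Cov(g, h) != 0
# The adjusted mean still estimates E[g] since E[h] = 0
# (up to the negligible bias from the data-driven choice of theta).
est = adjusted.mean()
```

In the dependent setting of Chapter 2, the quantity being minimized is an empirical estimate of the asymptotic variance of the ergodic average rather than the plain sample variance used in this sketch.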
In Chapter 3, we consider the problem of variance reduction for additive functionals of Markov chains in the setting where no analytical expression for the invariant distribution of the underlying chain is available. In this setting, we suggest a variance reduction approach based on a discrete-time martingale representation, which generalizes control variates built from orthogonal polynomial expansions [34]. This approach requires knowledge of neither the chain's stationary distribution nor its specific structure. We analyze the algorithm under a normal noise model (see Section 3.6), which in particular covers the celebrated Unadjusted Langevin Algorithm [35-37].
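A minimal sketch of the Unadjusted Langevin Algorithm itself may be useful here (step size, potential, and burn-in length are illustrative choices, not taken from the thesis):

```python
import numpy as np

# Minimal sketch of the Unadjusted Langevin Algorithm (ULA):
#   X_{k+1} = X_k - gamma * grad_U(X_k) + sqrt(2 * gamma) * Z_k,
# targeting (approximately) the density proportional to exp(-U).
# With U(x) = x^2 / 2 the target is close to N(0, 1); the Gaussian
# increment is exactly the "normal noise model" structure of Section 3.6.
def ula(grad_u, n, gamma=0.05, x0=0.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        noise = np.sqrt(2 * gamma) * rng.standard_normal()
        x[k] = x[k - 1] - gamma * grad_u(x[k - 1]) + noise
    return x

samples = ula(lambda x: x, 200_000)[1_000:]  # drop a short burn-in
mean_est, var_est = samples.mean(), samples.var()
```

Note that ULA has a discretization bias: for this quadratic potential the chain's invariant variance is 1/(1 - gamma/2) rather than exactly 1, which is one reason the invariant distribution is not available in closed form in general.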
Goals and objectives of the study
The goal of the study is to obtain new analytical tools for studying concentration properties of functionals of Markov chains and to apply them to the theoretical analysis of post-processing methods for MCMC estimates based on control variates. To this end, we consider the following steps:
1. Derive upper bounds on the cumulants of additive functionals of geometrically ergodic Markov chains, tracing the explicit dependence on the parameters of the underlying Markov kernel;
2. Use the bounds above to obtain new counterparts of the Rosenthal and Bernstein inequalities, keeping precise dependence on the variance of S_n from (1) and the mixing time of the kernel;
3. Generalize the above versions of the Rosenthal inequality to quadratic forms of functions of Markov chains converging geometrically fast to the invariant distribution in the first-order Kantorovich-Wasserstein metric;
4. Develop a method for selecting control variates to adjust MCMC estimates, based on minimizing a certain estimate of the asymptotic variance, and study the statistical properties of the suggested method;
5. Develop a variance reduction method for additive functionals of Markov chains that does not require analytical knowledge of the invariant distribution of the underlying chain, and provide bounds on the variance of the adjusted estimates compared to the variance of the non-adjusted functional in the normal noise model described in Section 3.6.
Scientific novelty of the results
All results submitted for defense are new. New concentration inequalities of the Rosenthal and Bernstein types have been obtained for additive functionals of Markov chains; these inequalities generalize known estimates in the literature. Moreover, this work provides an original extension of the Bernstein inequality to Markov kernels under an ergodicity condition in a general weighted Kantorovich-Wasserstein metric. Additionally, a novel non-asymptotic analysis of the performance of several variance reduction methods for MCMC algorithms has been conducted, resulting in estimates for the rate of decrease in excess asymptotic variance as the training sample size grows. The suggested method of constructing control variates based on a discrete martingale decomposition is new and can be used in settings where classical techniques, in particular those based on the Stein operator, are not directly applicable.
Theoretical and practical significance of the results
The presented results have both theoretical and methodological significance. The theoretical findings introduce new concentration inequalities for additive functionals of Markov chains, which may be valuable for studying Markov Chain Monte Carlo (MCMC) methods. From a methodological perspective, new variance reduction techniques for MCMC algorithms are proposed, which can be applied, in particular, in Bayesian statistics.
Methodology and research methods
The work extensively employs analytical tools of probability theory, particularly coupling methods and the method of cumulants, including the relations between cumulant bounds and concentration inequalities discussed in Chapter 1. The proofs of the main results rely on the theory of Markov chains and concentration inequalities.
Publications based on research results
The main contributions of the thesis have been published in three peer-reviewed journal articles [38-40]. All three articles are included in the Scopus and Web of Science databases.
1. A.Durmus, E. Moulines, A. Naumov, S. Samsonov. Probability and Moment Inequalities for Additive Functionals of Geometrically Ergodic Markov Chains, Journal of Theoretical Probability, 2024. https://doi.org/10.1007/s10959-024-01315-7;
2. D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, S. Samsonov. Variance reduction for dependent sequences with applications to stochastic gradient MCMC, SIAM/ASA Journal on Uncertainty Quantification, 9(2), 507-535, 2021. https://doi.org/10.1137/19M1301199;
3. D. Belomestny, E. Moulines, S. Samsonov. Variance reduction for additive functionals of Markov chains via martingale representations, Statistics and Computing, 32(1), 16, 2022. https://doi.org/10.1007/s11222-021-10073-z
Approbation of work
Main results of the thesis were presented at the following conferences, schools, and seminars:
1. Winter school and conference "New frontiers in high-dimensional probability and statistics 2", Moscow, February 22 — 23, 2019. Talk: "Concentration inequalities for functionals of Markov Chains with applications to variance reduction";
2. Conference "Structural Inference in High-Dimensional Models 2". Pushkin, Saint-Petersburg, 26 — 30 August 2019. Poster: "Variance Reduction for Dependent Sequences via Empirical Variance Minimisation";
3. Research seminar "Structural Learning", Faculty of Computer Science, HSE, Moscow, October 15, 2019. Talk: "Variance reduction for dependent sequences with applications to Stochastic Gradient MCMC";
4. HSE-Yandex Autumn School on Generative Models, Moscow, November 26 — 29, 2019. Poster: "Variance reduction for MCMC algorithms";
5. Winter school "Math of Machine Learning 2020", Sochi, Sirius, February 19 — 22, 2020. Poster: "Variance Reduction for Dependent Sequences via Empirical Variance Minimisation";
6. City seminar on probability theory and mathematical statistics, Saint-Petersburg, POMI RAS, October 09, 2020. Talk: "Variance reduction methods for MCMC algorithms";
7. Conference "New Trends in Mathematical Stochastics", 30.08.2021-03.09.2021, talk "Probability and moment inequalities for additive functionals of geometrically ergodic Markov chains";
8. Research seminar "Structural Learning", Faculty of Computer Science, HSE, Moscow, February 28, 2023. Talk: "Rosenthal type inequalities for Markov chains and their applications to Linear Stochastic Approximation".
Theses submitted for defense
1. In Chapter 1 we obtain new counterparts of Rosenthal and Bernstein inequalities for additive functionals of ergodic Markov chains that converge to the stationary distribution exponentially fast either in V-total variation norm or in the Kantorovich-Wasserstein semi-metric. The proof method we employ is based on the cumulant expansion techniques and the connections between cumulants and centered moments established through the Leonov-Shiryaev formula.
2. In Chapter 2 we propose an extension of the variance reduction method using control variates for the case of dependent random sequences that satisfy the covariance stationarity assumption. We obtain estimates for the rate of decrease in excess asymptotic variance with the growth of the training sample size. We derive concentration inequalities for quadratic forms of functions of Markov chains satisfying the contraction condition in the Kantorovich-Wasserstein metric and apply these results to MCMC estimates based on the Stochastic Gradient Langevin Dynamics (SGLD).
3. In Chapter 3 we propose a novel variance reduction approach for additive functionals of Markov chains based on a discrete-time martingale representation. We study the variance reduction achieved by our method in a special setting of the normal noise model, covering the Unadjusted Langevin Algorithm (ULA), and show its gain over the non-adjusted estimates without variance reduction.
Reliability of results
All results of the dissertation are justified by mathematical proofs. The findings of the dissertation were presented at conferences and scientific seminars.
Structure and scope of work
The thesis consists of introduction, notation section, three chapters, conclusion, and bibliography. The thesis is 113 pages long, including 105 pages of the text, 2 tables, and 12 figures. The bibliography is 8 pages long and includes 119 items.
Author's personal contribution
The author's contribution is primary in the results of Chapters 1 and 3. The results presented in these chapters were obtained personally by the author, apart from Theorem 8, which is a joint result of the doctoral candidate and the other co-authors of [38]. For completeness, Chapter 2 includes results obtained jointly with co-authors, namely the results of Section 2.4 (Algorithm 1 and Theorem 9), obtained jointly by the doctoral candidate and the other co-authors of [39]. The author's primary contributions to Chapter 2 are the
results on the concentration of quadratic forms for Markov chains under contractive condition in the first-order Kantorovich-Wasserstein metric with applications to Stochastic Gradient Langevin Dynamics (SGLD). These results are presented in Section 2.5 and Section 2.6. Furthermore, the proof idea for Proposition 5 is attributed to A. Naumov.
Conclusion
1. In Chapter 1, new counterparts of the Rosenthal and Bernstein inequalities are obtained for additive functionals of ergodic Markov chains that converge to the stationary distribution exponentially fast either in the V-total variation norm or in the Kantorovich-Wasserstein semi-metric. The proof method is based on cumulant expansions and the connection between cumulants and central moments established via the Leonov-Shiryaev formula.
2. In Chapter 2, the variance reduction method based on control variates is extended to sequences of dependent random variables satisfying the covariance stationarity assumption, and the excess asymptotic variance of the algorithm is analyzed. Concentration inequalities are obtained for quadratic forms of functions of Markov chains satisfying a uniform geometric ergodicity condition in the Kantorovich-Wasserstein metric W_{d,1}. The results are applied to MCMC algorithms based on Langevin dynamics with stochastic gradients (SGLD).
3. In Chapter 3, a new variance reduction approach for additive functionals of Markov chains is proposed, based on a discrete martingale decomposition. For the special case of the normal noise model, which covers the Unadjusted Langevin Algorithm (ULA), a non-asymptotic analysis of the variance reduction achieved by the proposed algorithm is carried out. The theoretical analysis relies on the Poincaré inequality for Gaussian random vectors.
Bibliography of the dissertation research, Candidate of Sciences Sergey Vladimirovich Samsonov, 2024
1. M. Ledoux. The Concentration of Measure Phenomenon, volume 89. AMS Surveys and Monographs, 2001.
2. S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities. Oxford University Press, Oxford, 2013. A nonasymptotic theory of independence, With a foreword by Michel Ledoux.
3. Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, and Michael I Jordan. Is Q-learning provably efficient? In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
4. Ron Dorfman and Kfir Yehuda Levy. Adapting to mixing time in stochastic optimization with Markovian data. In International Conference on Machine Learning, pages 5429-5446. PMLR, 2022.
5. Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3(Nov):397-422, 2002.
6. G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC algorithms. Probab. Surv., 1:20-71, 2004.
7. Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan. An introduction to MCMC for machine learning. Machine learning, 50:5-43, 2003.
8. Bernard Bercu, Bernard Delyon, and Emmanuel Rio. Concentration inequalities for sums and martingales. Springer, 2015.
9. Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Number 47 in Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
10. Florence Merlevede, Magda Peligrad, and Emmanuel Rio. A Bernstein type inequality and moderate deviations for weakly dependent sequences. Probability Theory and Related Fields, 151(3-4):435-474, 2011.
11. Emmanuel Rio et al. Asymptotic theory of weakly dependent random processes, volume 80. Springer, 2017.
12. Radoslaw Adamczak. A tail inequality for suprema of unbounded empirical processes with applications to Markov chains. Electronic Journal of Probability, 13:1000-1034, 2008.
13. Blazej Miasojedow. Hoeffding's inequalities for geometrically ergodic Markov chains on general state space. Statistics & Probability Letters, 87:115-120, 2014.
14. Radoslaw Adamczak and Witold Bednorz. Exponential concentration inequalities for additive functionals of Markov chains. ESAIM: Probability and Statistics, 19:440-481, 2015.
15. Jianqing Fan, Bai Jiang, and Qiang Sun. Hoeffding's inequality for general Markov chains and its applications to statistical learning. The Journal of Machine Learning Research, 22(1):6185-6219, 2021.
16. Alexandros G. Dimakis, Soummya Kar, José M. F. Moura, Michael G. Rabbat, and Anna Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847-1864, 2010.
17. Francis Bach and Eric Moulines. Non-strongly-convex smooth stochastic approximation with convergence rate o(1/n). In C.J. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 26. Curran Associates, Inc., 2013.
18. Jalaj Bhandari, Daniel Russo, and Raghav Singal. A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory, pages 1691-1692. PMLR, 2018.
19. Stephan JM Clemencon. Moment and probability inequalities for sums of bounded additive functionals of regular Markov chains via the Nummelin splitting technique. Statistics & probability letters, 55(3):227-238, 2001.
20. Michal Lemanczyk. General Bernstein-like inequality for additive functionals of Markov chains. Journal of Theoretical Probability, 34(3):1426-1454, 2021.
21. R. Bentkus and R. Rudzkis. Exponential estimates for the distribution of random variables. Litovsk. Mat. Sb., 20(1):15-30, 216, 1980.
22. V. P. Leonov and A. N. Sirjaev. On a method of semi-invariants. Theor. Probability Appl., 4:319-329, 1959.
23. Reuven Y. Rubinstein and Dirk P. Kroese. Simulation and the Monte Carlo Method, volume 10. John Wiley & Sons, 2016.
24. Emmanuel Gobet. Monte-Carlo Methods and Stochastic Processes. CRC Press, Boca Raton, FL, 2016.
25. Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer, New York, 1999.
26. Paul Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.
27. Chris J Oates, Mark Girolami, and Nicolas Chopin. Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society Series B: Statistical Methodology, 79(3):695-718, 2017.
28. Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction. In Advances in Neural Information Processing Systems, pages 315-323, 2013.
29. Aaron Defazio, Francis Bach, and Simon Lacoste-Julien. SAGA: A Fast Incremental Gradient Method with Support for Non-Strongly Convex Composite Objectives. In Advances in Neural Information Processing Systems, pages 1646-1654, 2014.
30. Niladri S Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L Bartlett, and Michael I Jordan. On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo. Proceedings of Machine Learning Research, 80, 2018.
31. Jack Baker, Paul Fearnhead, Emily B Fox, and Christopher Nemeth. Control variates for stochastic gradient MCMC. Statistics and Computing, 29(3):599-615, 2019.
32. D. Belomestny, L. Iosipoi, E. Moulines, A. Naumov, and S. Samsonov. Variance reduction for Markov chains with application to MCMC. Statistics and Computing, 30(4):973-997, 2020.
33. M. Welling and Y. W. Teh. Bayesian Learning via Stochastic Gradient Langevin Dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681-688, 2011.
34. Denis Belomestny, Stefan Hafner, and Mikhail Urusov. Variance reduction for discretised diffusions via regression. Journal of Mathematical Analysis and Applications, 458:393-418, 2018.
35. K. L. Mengersen and R. L. Tweedie. Rates of convergence of the Hastings and Metropolis algorithms. The Annals of Statistics, 24(1):101-121, 02 1996.
36. Arnak Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society Series B (Statistical Methodology), 79(3):651-676, 2017.
37. A. Durmus and E. Moulines. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab., 27(3):1551-1587, 2017.
38. Alain Durmus, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Probability and moment inequalities for additive functionals of geometrically ergodic Markov chains. Journal of Theoretical Probability, pages 1-50, 2024.
39. Denis Belomestny, Leonid Iosipoi, Eric Moulines, Alexey Naumov, and Sergey Samsonov. Variance reduction for dependent sequences with applications to stochastic gradient MCMC. SIAM/ASA Journal on Uncertainty Quantification, 9(2):507-535, 2021.
40. Denis Belomestny, Eric Moulines, and Sergey Samsonov. Variance reduction for additive functionals of Markov chains via martingale representations. Statistics and Computing, 32(1):16, 2022.
41. R. Douc, E. Moulines, P. Priouret, and P. Soulier. Markov chains. Springer Series in Operations Research and Financial Engineering. Springer, 2018.
42. L. Saulis and V. A. Statulevicius. Limit theorems for large deviations, volume 73 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1991. Translated and revised from the 1989 Russian original.
43. Guillaume Lecue and Charles Mitchell. Oracle inequalities for cross-validation type procedures. Electron. J. Stat., 6:1803-1837, 2012.
44. Iosif Pinelis. Optimum Bounds for the Distributions of Martingales in Banach Spaces. The Annals of Probability, 22(4):1679 - 1706, 1994.
45. Haskell P. Rosenthal. On the subspaces of Lp (p > 2) spanned by sequences of independent random variables. Israel J. Math., 8:273-303, 1970.
46. Jerome Dedecker, Sebastien Gouezel, et al. Subgaussian concentration inequalities for geometrically ergodic Markov chains. Electronic Communications in Probability, 20, 2015.
47. Katalin Marton. A measure concentration inequality for contracting Markov chains. Geometric & Functional Analysis GAFA, 6(3):556-571, 1996.
48. Paul-Marie Samson et al. Concentration of measure inequalities for Markov chains and φ-mixing processes. The Annals of Probability, 28(1):416-461, 2000.
49. A. Joulin and Y. Ollivier. Curvature, concentration and error estimates for Markov chain Monte Carlo. The Annals of Probability, 38(6):2418 - 2442, 2010.
50. T. Kato. Perturbation theory for linear operators, volume 132. Springer Science & Business Media, 2013.
51. Daniel Paulin. Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic Journal of Probability, 20:1-32, 2015.
52. J. Fan, B. Jiang, and Q. Sun. Hoeffding's lemma for Markov chains and its applications to statistical learning. arXiv preprint arXiv:1802.00211, 2018.
53. J. Fan, B. Jiang, and Q. Sun. Bernstein's inequality for general Markov chains. arXiv preprint arXiv:1805.10721, 2018.
54. Ioannis Kontoyiannis and Sean P Meyn. Geometric ergodicity and the spectral gap of nonreversible Markov chains. Probability Theory and Related Fields, 154(1-2):327-339, 2012.
55. Ioannis Kontoyiannis, Sean P Meyn, et al. Spectral theory and limit theorems for geometrically ergodic Markov processes. Annals of Applied Probability, 13(1):304-362, 2003.
56. Ioannis Kontoyiannis, Sean Meyn, et al. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electronic Journal of Probability, 10:61-123, 2005.
57. S. Varadhan. Large deviations and applications. SIAM, 1984.
58. Patrice Bertail and Stephan Clemencon. Sharp bounds for the tails of functionals of Markov chains. Theory of Probability & Its Applications, 54(3):505-515, 2010.
59. Gabriela Ciolek and Patrice Bertail. New Bernstein and Hoeffding type inequalities for regenerative Markov chains. Latin American journal of probability and mathematical statistics, 16:1-19, 02 2019.
60. Krishna B Athreya and Peter Ney. A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society, 245:493-501, 1978.
61. E. Nummelin. A splitting technique for Harris recurrent Markov chains. Zeitschrift fur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 43:309-318, 1978.
62. Paul Doukhan and Sana Louhichi. A new weak dependence condition and applications to moment inequalities. Stochastic Process. Appl., 84(2):313-342, 1999.
63. Paul Doukhan and Michael H Neumann. Probability and moment inequalities for sums of weakly dependent random variables, with applications. Stochastic Processes and their Applications, 117(7):878-903, 2007.
64. Hermann Thorisson. On maximal and distributional coupling. The Annals of Probability, pages 873-876, 1986.
65. M. Hairer, J.C. Mattingly, and M. Scheutzow. Asymptotic coupling and a general form of Harris' theorem with applications to stochastic delay equations. Probability Theory and Related Fields, 149(1-2):223-259, 2011.
66. M. Hairer, A.M. Stuart, and S.J. Vollmer. Spectral gaps for Metropolis-Hastings algorithms in infinite dimensions. Ann. Appl. Probab., 24:2455-2490, 2014.
67. S. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. Cambridge University Press, New York, NY, USA, 2nd edition, 2009.
68. M. Duflo. Random Iterative Models, volume 34 of Springer, Applications of Mathematics : Stochastic Modelling and Applied Probability. 1997.
69. David Ruppert. Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University Operations Research and Industrial Engineering, 1988.
70. Boris T Polyak and Anatoli B Juditsky. Acceleration of stochastic approximation by averaging. SIAM journal on control and optimization, 30(4):838-855, 1992.
71. A. Dieuleveut, A. Durmus, and F. Bach. Bridging the gap between constant step size stochastic gradient descent and Markov chains. The Annals of Statistics, 48(3):1348-1382, 2020.
72. V. Statuljavicius. Limit theorems for random functions. I. Litovsk. Mat. Sb., 10:583-592, 1970.
73. Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Springer Science & Business Media. 2003.
74. Senlin Guo, Feng Qi, and Hari Srivastava. Necessary and sufficient conditions for two classes of functions to be logarithmically completely monotonic. Integral Transforms and Special Functions, 18:819-826, 11 2007.
75. Nikolai Sergeevich Bakhvalov. On the optimality of linear methods for operator approximation in convex classes of functions. USSR Computational Mathematics and Mathematical Physics, 11(4):244-249, 1971.
76. G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341-363, 1996.
77. Leah F South, Chris J Oates, Antonietta Mira, and Christopher Drovandi. Regularized zero-variance control variates. Bayesian Analysis, 1(1):1-24, 2022.
78. D Belomestny, L Iosipoi, and N Zhivotovskiy. Variance reduction via empirical variance minimization: convergence and complexity. arXiv preprint, arXiv:1712.04667, 2017.
79. Zhuo Sun, Chris J. Oates, and François-Xavier Briol. Meta-learning control variates: Variance reduction with limited data. In Uncertainty in Artificial Intelligence, pages 2047-2057. PMLR, 2023.
80. L. F. South, T. Karvonen, C. Nemeth, M. Girolami, and C. J. Oates. Semi-exact control functionals from Sard's method. Biometrika, 109(2):351-367, 2022.
81. Leah F. South, Marina Riabiz, Onur Teymur, and Chris J. Oates. Postprocessing of MCMC. Annual Review of Statistics and Its Application, 9:529-555, 2022.
82. James M. Flegal and Galin L. Jones. Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist., 38(2):1034-1070, 2010.
83. Roland Assaraf and Michel Caffarel. Zero-variance principle for Monte Carlo algorithms. Physical review letters, 83(23):4682, 1999.
84. Antonietta Mira, Reza Solgi, and Daniele Imparato. Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5):653-662, 2013.
85. Chris J. Oates, Jon Cockayne, François-Xavier Briol, and Mark Girolami. Convergence rates for a class of estimators based on Stein's method. Bernoulli, 25(2):1141-1159, 2019.
86. K. Marton. Bounding d̄-distance by informational divergence: a method to prove measure concentration. Ann. Probab., 24(2):857-866, 1996.
87. H. Djellout, A. Guillin, and L. Wu. Transportation cost-information inequalities and applications to random dynamical systems and diffusions. Ann. Probab., 32(3B):2702-2732, 2004.
88. Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and Geometry of Markov Diffusion Operators, volume 348 of Grundlehren der mathematischen Wissenschaften. Springer Science & Business Media, 2013.
89. Dominique Bakry and Michel Émery. Diffusions hypercontractives. Séminaire de probabilités de Strasbourg, 19:177-206, 1985.
90. Stéphan Clémençon, Gábor Lugosi, and Nicolas Vayatis. Ranking and empirical minimization of U-statistics. The Annals of Statistics, 36(2):844-874, 2008.
91. Quentin Duchemin, Yohann De Castro, and Claire Lacour. Concentration inequality for U-statistics of order two for uniformly ergodic Markov chains. Bernoulli, 29(2):929-956, 2023.
92. Quentin Duchemin, Yohann De Castro, and Claire Lacour. Three rates of convergence or separation via U-statistics in a dependent framework. Journal of Machine Learning Research, 23(201):1-59, 2022.
93. Alain Durmus and Eric Moulines. High-dimensional Bayesian inference via the Unadjusted Langevin Algorithm. Bernoulli, 25(4A):2854-2882, 2019.
94. Yi-An Ma, Tianqi Chen, and Emily Fox. A Complete Recipe for Stochastic Gradient MCMC. In Advances in Neural Information Processing Systems, pages 2917-2925, 2015.
95. Yee Whye Teh, Alexandre H Thiery, and Sebastian J Vollmer. Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics. The Journal of Machine Learning Research, 17(1):193-225, 2016.
96. Tigran Nagapetyan, Andrew B Duncan, Leonard Hasenclever, Sebastian J Vollmer, Lukasz Szpruch, and Konstantinos Zygalakis. The True Cost of Stochastic Gradient Langevin Dynamics. arXiv preprint, arXiv:1706.02692, 2017.
97. Arnak S. Dalalyan and Avetik G. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stoch. Proc. Appl., 129(12):5278-5311, 2019.
98. Nicolas L. Roux, Mark Schmidt, and Francis R. Bach. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets. In Advances in Neural Information Processing Systems 25, pages 2663-2671, 2012.
99. Shane G Henderson. Variance reduction via an approximating Markov process. PhD thesis, Stanford University, 1997.
100. P. Dellaportas and I. Kontoyiannis. Control variates for estimation based on reversible Markov chain Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 2012.
101. Nicolas Brosse, Alain Durmus, Sean Meyn, Eric Moulines, and Anand Radhakrishnan. Diffusion approximations and control variates for MCMC. arXiv preprint, arXiv:1808.01665, 2018.
102. Nial Friel, Antonietta Mira, and Chris J. Oates. Exploiting Multi-Core Architectures for Reduced-Variance Estimation with Intractable Likelihoods. Bayesian Analysis, 11(1):215-245, 2015.
103. Danilo Rezende and Shakir Mohamed. Variational Inference with Normalizing Flows. In International conference on machine learning, pages 1530-1538. PMLR, 2015.
104. Timothy E Hanson, Adam J Branscum, Wesley O Johnson, et al. Informative g-Priors for Logistic Regression. Bayesian Analysis, 9(3):597-612, 2014.
105. R. Salakhutdinov and A. Mnih. Bayesian Probabilistic Matrix Factorization Using Markov Chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning (ICML-08), pages 880-887, 2008.
106. Tianqi Chen, Emily Fox, and Carlos Guestrin. Stochastic gradient Hamiltonian Monte Carlo. In Proceedings of the 31st International Conference on Machine Learning, pages 1683-1691, 2014.
107. Nicolas Brosse, Alain Durmus, and Eric Moulines. The promises and pitfalls of Stochastic Gradient Langevin Dynamics. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 8278-8288, 2018.
108. Roland Assaraf and Michel Caffarel. Zero-variance principle for Monte Carlo algorithms. Phys. Rev. Lett., 83(23):4682-4685, 1999.
109. Tarik Ben Zineb and Emmanuel Gobet. Preliminary control variates to improve empirical regression methods. Monte Carlo Methods Appl., 19(4):331-354, 2013.
110. Ivan T Dimov. Monte Carlo methods for applied scientists. World Scientific, 2008.
111. Shane G. Henderson and Burt Simon. Adaptive simulation using perfect control variates. J. Appl. Probab., 41(3):859-876, 2004.
112. Gilles Pagès and Fabien Panloup. Weighted multilevel Langevin simulation of invariant measures. Ann. Appl. Probab., 28(6):3358-3417, 2018.
113. D. Lamberton and G. Pagès. Recursive computation of the invariant distribution of a diffusion. Bernoulli, 8(3):367-405, 2002.
114. J. C. Mattingly, A. M. Stuart, and D. J. Higham. Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stochastic Processes and their Applications, 101(2):185-232, 2002.
115. Stefan Heinrich and Eugene Sindambiwe. Monte Carlo complexity of parametric integration. Journal of Complexity, 15(3):317-341, 1999.
116. László Györfi, Michael Kohler, Adam Krzyżak, and Harro Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.
117. Valentin De Bortoli and Alain Durmus. Convergence of diffusions and their discretizations: from continuous to discrete processes and back. arXiv preprint, arXiv:1904.09808, 2019.
118. G. M. Constantine and T. H. Savits. A multivariate Faà di Bruno formula with applications. Transactions of the American Mathematical Society, 348(2):503-520, 1996.
119. G. M. Constantine. Combinatorial Theory and Statistical Design. Wiley, New York, 1987.