Методы решения задач, допускающих альтернативную минимизацию / Methods for Solving Problems That Allow Alternating Minimization: dissertation and abstract topic, VAK RF specialty 05.13.18, Candidate of Sciences Nazarii Konstantinovich Tupitsa

  • Tupitsa Nazarii Konstantinovich
  • Candidate of Sciences
  • 2020, Moscow Institute of Physics and Technology (National Research University)
  • VAK RF specialty 05.13.18
  • Number of pages: 86
Tupitsa Nazarii Konstantinovich. Методы решения задач, допускающих альтернативную минимизацию / Methods for Solving Problems That Allow Alternating Minimization: Candidate of Sciences dissertation: 05.13.18, Mathematical Modeling, Numerical Methods, and Software Packages. Moscow Institute of Physics and Technology (National Research University), 2020. 86 pp.

Dissertation table of contents, Candidate of Sciences Nazarii Konstantinovich Tupitsa

Table of Contents

Acknowledgments

List of Figures

List of Tables

0 Introduction

1 Alternating Minimization

1.1 Optimal Transport

1.2 Wasserstein barycenters

1.3 Multimarginal Optimal Transport

1.3.1 Algorithm Design Based on the Alternating Minimization Approach

1.4 Alternating Least Squares

1.4.1 Matrix Completion

1.4.2 Tensor Decomposition

2 Simple Alternating Minimization Algorithm Analysis

2.1 Notation

2.2 Proximal Polyak-Lojasiewicz condition

2.3 Linear Convergence of Simple Alternating Minimization Algorithm

2.3.1 Example with ‖x‖_A = √(Ax, x)

2.3.2 General Convex Constraint Sets

2.4 Polynomial Convergence of Simple Alternating Minimization Algorithm

3 Adaptive Accelerated Gradient Method

3.1 Linear Convergence of AGMsDR

4 Accelerated Alternating Minimization

4.1 AGMsDR Modification for Alternating Minimization

5 Primal Dual Analysis for Multimarginal Optimal Transport

5.1 General Primal-Dual Accelerated Alternating Minimization

5.1.1 Bound for L

5.1.2 Bound for R

5.1.3 Projection on the Feasible Set

5.2 Complexity of Multimarginal OT

6 Numerical Results

6.1 Optimal Transport

6.2 Multimarginal Optimal Transport

6.3 Alternating Least Squares

6.3.1 Matrix Completion

6.3.2 Tensor decomposition

7 Appendix

7.1 Fixed-Step Accelerated Alternating Minimization

7.1.1 Primal-Dual Extension for Fixed-Step Accelerated Alternating Minimization

References

List of Figures

6.1 The results of Algorithm 9 (AAM) and Algorithm 5 (AGM), with different values of ε

6.2 The results of Algorithm 9 (AAM) and Algorithm 5 (AGM), with different values of ε

6.3 Performance comparison on the MNIST dataset. The filled area corresponds to 1 standard deviation

6.4 Performance comparison between multimarginal Sinkhorn's algorithm and Algorithm 8 (n = 15, m = 4)

6.5 Performance comparison between multimarginal Sinkhorn's algorithm and Algorithm 8 (m = 4, ε = 0.05)

6.6 Performance of Algorithm 3 and Algorithm 5 applied to problem (1.20)

6.7 Performance of Algorithm 3 and Algorithm 5 applied to problem (1.21)

List of Tables

4.1 Summary of literature


Introduction of the dissertation (part of the abstract) on the topic «Методы решения задач, допускающих альтернативную минимизацию / Methods for Solving Problems That Allow Alternating Minimization»

Introduction

Optimization, or mathematical programming, goes back to the calculus of variations and the work of Euler and Lagrange.

Optimization can be considered a foundation of many areas of applied mathematics, engineering, computer science, and a number of other scientific disciplines [77]. Unfortunately, for many optimization problems it is impossible to find an analytic solution, i.e. an explicit form of an optimal point. Therefore one can only converge to a solution using numerical methods. There are many types of such methods, and first-order methods occupy a special place among them. One of the best-known first-order methods, the steepest descent method, originates from the work of Cauchy in 1847.

A key role in stimulating much of the progress in modern optimization theory and practice was played by the development of linear programming, which originates from the works of Leonid Kantorovich in the 1940s.

Historically, linear programming problems have been solved with the simplex method, which becomes more computationally expensive as the dimension of the problem grows. Such problems turn into so-called large- or even huge-scale optimization problems, as do many other problems arising in machine and deep learning, control, statistics, data science, signal processing, and so on. First-order methods have low iteration cost and low memory requirements, and that is why they have received great interest over the recent decades in smooth and non-smooth, convex and non-convex optimization [13, 70, 86, 90, 19] and many other settings.

Among these problems, a special place is taken by problems whose variable vector can be decomposed into a number of blocks in such a way that one can analytically find a point at which the gradient with respect to one block vanishes, i.e., a minimizer over that block. Works related to the analysis of this approach include [38, 17, 45]. In this case we say that the problem allows alternating minimization (AM). Alternating minimization optimization algorithms have been known for a long time [80, 18]. These algorithms assume that the decision variable is divided into several blocks and that minimization in each block can be done explicitly, i.e. they assume the availability of a small-dimensional minimization oracle (SDM-oracle). AM algorithms have a number of applications in machine learning problems. For example, iteratively reweighted least squares can be seen as an AM algorithm. Other applications include robust regression [64] and sparse recovery [27]. The famous Expectation Maximization (EM) algorithm can also be seen as an AM algorithm [65, 8]. Besides the works on alternating minimization mentioned above, we also mention a closely related special case of alternating minimization, the coordinate descent algorithm. We mention [11, 87, 91], where non-asymptotic convergence rates of AM algorithms for convex problems were established and their connection with cyclic coordinate descent was discussed, but the analyzed algorithms are not accelerated. Accelerated coordinate descent (ACD) versions are known for random coordinate descent methods [68, 56, 89, 58, 35, 4, 75, 3], cyclic block coordinate descent [12], and greedy coordinate descent [61]. These ACD methods are designed for convex problems and use a momentum term, but they require knowledge of block-wise Lipschitz constants, i.e. they are not parameter-free. A hybrid accelerated random block-coordinate method with exact minimization in the last block was proposed in [28] for convex problems. An extension providing a two-block accelerated alternating minimization algorithm is available in the updated version [29] for the convex case.
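To make the block structure concrete, the following is a minimal Python sketch (not code from the thesis) of two-block alternating minimization: the functions argmin_x and argmin_y play the role of the small-dimensional minimization oracle, and the quadratic example at the end is purely illustrative.

```python
import numpy as np

def alternating_minimization(f, argmin_x, argmin_y, x0, y0, n_iters=100, tol=1e-10):
    """Minimize f(x, y) by exact alternating minimization over two blocks.

    argmin_x(y) returns the minimizer over x of f(x, y) with y fixed, and
    argmin_y(x) the minimizer over y with x fixed (the SDM-oracle from the text).
    """
    x, y = x0, y0
    for _ in range(n_iters):
        f_prev = f(x, y)
        x = argmin_x(y)                      # exact minimization in the first block
        y = argmin_y(x)                      # exact minimization in the second block
        if f_prev - f(x, y) < tol:           # the objective is monotonically non-increasing
            break
    return x, y

# Illustrative quadratic: f(x, y) = ||A x + B y - b||^2 has closed-form block minimizers.
rng = np.random.default_rng(0)
A, B, b = rng.normal(size=(20, 5)), rng.normal(size=(20, 5)), rng.normal(size=20)
f = lambda x, y: np.linalg.norm(A @ x + B @ y - b) ** 2
argmin_x = lambda y: np.linalg.lstsq(A, b - B @ y, rcond=None)[0]
argmin_y = lambda x: np.linalg.lstsq(B, b - A @ x, rcond=None)[0]
x_opt, y_opt = alternating_minimization(f, argmin_x, argmin_y, np.zeros(5), np.zeros(5))
```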

Among others, we are motivated by optimal transport (OT) applications, which are widespread in the machine learning community [25, 26, 9]. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual of the entropy-regularized optimal transport problem. The recent Greenkhorn algorithm [5], a greedy version of Sinkhorn's algorithm, is a greedy modification of an AM algorithm. For the Wasserstein barycenter problem [2], the extension of Sinkhorn's algorithm is known as the Iterative Bregman Projections (IBP) algorithm [15], which can be seen as an alternating minimization procedure [55]. The more general and much less known so-called multimarginal optimal transport problem (the number of marginals is m ≥ 3, in contrast with m = 2 for OT) has recently been shown useful for many applications, such as tomographic image reconstruction [1], generative adversarial networks [20], economics [34], and density functional theory [82]. See [83] for a recent survey of fundamental theoretical formulations and applications of the multimarginal optimal transport problem. Computational aspects of the multimarginal problem were studied in [15], where an Iterative Bregman Projections algorithm was proposed for this problem, yet without complexity analysis. It was also pointed out that the multimarginal optimal transport problem can be applied to calculate the barycenter of m measures without fixing the support of the barycenter. In the preprint [59] the authors propose two algorithms for the multimarginal OT problem and analyze their complexity. The paper follows the entropy regularization approach [25].
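As an illustration of how Sinkhorn's algorithm fits the alternating minimization template, here is a minimal Python sketch of the entropy-regularized OT iteration: each update of u or v is an exact minimization of the regularized dual over one block of dual variables. The variable names, regularization parameter, and stopping rule are assumptions of this sketch, not the implementation used in the thesis.

```python
import numpy as np

def sinkhorn(C, r, c, gamma=0.05, n_iters=10000, tol=1e-8):
    """Entropy-regularized OT between histograms r and c with cost matrix C.

    The u-update and the v-update are each an exact minimization of the
    regularized dual objective over one block of dual variables.
    """
    K = np.exp(-C / gamma)                     # Gibbs kernel
    u, v = np.ones_like(r), np.ones_like(c)
    for _ in range(n_iters):
        u = r / (K @ v)                        # first-block update
        v = c / (K.T @ u)                      # second-block update
        P = u[:, None] * K * v[None, :]        # current transport plan
        err = np.abs(P.sum(1) - r).sum() + np.abs(P.sum(0) - c).sum()
        if err < tol:                          # marginal constraints nearly satisfied
            break
    return P

# Usage on random marginals over a 1D grid.
rng = np.random.default_rng(0)
n = 50
r = rng.random(n); r /= r.sum()
c = rng.random(n); c /= c.sum()
C = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / n
P = sinkhorn(C, r, c)
```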

We are also strongly motivated to study non-convex problems. The problems of low-rank matrix completion [47, 76] and canonical polyadic tensor decomposition [21, 42] can be named here. The alternating least squares (ALS) method is the state-of-the-art method for solving these problems. Both problems are non-convex and allow alternating minimization. Unfortunately, for these problems we can guarantee only convergence to a stationary point, while in practice much better convergence is observed. The main theme of the thesis is convex, non-convex, and strongly convex optimization problems that allow alternating minimization.1
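For concreteness, a minimal Python sketch of alternating least squares for low-rank matrix completion is given below: with one factor fixed, the update of the other factor is a collection of small regularized least-squares problems solved exactly. The regularization, initialization, and iteration count are assumptions of this sketch, not the implementation used in the thesis.

```python
import numpy as np

def als_matrix_completion(M, mask, rank=5, n_iters=50, reg=1e-3, seed=0):
    """Fit a low-rank model M ≈ U @ V.T using only entries where mask == 1.

    Each half-step solves a small regularized least-squares problem exactly for
    one factor with the other fixed; the overall problem is non-convex, so only
    convergence to a stationary point is guaranteed.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    I = reg * np.eye(rank)
    for _ in range(n_iters):
        for i in range(m):                     # update row i of U
            obs = mask[i] == 1
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + I, Vo.T @ M[i, obs])
        for j in range(n):                     # update row j of V
            obs = mask[:, j] == 1
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ M[obs, j])
    return U, V

# Usage: recover a random rank-3 matrix from 40% of its entries.
rng = np.random.default_rng(1)
M_true = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 20))
mask = (rng.random(M_true.shape) < 0.4).astype(int)
U, V = als_matrix_completion(M_true * mask, mask, rank=3)
```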

In Chapter 1 we give an exact formulation of what a problem that allows alternating minimization is. We also introduce several minimization problems that allow alternating minimization and are studied in this thesis in detail.

In Chapter 2 we formulate the simplest alternating minimization algorithm in a general form. We review existing theoretical performance bounds for the algorithm and also present a new analysis.

Chapter 3 is devoted to the existing AGMsDR algorithm [72]. We present a new convergence regime for the algorithm that appears not to have been published before.

Chapter 4 contains a modification of the AGMsDR algorithm that makes it possible to solve problems that allow alternating minimization. A new convergence analysis is presented for a strongly convex objective function.

In Chapter 5 we study a multimarginal OT problem in detail and show the following results.

• We developed a novel algorithm for the approximate computation of multimarginal optimal transport maps based on an accelerated alternating minimization algorithm.

• We formally established the computational complexity of the proposed algorithm. Our result indicates an exponential upper bound on the complexity of the Wasserstein barycenter problem with free support, which is known to be a non-convex optimization problem.

• We showed that under some regimes of the problem parameters m (the number of distributions), n (the dimension of the distributions), and ε (the desired accuracy), the proposed algorithm has better iteration complexity than existing methods.

Finally, in Chapter 6 we present numerical results comparing a number of algorithms, including technical details of the implementation of the alternating modification of the AGMsDR algorithm.

The goals and tasks of the thesis.

1The research is supported by the Ministry of Science and Higher Education of the Russian Federation (Goszadaniye) No. 075-00337-20-03, project No. 0714-2020-0005.

1. Developing a convergence analysis of Alternating Minimization for strongly convex objective functions.

2. Generalizing the modification of AGMsDR for objective functions that allow alternating minimization to the class of functions satisfying the Polyak-Lojasiewicz condition (or the class of strongly convex functions).

3. Studying and developing approaches to deal with functions satisfying the Polyak-Lojasiewicz condition when the parameter is unknown.

4. Convergence analysis of finding an approximate solution to the multimarginal optimal transport problem.

5. Numerical justification of the obtained results.

Scientific novelty. A gradient descent step can be seen as a step to the minimum of a local upper-bounding proximal function [69]. If the proximal function is induced by V(x, x_0) = (1/2)‖x − x_0‖_p^2, we say that we deal with the gradient step in the p-norm. In fact, in [32] it was shown that a step of Sinkhorn's algorithm is better than a step of the gradient method in the 1-norm, which has improved theoretical bounds. In this thesis we show that a step of the AM algorithm is better than the gradient step in any norm, and the same holds for the convergence rate of the method. The last statement holds for two regimes of convergence: polynomial and linear.
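To spell out the comparison, the gradient step in the p-norm referred to above can be written as minimization of a local upper model of the objective (cf. [69]); the following is one standard way to write it, under the assumption that the constant L_p is the corresponding Lipschitz constant of the gradient in the p-norm:

```latex
x_{k+1} = \arg\min_{x}\Big\{ f(x_k) + \langle \nabla f(x_k),\, x - x_k \rangle + L_p\, V(x, x_k) \Big\},
\qquad V(x, x_0) = \tfrac{1}{2}\,\| x - x_0 \|_p^2 .
```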

A new convergence regime was presented for the existing algorithm (AGMsDR) and for its modification for minimizing objectives that allow alternating minimization. As a result, the method (and its modification) appears to be the first method that achieves the lower complexity bounds for convex functions with Lipschitz continuous gradient and at the same time possesses a linear convergence rate for strongly convex functions in both cases: known and unknown value of the strong convexity parameter. Moreover, when this constant is known, the lower complexity bounds are achieved. Linear convergence also takes place if the objective function satisfies the Polyak-Lojasiewicz condition.

The alternating modification of the AGMsDR algorithm was generalized to strongly convex objective functions.

We provide a novel algorithm for the computation of approximate solutions to the multimarginal optimal transport problem. We show that the iteration complexity of our algorithm is better than that of the state-of-the-art methods in a large set of problem regimes with respect to the number of distributions, the dimension of the distributions, and the desired accuracy.

The practical and theoretical value of the work in the thesis. The first result is theoretical. For alternating minimization, a new convergence analysis was presented for strongly convex functions and for functions that satisfy the Polyak-Lojasiewicz condition; moreover, AM automatically possesses a linear convergence rate in this case, without knowledge of the value of the parameter in the PL condition. The other result is that the method converges better than the gradient method in any norm, and therefore has the best possible constants L and R in the complexity bounds.

Another contribution concerns the existing AGMsDR method from [72], which is shown to have a linear convergence regime for not necessarily convex functions that satisfy the Polyak-Lojasiewicz condition (or, less generally, for strongly convex functions). Although the lower complexity bounds are not achieved when the value of the strong convexity parameter is unknown, the algorithm still attains the lower complexity bounds for objectives with Lipschitz continuous gradient and does not need any adjustments at all. This regime of the algorithm is very important in practice and arises in a number of applications, because modern progress in large-scale optimization is mostly connected with first-order methods, owing to the huge dimension of the problems. The same regime was proved for the modification of the AGMsDR method for problems that allow alternating minimization from [40]. A modification of the latter algorithm for solving strongly convex problems that allow alternating minimization was also provided.

The next part concerns a novel algorithm for the approximate computation of multimarginal optimal transport maps based on an accelerated alternating minimization algorithm. The computational complexity of the proposed algorithm was established as well. This problem also has a number of applications, listed above.

The last part is practical. It contains applications of the proposed algorithms to the low-rank matrix completion problem and to the canonical polyadic tensor decomposition problem. For the first problem we were able to outperform the naive alternating minimization approach in running time.

Statements to be defended. The statements to be defended can be listed as follows:

1. A new convergence analysis was proposed for Alternating Minimization with the Gauss-Southwell block choice rule (a minimal sketch of this rule is given after this list). The convergence is shown to be better than that of gradient descent in any norm, i.e., convergence with the best constants L and R.

2. A new convergence regime was presented for the existing AGMsDR method.

3. An Accelerated Alternating Minimization (AAM) algorithm for strongly convex functions.

4. Linear convergence of the AAM algorithm for functions that satisfy the Polyak-Lojasiewicz condition even though the parameter in the condition is not used by the algorithm (or for strongly convex functions when the value of the strong convexity parameter is not used by the algorithm).

5. Multimarginal optimal transport: a primal-dual accelerated alternating minimization method was applied, yielding new complexity bounds and better experimental results than plain alternating minimization.
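As referenced in statement 1 above, a minimal Python sketch of the Gauss-Southwell block choice rule is given here: at each iteration the block with the largest partial-gradient norm is selected and minimized exactly, while the other blocks are kept fixed. The interface (grad_blocks, block_argmins) is an assumption of this sketch.

```python
import numpy as np

def am_gauss_southwell(grad_blocks, block_argmins, x_blocks, n_iters=100):
    """Alternating minimization with the Gauss-Southwell block choice rule.

    grad_blocks(x) returns a list of partial gradients, one per block;
    block_argmins[i](x) returns the exact minimizer over block i with the
    other blocks fixed. The block with the largest partial-gradient norm
    is minimized at each iteration.
    """
    x = [np.array(b, dtype=float) for b in x_blocks]
    for _ in range(n_iters):
        grads = grad_blocks(x)
        i = int(np.argmax([np.linalg.norm(g) for g in grads]))   # Gauss-Southwell rule
        x[i] = block_argmins[i](x)                                # exact block minimization
    return x
```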

Presentations and validation of research results. The results of the thesis were reported and discussed at the following scientific conferences and seminars:

1. N. Tupitsa (MIPT, Russia), A. Gasnikov (MIPT, Russia), P. Dvurechensky (WIAS, Germany). Accelerated Alternating Minimization for Non-convex Problems. Quasilinear Equations, Inverse Problems and their Applications, December 2-4, 2019, Dolgoprudny, Russia. Conference talk.

2. 19th International Conference on Mathematical Optimization Theory and Operations Research (MOTOR-2020), Novosibirsk, Russia, July 6-10, 2020

Publications. The results of the thesis are presented in the following papers. Published papers:

1. Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel E. Dvurechensky, Alexander Gasnikov, and Cesar A. Uribe. On the complexity of approximating wasserstein barycenters. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 3530-3540, 2019

2. Nazarii Tupitsa, Alexander Gasnikov, Pavel Dvurechensky, and Sergey Guminov. Strongly convex optimization for the dual formulation of optimal transport. In Mathematical Optimization Theory and Operations Research. MOTOR 2020. Communications in Computer and Information Science, vol. 1275, pages 192-204. Springer International Publishing, 2020

3. Daniil Merkulov and Nazarii Tupitsa. On accelerated methods for tensor canonical polyadic decomposition. In Proceedings of MIPT, volume 12, No. 4(48), pages 61-71, 2020

Papers accepted for publication:

1. Nazarii Tupitsa, Pavel Dvurechensky, Alexander Gasnikov, and César A. Uribe. Multimarginal Optimal Transport by Accelerated Alternating Minimization. In Proceedings of the 59th IEEE Conference on Decision and Control (CDC)

2. Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Dmitry Kamzolov, Vladislav Matykhin, Dmitry Pasechnyk, Nazarii Tupitsa, and Alexei Chernov. Accelerated meta-algorithm for convex optimization. In Computational Mathematics and Mathematical Physics

3. Nazarii Tupitsa, Pavel Dvurechensky, Alexander Gasnikov, and Sergey Guminov. Alternating Minimization Methods for Strongly Convex Optimization. In Journal of Inverse and Ill-posed Problems


References of the dissertation research, Candidate of Sciences Nazarii Konstantinovich Tupitsa, 2020

References

[1] Isabelle Abraham, Romain Abraham, Maïtine Bergounioux, and Guillaume Carlier. Tomographic reconstruction from a few views: a multi-marginal optimal transport approach. Appl. Math. & Opt., 75(1):55-73, 2017.

[2] Martial Agueh and Guillaume Carlier. Barycenters in the wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904-924, 2011.

[3] Ahmet Alacaoglu, Quoc Tran Dinh, Olivier Fercoq, and Volkan Cevher. Smooth primal-dual coordinate descent algorithms for nonsmooth convex optimization. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5852-5861. Curran Associates, Inc., 2017.

[4] Zeyuan Allen-Zhu, Zheng Qu, Peter Richtarik, and Yang Yuan. Even faster accelerated coordinate descent using non-uniform sampling. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1110-1119, New York, New York, USA, 20-22 Jun 2016. PMLR. First appeared in arXiv:1512.09103.

[5] Jason Altschuler, Jonathan Weed, and Philippe Rigollet. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 1961-1971. Curran Associates, Inc., 2017. arXiv:1705.09634.

[6] Luigi Ambrosio and Nicola Gigli. A user's guide to optimal transport. In Modelling and optimisation of flows on networks. Springer, 2013.

[7] Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. arXiv e-prints, page arXiv:1210.7559, October 2012.

[8] Andreas Andresen and Vladimir Spokoiny. Convergence of an alternating maximization procedure. Journal of Machine Learning Research, 17(63):1-53, 2016.

[9] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv:1701.07875, 2017.

[10] Casey Battaglino, Grey Ballard, and Tamara G. Kolda. A practical randomized CP tensor decomposition. SIAM Journal on Matrix Analysis and Applications, 39(2):876-901, 2018.

[11] A. Beck. On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM Journal on Optimization, 25(1):185-209, 2015.

[12] A. Beck and L. Tetruashvili. On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4):2037-2060, 2013.

[13] Amir Beck. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017.

[14] Amir Beck. First-Order Methods in Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2017.

[15] Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, and Gabriel Peyré. Iterative bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111-A1138, 2015.

[16] Udo Benedikt, Henry Auer, Mike Espig, Wolfgang Hackbusch, and Alexander Auer. Tensor representation techniques in post-hartree-fock methods: Matrix product state tensor format. Molecular Physics, 111:2398-2413, 09 2013.

[17] Dimitri P. Bertsekas and John N. Tsitsiklis. Convergence rate and termination of asynchronous iterative algorithms. In Proceedings of the 3rd International Conference on Supercomputing, ICS '89, page 461-470, New York, NY, USA, 1989. Association for Computing Machinery.

[18] Dimitri P Bertsekas and John N Tsitsiklis. Parallel and distributed computation: numerical methods, volume 23. Prentice hall Englewood Cliffs, NJ, 1989.

[19] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, USA, 2004.

[20] Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, and Mingkui Tan. Multi-marginal wasserstein gan. In NeurIPS, pages 1774-1784, 2019.

[21] J. Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35(3):283-319, September 1970.

[22] Venkatesan T Chakaravarthy, Jee W Choi, Douglas J Joseph, Xing Liu, Prakash Murali, Yogish Sabharwal, and Dheeraj Sreedhar. On Optimizing Distributed Tucker Decomposition for Dense Tensors. arXiv e-prints, page arXiv:1707.05594, July 2017.

[23] Antonin Chambolle, Pauline Tan, and Samuel Vaiter. Accelerated alternating descent methods for dykstra-like problems. Journal of Mathematical Imaging and Vision, 59(3):481-497, Nov 2017.

[24] A. Cichocki, N. Lee, I. V. Oseledets, A. H. Phan, Q. Zhao, and D. Mandic. Low-Rank Tensor Networks for Dimensionality Reduction and Large-Scale Optimization Problems: Perspectives and Challenges PART 1. arXiv e-prints, page arXiv:1609.00893, September 2016.

[25] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2292-2300. Curran Associates, Inc., 2013.

[26] Marco Cuturi and Arnaud Doucet. Fast computation of wasserstein barycenters. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 685-693, Bejing, China, 22-24 Jun 2014. PMLR.

[27] Ingrid Daubechies, Ronald DeVore, Massimo Fornasier, and C. Sinan Güntürk. Iteratively reweighted least squares minimization for sparse recovery. Communications on Pure and Applied Mathematics, 63(1):1-38, 2010.

[28] Jelena Diakonikolas and Lorenzo Orecchia. Alternating randomized block coordinate descent. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1224-1232, Stockholmsmassan, Stockholm Sweden, 10-15 Jul 2018. PMLR.

[29] Jelena Diakonikolas and Lorenzo Orecchia. Alternating randomized block coordinate descent. arXiv:1805.09185, 2018.

[30] Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov, César A. Uribe, and Angelia Nedic. Decentralize and randomize: Faster algorithm for Wasserstein barycenters. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, NeurIPS 2018, pages 10783-10793. Curran Associates, Inc., 2018. arXiv:1802.04367.

[31] Pavel Dvurechensky, Alexander Gasnikov, Evgenia Gasnikova, Sergey Matsievsky, Anton Rodomanov, and Inna Usik. Primal-dual method for searching equilibrium in hierarchical congestion population games. In Supplementary Proceedings of the 9th International Conference on Discrete Optimization and Operations Research and Scientific School (DOOR 2016), Vladivostok, Russia, September 19-23, 2016, pages 584-595, 2016. arXiv:1606.08988.

[32] Pavel Dvurechensky, Alexander Gasnikov, and Alexey Kroshnin. Computational optimal transport: Complexity by accelerated gradient descent is better than by Sinkhorn's algorithm. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 1367-1376, 2018. arXiv:1802.04367.

[33] Pavel Dvurechensky, Alexander Gasnikov, Sergey Omelchenko, and Alexander Tiurin. A stable alternative to Sinkhorn's algorithm for regularized optimal transport. In Alexander Kononov, Michael Khachay, Valery A Kalyagin, and Panos Pardalos, editors, Mathematical Optimization Theory and Operations Research, pages 406-423, Cham, 2020. Springer International Publishing.

[34] Ivar Ekeland. An optimal matching problem. ESAIM: Control, Optimisation and Calculus of Variations, 11(1):57-71, 2005.

[35] Olivier Fercoq and Peter Richtárik. Accelerated, parallel, and proximal coordinate descent. SIAM Journal on Optimization, 25(4):1997-2023, 2015. First appeared in arXiv:1312.5799.

[36] Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Dmitry Kamzolov, Vladislav Matykhin, Dmitry Pasechnyk, Nazarii Tupitsa, and Alexei Chernov. Accelerated meta-algorithm for convex optimization. In Computational Mathematics and Mathematical Physics.

[37] Lars Grasedyck, Daniel Kressner, and Christine Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53-78, 2013.

[38] Luigi Grippo and Marco Sciandrone. Globally convergent block-coordinate techniques for unconstrained optimization. Optimization Methods and Software, 10(4):587-637, 1999.

[39] S. V. Guminov, Yu. E. Nesterov, P. E. Dvurechensky, and A. V. Gasnikov. Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems. Doklady Mathematics, 99(2):125-128, 2019.

[40] Sergey Guminov, Pavel Dvurechensky, Nazarii Tupitsa, and Alexander Gasnikov. Accelerated Alternating Minimization, Accelerated Sinkhorn's Algorithm and Accelerated Iterative Bregman Projections. arXiv e-prints, 2019. arXiv:1906.03622.

[41] N. Hao, Lior Horesh, and M. Kilmer. Nonnegative Tensor Decomposition, pages 123-148. 01 2014.

[42] R. Harshman. Foundations of the parafac procedure: Models and conditions for an "explanatory" multi-model factor analysis. 1970.

[43] Koby Hayashi, Grey Ballard, Jeffrey Jiang, and Michael Tobia. Shared Memory Par-allelization of MTTKRP for Dense Tensors. arXiv e-prints, page arXiv:1708.08976, August 2017.

[44] Edward G. Hohenstein, Robert M. Parrish, and Todd J. Martinez. Tensor hypercontraction density fitting. I. Quartic scaling second- and third-order Møller-Plesset perturbation theory. The Journal of Chemical Physics, 137(4):044103, 2012.

[45] Mingyi Hong, Meisam Razaviyayn, Zhi-Quan Luo, and Jong-Shi Pang. A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing. IEEE Signal Processing Magazine, 33(1):57-77, 2016.

[46] Mingyi Hong, Xiangfeng Wang, Meisam Razaviyayn, and Zhi-Quan Luo. Iteration complexity analysis of block coordinate descent methods. Mathematical Programming, 163(1):85-114, May 2017.

[47] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08, page 263-272, USA, 2008. IEEE Computer Society.

[48] T. Huckle, K. Waldherr, and T. Schulte-Herbrueggen. Computations in Quantum Tensor Networks. arXiv e-prints, page arXiv:1212.5005, December 2012.

[49] Felix Hummel, Theodoros Tsatsoulis, and Andreas Gruneis. Low rank factorization of the coulomb integrals for periodic coupled cluster theory. The Journal of Chemical Physics, 146(12):124105, 2017.

[50] Hamed Karimi, Julie Nutini, and Mark Schmidt. Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition, 2016.

[51] Lars Karlsson, Daniel Kressner, and André Uschmajew. Parallel algorithms for tensor completion in the cp format. Parallel Computing, 57:222-234, 2016.

[52] Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford. An Almost-Linear-Time Algorithm for Approximate Max Flow in Undirected Graphs, and its Multicommodity Generalizations. arXiv e-prints, page arXiv:1304.2338, Apr 2013.

[53] Sami Khenissi and Olfa Nasraoui. Modeling and counteracting exposure bias in recommender systems, Dec 2019.

[54] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455-500, 2009.

[55] Alexey Kroshnin, Nazarii Tupitsa, Darina Dvinskikh, Pavel E. Dvurechensky, Alexander Gasnikov, and Cesar A. Uribe. On the complexity of approximating wasserstein barycenters. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 3530-3540, 2019.

[56] Yin Tat Lee and Aaron Sidford. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, FOCS '13, pages 147-156, Washington, DC, USA, 2013. IEEE Computer Society. First appeared in arXiv:1305.1922.

[57] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in O(√rank) iterations and faster algorithms for maximum flow. FOCS 2014, pages 424-433, 2014.

[58] Qihang Lin, Zhaosong Lu, and Lin Xiao. An accelerated proximal coordinate gradient method. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 3059-3067. Curran Associates, Inc., 2014. First appeared in arXiv:1407.1296.

[59] Tianyi Lin, Nhat Ho, Marco Cuturi, and Michael I. Jordan. On the Complexity of Approximating Multimarginal Optimal Transport. arXiv e-prints, 2019. arXiv:1910.00152.

[60] J. Liu, P. Musialski, P. Wonka, and J. Ye. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):208-220, 2013.

[61] Haihao Lu, Robert Freund, and Vahab Mirrokni. Accelerating greedy coordinate descent methods. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings

of Machine Learning Research, pages 3257-3266, Stockholmsmassan, Stockholm Sweden, 10-15 Jul 2018. PMLR.

[62] Zhi-Quan Luo and Paul Tseng. Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research, 46(1):157-178, Mar 1993.

[63] Robert J McCann. A glimpse into the differential topology and geometry of optimal transport. arXiv:1207.1867, 2012.

[64] P. McCullagh and J.A. Nelder. Generalized Linear Models, Second Edition. Chapman and Hall/CRC Monographs on Statistics and Applied Probability Series. Chapman & Hall, 1989.

[65] G. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley Series in Probability and Statistics. Wiley, 1996.

[66] Daniil Merkulov and Nazarii Tupitsa. On accelerated methods for tensor canonical polyadic decomposition. In Proceedings of MIPT, volume 12, No. 4(48), pages 61-71, 2020.

[67] Yurii Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.

[68] Yurii Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341-362, 2012. First appeared in 2010 as CORE discussion paper 2010/2.

[69] Yurii Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125-161, 2013. First appeared in 2007 as CORE discussion paper 2007/76.

[70] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1 edition, 2014.

[71] Yurii Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Publishing Company, Incorporated, 1 edition, 2014.

[72] Yurii Nesterov, Alexander Gasnikov, Sergey Guminov, and Pavel Dvurechensky. Primal-dual accelerated gradient methods with small-dimensional relaxation oracle. Optimization Methods and Software, pages 1-28, 2020. arXiv:1809.05895.

[73] Yurii Nesterov, Alexander Gasnikov, Sergey Guminov, and Pavel Dvurechensky. Primal-dual accelerated gradient methods with small-dimensional relaxation oracle. Optimization Methods and Software, 2020.

[74] Yurii Nesterov and Arkadii Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994.

[75] Yurii Nesterov and Sebastian U. Stich. Efficiency of the accelerated coordinate descent method on structured optimization problems. SIAM Journal on Optimization, 27(1):110-123, 2017. First presented in May 2015, http://www.mathnet.ru:8080/PresentFiles/11909/7_nesterov.pdf.

[76] Luong Trung Nguyen, Junhan Kim, and Byonghyo Shim. Low-Rank Matrix Completion: A Contemporary Survey. arXiv e-prints, page arXiv:1907.11705, July 2019.

[77] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, New York, NY, USA, second edition, 2006.

[78] Julie Nutini, Mark Schmidt, Issam Laradji, Michael Friedlander, and Hoyt Koepke. Coordinate descent converges faster with the gauss-southwell rule than random selection. In International Conference on Machine Learning, pages 1632-1641, 2015.

[79] James M. Ortega and Werner C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

[80] J.M. Ortega and W.C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 1970.

[81] Román Orús. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Annals of Physics, 349:117-158, October 2014.

[82] Robert G Parr. Density functional theory of atoms and molecules. In Horizons of Quantum Chemistry, pages 5-15. Springer, 1980.

[83] Brendan Pass. Multi-marginal optimal transport: theory and applications. Math. Model. & Num. Analys, 49(6):1771-1790, 2015.

[84] Will Pazner and Per-Olof Persson. Approximate tensor-product preconditioners for very high order discontinuous Galerkin methods. Journal of Computational Physics, 354:344-369, February 2018.

[85] Ioakeim Perros, Robert Chen, Richard Vuduc, and Jimeng Sun. Sparse hierarchical tucker factorization and its application to healthcare. In Charu Aggarwal, Zhi-Hua Zhou, Alexander Tuzhilin, Hui Xiong, and Xindong Wu, editors, Proceedings of the 15th IEEE International Conference on Data Mining (ICDM 2015), pages 943-948, United States, January 2016. Institute of Electrical and Electronics Engineers Inc.

[86] Boris Polyak. Introduction to Optimization. New York, Optimization Software, 1987.

[87] A. Saha and A. Tewari. On the nonasymptotic convergence of cyclic coordinate descent methods. SIAM Journal on Optimization, 23(1):576-601, 2013.

[88] Martin D. Schatz, Tze Meng Low, Robert A. van de Geijn, and Tamara G. Kolda. Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors. arXiv e-prints, page arXiv:1301.7744, January 2013.

[89] Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Eric P. Xing and Tony Jebara, editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 64-72, Bejing, China, 22-24 Jun 2014. PMLR. First appeared in arXiv:1309.2375.

[90] N. Z. Shor, Krzysztof C. Kiwiel, and Andrzej Ruszczyński. Minimization Methods for Non-Differentiable Functions. Springer-Verlag, Berlin, Heidelberg, 1985.

[91] Ruoyu Sun and Mingyi Hong. Improved iteration complexity bounds of cyclic block coordinate descent for convex problems. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15, pages 1306-1314, Cambridge, MA, USA, 2015. MIT Press.

[92] Nazarii Tupitsa. Accelerated Alternating Minimization and Adaptability to Strong Convexity. arXiv e-prints, page arXiv:2006.09097, June 2020.

[93] Nazarii Tupitsa, Pavel Dvurechensky, Alexander Gasnikov, and Sergey Guminov. Alternating Minimization Methods for Strongly Convex Optimization. In Journal of Inverse and Ill-posed Problems.

[94] Nazarii Tupitsa, Pavel Dvurechensky, Alexander Gasnikov, and César A. Uribe. Multimarginal Optimal Transport by Accelerated Alternating Minimization. In Proceedings of the 59th IEEE Conference on Decision and Control (CDC).

[95] Nazarii Tupitsa, Alexander Gasnikov, Pavel Dvurechensky, and Sergey Guminov. Strongly convex optimization for the dual formulation of optimal transport. In Mathematical Optimization Theory and Operations Research. MOTOR 2020. Communications in Computer and Information Science, vol. 1275, pages 192-204. Springer International Publishing, 2020.

[96] César A. Uribe, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, and Angelia Nedic. Distributed computation of Wasserstein barycenters over networks. In 2018 IEEE 57th Annual Conference on Decision and Control (CDC), 2018. Accepted, arXiv:1803.02933.

[97] C. Villani. Topics in Optimal Transportation. Graduate studies in mathematics. American Mathematical Society, 2003.
