Decentralized optimization over time-varying networks: dissertation and abstract for the Candidate of Sciences degree (VAK RF specialty 00.00.00), Alexander Viktorovich Rogozin

  • Alexander Viktorovich Rogozin
  • Candidate of Sciences
  • 2023, Moscow Institute of Physics and Technology (National Research University)
  • VAK RF specialty 00.00.00
  • 72 pages
Rogozin, Alexander Viktorovich. Decentralized optimization over time-varying networks: Candidate of Sciences dissertation, specialty 00.00.00 (Other specialties). Moscow Institute of Physics and Technology (National Research University), 2023. 72 pp.

Table of contents of the dissertation, Candidate of Sciences Alexander Viktorovich Rogozin

Contents

1 Introduction

1.1 Decentralized Communication

1.2 Inexact Oracle Construction

1.3 Consensus Subroutine Technique

1.4 Scientific Novelty

1.5 Presentations and validation of research results

1.6 Publications

1.7 Structure of the thesis

2 Deterministic Decentralized Methods

2.1 Introduction

2.1.1 Related Work

2.1.2 Summary of Contributions

2.2 Inexact Oracle Framework

2.2.1 Preliminaries

2.2.2 Inexact Oracle for f

2.3 Algorithm and Results

2.3.1 Consensus

2.3.2 Complexity of Algorithm

2.4 Numerical Experiments

2.5 Conclusion

3 Stochastic Decentralized Methods

3.1 Introduction

3.1.1 Related Work

3.1.2 Our Contributions

3.2 Preliminaries

3.2.1 Problem Reformulation

3.2.2 Consensus Procedure

3.3 Algorithm and Main Result

3.4 Analysis of the Algorithm

3.4.1 Stochastic Inexact Oracle via Inexact Consensus

3.4.2 Similar Triangles Method with Stochastic Inexact Oracle

3.4.3 Proof of the Main Result

3.5 Numerical Tests

3.6 Conclusion


4 Deterministic Proximal Decentralized Method

4.1 Introduction

4.2 Problem Statement

4.2.1 Notation

4.2.2 Objective Functions

4.2.3 Communication Network

4.3 Inexact Oracle Framework

4.4 Accelerated Algorithm and Convergence

4.5 Conclusion

References

5 Appendix for Chapter

5.1 Proof of Theorem

5.1.1 Outer loop

5.1.2 Consensus subroutine iterations

5.1.3 Putting the proof together

6 Appendix for Chapter

6.1 Proof of Lemma

6.2 Proof of Theorem

6.3 Proof of Lemma

6.4 Proof of Lemma

6.5 Putting the proof of Theorem 8 together

List of Figures

2.1 a9a (logistic regression), 100 nodes

2.2 cadata (least squares), 20 nodes

3.1 Random geometric graph with 20 nodes; batch size r =


Introduction to the dissertation (part of the abstract) on the topic "Decentralized optimization over time-varying networks"

Introduction

This chapter provides an overview of the results in the thesis.

1.1 Decentralized Communication

In decentralized optimization, several agents collaboratively solve an optimization problem. Each agent is a computational device, e.g., a computer, smartphone, sensor, or drone. There is no centralized aggregator or server that can synchronize the agents. The nodes are connected in a decentralized way and can communicate only with their immediate neighbors. Applications of decentralized optimization include vehicle coordination and control [86], distributed statistical inference and machine learning [82, 32, 66], power system control [84, 33], distributed averaging [16, 75, 109], distributed tracking [39], formation control [74, 85, 43, 5], distributed spectrum sensing [8], and distributed load balancing [4, 3]. Surveys [69, 62] provide additional examples, [38, 23] review decentralized optimization over time-static graphs, and survey [90] reviews decentralized methods for time-varying networks.

Decentralized communication has been studied since at least the 1980s [9, 101]. The communication protocol is typically based on a gossip, or peer-to-peer, scheme [14]. This thesis covers only synchronized communication rounds: at each round, nodes exchange information with their neighbors, and between communications the nodes perform computational steps. Further surveys of communication techniques can be found in [87, 86, 78, 73, 48]. In this thesis we study time-varying graphs, meaning that the communication network can change over time. The changes may be caused by technical malfunctions such as a loss of connection.

Decentralized algorithms for time-varying graphs are built from several techniques; we can roughly distinguish three. The first is gradient tracking [63, 50, 96, 59]: one introduces an additional variable that tracks the gradient averaged over the graph. We refer to the second technique as ADOM (after the method in which it was originally proposed); it is presented in [53, 51]. ADOM uses a specific reformulation of the optimization problem, treats decentralized communication as a compression operator, and uses an error feedback mechanism. The third technique is the consensus subroutine, which is the focus of this dissertation [46, 91, 88, 10]. This approach performs several consensus iterations after each oracle call. It is relatively simple to use but produces


suboptimal methods (up to a logarithmic term). In this work, we study sum-type problems of the form

$$\min_{x\in\mathbb{R}^d} f(x) = \frac{1}{m}\sum_{i=1}^{m} f_i(x). \qquad (1.1)$$

Each node locally holds $f_i(x)$ and can perform local computations. The nodes are connected by a graph that can change with time. We assume that a time-varying network is a sequence of graphs $\{\mathcal{G}^k = (V, E^k)\}_{k=0}^{\infty}$. In this sequence, the vertex set $V$ does not change with time; only the edge set $E^k$ depends on the iteration number. With each graph $\mathcal{G}^k$ we associate a mixing matrix $W^k$. Below we address the communication part of the optimization process.

Assumption 1. The mixing matrix sequence $\{W^k\}_{k=0}^{\infty}$ satisfies the following properties.

(Decentralized property) $[W^k]_{ij} = 0$ if $(i,j)\notin E^k$ and $i\ne j$.

(Double stochasticity) $W^k \mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top W^k = \mathbf{1}^\top$.

(Spectrum property) There exist a positive integer $\tau$ and $\chi \in (0,1]$ such that for any $k \ge \tau - 1$ and any $x \in \mathbb{R}^m$ it holds that

$$\left\| W_\tau^k x - \frac{1}{m}\mathbf{1}\mathbf{1}^\top x \right\|^2 \le (1-\chi) \left\| x - \frac{1}{m}\mathbf{1}\mathbf{1}^\top x \right\|^2,$$

where $W_\tau^k = W^k W^{k-1} \cdots W^{k-\tau+1}$.

Also, for each $k$ introduce $\mathbf{W}^k = W^k \otimes I$ and denote $P = \frac{1}{m}\mathbf{1}\mathbf{1}^\top \otimes I$. The spectrum property in Assumption 1 ensures geometric convergence of the consensus iterates $x^{k+1} = \mathbf{W}^k x^k$ to the consensual point $P x^0$. Namely, if $N$ is a multiple of $\tau$, it can be shown that

$$\left\| x^N - P x^0 \right\|^2 \le (1-\chi)^{N/\tau} \left\| x^0 - P x^0 \right\|^2.$$

Taking $N = O\!\left(\frac{\tau}{\chi}\log\frac{1}{\varepsilon}\right)$ iterations yields $\|x^N - P x^0\|^2 \le \varepsilon$.
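The consensus iterations above can be sketched numerically. Below is a minimal Python simulation (an illustration, not code from the thesis): each round draws a fresh graph, builds a doubly stochastic mixing matrix via Metropolis weights (an assumed, standard choice satisfying Assumption 1), and multiplies the stacked iterates by it. The squared distance to the consensual point decays geometrically while the average is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 20, 3  # number of nodes, dimension of each local vector

def random_mixing_matrix(m, rng):
    """Doubly stochastic mixing matrix of a random connected graph
    (a ring plus random chords) built with Metropolis weights."""
    A = np.zeros((m, m), dtype=bool)
    for i in range(m):                       # ring keeps the graph connected
        A[i, (i + 1) % m] = A[(i + 1) % m, i] = True
    for _ in range(m):                       # random chords vary over time
        i, j = rng.integers(0, m, size=2)
        if i != j:
            A[i, j] = A[j, i] = True
    deg = A.sum(axis=1)
    W = np.where(A, 1.0 / (1.0 + np.maximum.outer(deg, deg)), 0.0)
    W[np.diag_indices(m)] = 1.0 - W.sum(axis=1)
    return W

x = rng.normal(size=(m, d))   # x^0, one row per node
avg = x.mean(axis=0)          # P x^0: every row replaced by the average
err0 = np.linalg.norm(x - avg) ** 2
for k in range(200):
    x = random_mixing_matrix(m, rng) @ x   # x^{k+1} = W^k x^k, graph varies
errN = np.linalg.norm(x - avg) ** 2

# double stochasticity preserves the average; the error decays geometrically
assert np.allclose(x.mean(axis=0), avg)
assert errN < 1e-3 * err0
```

Double stochasticity is what makes the average an invariant of the iteration, so the consensual point $Px^0$ is known in advance here and the decay can be checked directly.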

1.2 Inexact Oracle Construction

Consider a function $h(x)$, $x \in Q$, that we would like to minimize with an iterative method. The exact gradient of $h$ may not be available; instead, we have access to approximate characteristics of the function.

Definition 2. We call $(h_\delta(x), \psi_\delta(y,x))$ a $(\delta, L, \mu)$-model of the function $h$ at a point $x \in Q$ if for any $y \in Q$ it holds that

$$\frac{\mu}{2}\|y - x\|^2 \le h(y) - (h_\delta(x) + \psi_\delta(y, x)) \le \frac{L}{2}\|y - x\|^2 + \delta.$$

The inexact oracle construction described in Definition 2 was proposed in [21, 22]. It arises in different scenarios in optimization: calls of a first-order oracle at shifted points, approximate values of gradients and function values, functions with Hölder continuous gradient, objectives obtained by local averaging or approximated by a smooth function. These examples are discussed in detail in [21, 22]. Later, accelerated methods with line search in a general proximal setup were studied in [97].

To illustrate the application of the inexact oracle technique to the design of decentralized methods, consider a setting where the gradient is computed at shifted points. Consider a minimization problem with a composite term

$$\min_{x \in Q} h(x) = f(x) + g(x),$$

where $f(x)$ is a convex smooth function, $g(x)$ is a proper convex closed function, and $Q \subseteq \mathbb{R}^d$ is a closed convex set. Assume that $f$ is $L_f$-smooth and $\mu_f$-strongly convex, i.e., for any $x, y \in Q$ it holds that

$$\frac{\mu_f}{2}\|y - x\|^2 \le f(y) - f(x) - \langle \nabla f(x), y - x \rangle \le \frac{L_f}{2}\|y - x\|^2.$$

For each point $y \in Q$, we cannot compute $f(y)$, $g(y)$, and $\nabla f(y)$ exactly. Instead, we have access to a first-order oracle at a point $x \in Q$ that lies in some neighborhood of $y$. This is known as "computation at shifted points" and is described in [21]. The inexact model is

$$h_\delta(y) = f(x) + g(x) + \langle \nabla f(x), y - x \rangle - \frac{\mu_f}{2}\|x - y\|^2,$$
$$\psi_\delta(z, y) = \langle \nabla f(x), z - y \rangle + g(z) - g(x),$$

where $\delta = \left(L_f + \frac{\mu_f}{2}\right)\|x - y\|^2$. It can be shown that

$$\frac{\mu_f}{4}\|z - y\|^2 \le h(z) - h_\delta(y) - \psi_\delta(z, y) \le L_f\|z - y\|^2 + \delta.$$

This means that $(h_\delta(y), \psi_\delta(z, y))$ is a $(\delta, 2L_f, \mu_f/2)$-model of $h(x)$.

Let $\mathcal{L} = \{x_1 = \ldots = x_m\}$ denote the consensus set. The consensus subroutine technique makes it possible to perform a sufficient number of communication rounds so that consensus is reached with a given accuracy. The computations are then performed at a point that lies close to $\mathcal{L}$; therefore, this regime can be interpreted as computation at shifted points.
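The inequalities of the shifted-point model can be checked numerically. Below is a small sketch (illustrative, not from the thesis) that builds the model from a first-order oracle at a shifted point $x$ near $y$ for a random strongly convex quadratic with $g \equiv 0$, and verifies the model bounds at random test points $z$.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5

# toy objective: f(v) = 0.5 v^T A v with mu_f I <= A <= L_f I, and g == 0
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
eigs = rng.uniform(1.0, 10.0, size=d)
A = Q @ np.diag(eigs) @ Q.T
L_f, mu_f = eigs.max(), eigs.min()

f = lambda v: 0.5 * v @ A @ v
grad = lambda v: A @ v

y = rng.normal(size=d)
x = y + 0.01 * rng.normal(size=d)        # shifted point x close to y

# model built from the first-order oracle at the shifted point x
h_delta = f(x) + grad(x) @ (y - x) - 0.5 * mu_f * (x - y) @ (x - y)
delta = (L_f + 0.5 * mu_f) * (x - y) @ (x - y)

ok = True
for _ in range(1000):
    z = rng.normal(size=d)
    psi = grad(x) @ (z - y)                    # psi_delta(z, y), g == 0
    gap = f(z) - h_delta - psi
    lower = 0.25 * mu_f * (z - y) @ (z - y)    # mu/2 with mu = mu_f / 2
    upper = L_f * (z - y) @ (z - y) + delta    # L/2 with L = 2 L_f
    ok = ok and (lower - 1e-9 <= gap <= upper + 1e-9)
print(ok)  # True: the shifted-point model bounds hold
```

The check mirrors the derivation: the lower bound uses strong convexity at $x$ together with $\|z-x\|^2 \ge \frac{1}{2}\|z-y\|^2 - \|y-x\|^2$, and the upper bound absorbs the shift $\|x-y\|^2$ into $\delta$.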

1.3 Consensus Subroutine Technique

The basic idea of the consensus subroutine approach is to perform several communication rounds after each iteration of the method, so that the trajectory of the method stays in a neighborhood of the consensus set $\mathcal{L}$. Let us illustrate the approach with non-accelerated gradient descent.

A simple decentralized scheme combines gradient descent for the function $f$ defined in (1.1) with consensus iterations. Introduce

$$F(x) = \sum_{i=1}^{m} f_i(x_i).$$

Gradient descent iterations write as

$$x^{k+1} = x^k - \gamma \nabla F(x^k).$$

Consensus iterations write as

$$x^{k+1} = \mathbf{W}^k x^k.$$

Let the method perform $T$ communications after each gradient step. Combining gradient steps and consensus iterations, we obtain the simplest form of an algorithm with a consensus subroutine:

$$x^{k+1} = \mathbf{W}_T^{kT+T-1}\left(x^k - \gamma \nabla F(x^k)\right).$$

Initially this technique was proposed in [46] for static graphs. In the thesis, this approach is applied to time-varying networks and several classes of problems.
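A minimal runnable sketch of this scheme follows (illustrative only; the ring topology, step size, and number of consensus rounds are assumptions, not values from the thesis): each node runs a gradient step on its local least-squares objective, followed by $T$ consensus rounds over a freshly drawn graph.

```python
import numpy as np

rng = np.random.default_rng(2)
m, d, T = 10, 4, 50   # nodes, dimension, consensus rounds per gradient step

# local objectives f_i(x) = 0.5 ||A_i x - b_i||^2
A = rng.normal(size=(m, 8, d))
b = rng.normal(size=(m, 8))

def grad_F(X):
    """Stacked gradient: row i is the gradient of f_i at x_i."""
    return np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(m)])

def ring_mixing(m, rng):
    """Doubly stochastic matrix of a randomly relabeled ring
    (a toy time-varying topology)."""
    perm = rng.permutation(m)
    W = np.zeros((m, m))
    for a in range(m):
        i, j = perm[a], perm[(a + 1) % m]
        W[i, j] = W[j, i] = 1.0 / 3.0
    W[np.diag_indices(m)] = 1.0 - W.sum(axis=1)
    return W

# reference: minimizer of the average of the local objectives
x_star = np.linalg.lstsq(A.reshape(-1, d), b.reshape(-1), rcond=None)[0]

gamma = 0.02
X = np.zeros((m, d))
for k in range(300):
    X = X - gamma * grad_F(X)        # local gradient step at every node
    for _ in range(T):               # consensus subroutine: T communications
        X = ring_mixing(m, rng) @ X

err = np.linalg.norm(X.mean(axis=0) - x_star)
assert err < 1e-2                    # nodes agree on (almost) x_star
```

Because the consensus rounds drive the iterates close to $\mathcal{L}$ before each gradient step, the local gradients are evaluated near the average point, which is exactly the "computation at shifted points" regime of the previous section.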

1.4 Scientific Novelty

We underline the three main points of the thesis:

• Near optimal algorithms for decentralized optimization over time-varying networks.

• Dependence on average constants of objective functions instead of worst case constants.

• Accelerated algorithm for decentralized composite optimization.

Every point is novel, and below we discuss the points in detail.

We propose the first near-optimal algorithms for decentralized strongly convex optimization with deterministic and stochastic primal oracles in the time-varying setup. Prior to our results, optimal methods were proposed only for the dual oracle [93] and for time-static graphs [46]. Usage of the dual oracle assumes that the objective functions are dual-friendly (i.e., they admit the computation of the Fenchel conjugate and its gradient either in closed form or via a cheap numerical procedure), which may not always hold. Our approach is based on a consensus subroutine that allows obtaining a gradient of the objective with a given accuracy. After that, we analyze the methods with inexact gradient computation. Our approach works both for time-static and time-varying graphs.

An interesting and valuable outcome of our approach is the dependence on average constants instead of worst-case ones. Let us describe what we mean by average and worst-case. Let each function $f_i$ be $L_i$-smooth and $\mu_i$-strongly convex. By average constants we understand

$$L_a = \frac{1}{m}\sum_{i=1}^{m} L_i, \qquad \mu_a = \frac{1}{m}\sum_{i=1}^{m} \mu_i,$$

and by worst-case constants we mean $L_w = \max_{i=1,\ldots,m} L_i$, $\mu_w = \min_{i=1,\ldots,m} \mu_i$. Note that it is possible that $L_w/\mu_w \gg L_a/\mu_a$. Our analysis yields the dependence on the average constants, which may be significantly better in some scenarios.
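To make the distinction concrete, here is a hypothetical toy example (the numbers are invented for illustration): a single badly conditioned local function among many well-conditioned ones inflates the worst-case condition number while barely affecting the average one.

```python
import numpy as np

m = 100
L = np.full(m, 2.0)       # smoothness constants L_i
mu = np.full(m, 1.0)      # strong convexity constants mu_i
L[0], mu[0] = 1e4, 1e-2   # one badly conditioned local function

L_a, mu_a = L.mean(), mu.mean()   # average constants
L_w, mu_w = L.max(), mu.min()     # worst-case constants

print(L_a / mu_a)   # ~ 103: the average condition number stays small
print(L_w / mu_w)   # 1e6: the worst-case condition number blows up
```

A bound stated in terms of $L_a/\mu_a$ is thus roughly four orders of magnitude better than a worst-case bound on this instance.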

Finally, we develop an accelerated algorithm for decentralized composite optimization. Each of the objective functions held at the nodes is a sum of a smooth term and a possibly non-smooth composite term. The composite term is assumed to be proximal-friendly, i.e., its proximal operator can be computed cheaply. Our approach can handle constrained sets, while previous works [111] only support unconstrained minimization.

1.5 Presentations and validation of research results

The results were presented at the following conferences:

1. Towards accelerated rates for distributed optimization over time-varying networks. XII International Conference on Optimization and Applications. Petrovac, Montenegro, September 27, 2021 (online).

2. An Accelerated Method for Decentralized Distributed Stochastic Optimization Over Time-Varying Graphs. 2021 60th IEEE Conference on Decision and Control (CDC). Austin, USA, December 15, 2021 (online).

3. Decentralized Strongly-Convex Optimization with Affine Constraints: Primal and Dual Approaches. XIII International Conference on Optimization and Applications. Petrovac, Montenegro, September 26, 2022 (online).

1.6 Publications

The results in the thesis are presented in the following papers.

Published papers:

1. Alexander Rogozin, Vladislav Lukoshkin, Alexander Gasnikov, Dmitry Kovalev, Egor Shulgin (2021, September). Towards accelerated rates for distributed optimization over time-varying networks. In International Conference on Optimization and Applications (pp. 258-272). Springer, Cham.

2. Alexander Rogozin, Mikhail Bochko, Pavel Dvurechensky, Alexander Gasnikov, Vladislav

Lukoshkin (2021, December). An accelerated method for decentralized distributed stochastic optimization over time-varying graphs. In 2021 60th IEEE Conference on Decision and Control (CDC) (pp. 3367-3373). IEEE.

3. Rogozin, A., Yarmoshik, D., Kopylova, K., and Gasnikov, A. (2023, January). Decentralized Strongly-Convex Optimization with Affine Constraints: Primal and Dual Approaches. In Advances in Optimization and Applications: 13th International Conference, OPTIMA 2022, Petrovac, Montenegro, September 26-30, 2022, Revised Selected Papers (pp. 93-105). Cham: Springer Nature Switzerland.

1.7 Structure of the thesis

The thesis consists of an introduction, three main chapters, a list of references, and two appendix chapters.

