

We consider the template problem (Eq. 1) of minimizing F + R + ∑_m H_m ∘ K_m over x ∈ X, where the smooth part is a finite sum F = (1/M) ∑_m F_m; M ≥ 1 is typically the number of parallel computing nodes in a distributed setting; the K_m : X → U_m are linear operators; X and the U_m are real Hilbert spaces (all spaces are supposed of finite dimension); R and the H_m are proper, closed, convex functions with values in ℝ ∪ {+∞}, the proximity operators of which are easy to compute; and the F_m are convex L_{F_m}-smooth functions, that is, ∇F_m is L_{F_m}-Lipschitz continuous, for some L_{F_m} > 0. This template problem covers most convex optimization problems met in signal and image processing, operations research, control, machine learning, and many other fields, and our goal is to propose new generic distributed algorithms able to deal with nonsmooth functions using their proximity operators, with acceleration in presence of strong convexity. Prototyping algorithms in the primal-dual framework is more difficult than for proximal gradient algorithms, but it enables much more flexibility. With PORTAL, the user just has to specify the linear operator for a fixed regularization.
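To make the template problem and its oracles concrete, here is a minimal, hypothetical instance written as a sketch: each node m holds a least-squares term F_m (so ∇F_m is Lipschitz with constant ‖A_m‖²), a nonsmooth term H_m = λ‖·‖₁ composed with a finite-difference operator K_m, and the master holds R, the indicator of the nonnegative orthant. None of these specific choices, names, or data come from the paper; they only illustrate the kind of gradient and proximity-operator oracles the template assumes.

```python
# Hypothetical instance of the template problem (Eq. 1), for illustration only:
# node m holds a smooth least-squares term F_m and a nonsmooth term H_m composed
# with a linear operator K_m; the master holds R. All names below are made up.
import numpy as np

rng = np.random.default_rng(0)
d, n_m, M = 20, 30, 4                      # dimension, samples per node, number of nodes
A = [rng.standard_normal((n_m, d)) for _ in range(M)]
b = [rng.standard_normal(n_m) for _ in range(M)]
lam = 0.1

def grad_F(m, x):
    """Gradient of F_m(x) = 0.5 * ||A_m x - b_m||^2; Lipschitz with constant L_F[m]."""
    return A[m].T @ (A[m] @ x - b[m])

L_F = [np.linalg.norm(A[m], 2) ** 2 for m in range(M)]   # L_{F_m} = ||A_m||_2^2

def K(m, x):
    """A simple linear operator K_m: finite differences of x."""
    return np.diff(x)

def K_adj(m, u):
    """Adjoint K_m^* of the finite-difference operator."""
    out = np.zeros(d)
    out[:-1] -= u
    out[1:] += u
    return out

def prox_H(m, u, gamma):
    """prox_{gamma H_m} with H_m = lam * ||.||_1 (soft-thresholding)."""
    return np.sign(u) * np.maximum(np.abs(u) - gamma * lam, 0.0)

def prox_R(x, gamma):
    """prox_{gamma R} with R the indicator of the nonnegative orthant (a projection)."""
    return np.maximum(x, 0.0)
```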

Our contributions are the following:

(1) New algorithms: We propose the first distributed algorithms to solve (Eq. 1) in whole generality, with proved convergence to an exact solution, and having the full splitting, or decoupling, property: ∇F_m, prox_{H_m}, K_m and K_m^* are applied at the m-th node, and the proximity operator of R is applied at the master node connected to all others. No other more complicated operation, like an inner loop or a linear system to solve, is involved.

(2) Unified framework: The foundation of our distributed algorithms consists in two general principles, applied in a cascade, which are new contributions in themselves and could be used in other contexts: (a) The problem (Eq. 1) with M = 1, i.e. the minimization of F + R + H ∘ K, can be reformulated as the minimization of F̃ + R̃ + H̃ in a different space, with preserved smoothness and strong convexity properties. Hence, the linear operator disappears and the Davis–Yin algorithm (Davis and Yin, 2017) can be applied to this new problem; a generic form of this iteration is sketched at the end of this section. Through this lens, we recover many algorithms as particular cases of this unified framework, like the PD3O, Chambolle–Pock, and Loris–Verhoeven algorithms. (b) We design a non-straightforward lifting technique, so that the problem (Eq. 1), with any M, is reformulated as the minimization of F̂ + R̂ + Ĥ ∘ K̂ in some product space.

(3) New convergence analysis and acceleration: Even when M = 1, we improve upon the state of the art in two ways: (a) For constant stepsizes, we recover existing algorithms, but we provide new, more precise results about their convergence speed, see Theorem 1 and Theorem 5. (b) With a particular strategy of varying stepsizes, we exhibit new algorithms, which are accelerated versions of them. We prove an O(1/k²) convergence rate on the last iterate, see Theorem 3 and Theorem 4, whereas current results in the literature are ergodic.

Many estimation problems in a wide range of scientific fields can be formulated as large-scale convex optimization problems (Palomar and Eldar, 2009; Sra et al., 2011; Bach et al., 2012; Bubeck, 2015; Polson et al., 2015; Chambolle and Pock, 2016a; Glowinski et al., 2016; Stathopoulos et al., 2016; Condat, 2017a; Condat et al., 2019b). Proximal splitting algorithms (Combettes and Pesquet, 2010; Boţ et al., 2014; Parikh and Boyd, 2014; Komodakis and Pesquet, 2015; Beck, 2017; Condat et al., 2019a) are particularly well suited to solve them: they consist of simple, easy-to-compute steps that can deal with the terms in the objective function separately. These algorithms are generally designed as sequential ones, for M = 1, and they can then be extended by lifting in a product space to parallel versions, well suited to minimize F + R + ∑_m H_m ∘ K_m; see for instance (Condat et al., 2019a, Section 8). However, it is not straightforward to adapt lifting to the case of a finite sum F = (1/M) ∑_m F_m, with each function F_m handled by a different node, which is of primary importance in machine learning. This generalization is one of our contributions. There is a vast literature on distributed optimization to minimize (1/M) ∑_m F_m + R, with a focus on strategies based on (block-)coordinate or randomized activation, as well as on replacing the gradients by cheaper stochastic estimates (Cevher et al., 2014; Richtárik and Takáč, 2014; Gorbunov et al., 2020; Salim et al., 2020). Replacing the full gradient by a stochastic oracle in the accelerated algorithms with varying stepsizes we propose is not straightforward; we leave this direction for future research.
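For reference, the standard lifting in a product space mentioned above can be sketched as follows: the sum ∑_m H_m ∘ K_m is rewritten as a single composition H ∘ K whose proximity operator separates across nodes. This is a textbook construction, not the paper's contribution; the non-straightforward step addressed here is handling the finite-sum F = (1/M) ∑_m F_m as well.

```latex
% Standard product-space lifting (sketch); it handles the sum of the H_m o K_m,
% but not the finite-sum F, which is the non-straightforward part addressed in the paper.
\begin{aligned}
  &\mathcal{U} := \mathcal{U}_1 \times \cdots \times \mathcal{U}_M, \qquad
   K : \mathcal{X} \to \mathcal{U}, \quad K x := (K_1 x, \ldots, K_M x),\\
  &H : \mathcal{U} \to \mathbb{R} \cup \{+\infty\}, \qquad
   H(u_1, \ldots, u_M) := \sum_{m=1}^{M} H_m(u_m),\\
  &\text{so that } H(Kx) = \sum_{m=1}^{M} H_m(K_m x), \qquad
   K^* u = \sum_{m=1}^{M} K_m^* u_m,\\
  &\text{and } \operatorname{prox}_{\gamma H}(u)
   = \bigl(\operatorname{prox}_{\gamma H_1}(u_1), \ldots, \operatorname{prox}_{\gamma H_M}(u_M)\bigr).
\end{aligned}
```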

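Since contribution (2)(a) reduces the M = 1 problem to a form where the Davis–Yin algorithm applies, here is a minimal sketch of one common form of the Davis–Yin three-operator splitting iteration for minimizing F(x) + R(x) + H(x), with F smooth and R, H accessed through their proximity operators. The reformulated functions F̃, R̃, H̃ and the constant or varying stepsize rules analyzed in the paper are not reproduced here; the function and variable names are illustrative.

```python
# Minimal sketch of the Davis-Yin three-operator splitting iteration for
# min F(x) + R(x) + H(x), with F smooth and R, H given by their proximity
# operators. Generic iteration only; the paper's reformulation and stepsize
# strategies are not reproduced here.
import numpy as np

def davis_yin(grad_F, prox_R, prox_H, z0, gamma, n_iter=1000):
    """One common form of the Davis-Yin iteration; the roles of the two
    proximable functions R and H can be swapped."""
    z = z0.copy()
    x_R = z0.copy()
    for _ in range(n_iter):
        x_H = prox_H(z, gamma)                                   # prox step on H
        x_R = prox_R(2 * x_H - z - gamma * grad_F(x_H), gamma)   # prox step on R after a gradient step on F
        z = z + x_R - x_H                                        # update of the auxiliary variable
    return x_R

# Possible usage with the hypothetical oracles sketched earlier (single node m = 0,
# taking K_0 = identity so that H_0 acts directly on x):
# x = davis_yin(lambda x: grad_F(0, x), prox_R,
#               lambda u, g: prox_H(0, u, g),
#               np.zeros(20), gamma=1.0 / L_F[0])
```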