1

Distributed continuous-time convex
optimization on weight-balanced digraphs
arXiv:1204.0304v2 [math.OC] 21 Dec 2012

Bahman Gharesifard

Jorge Cortés

Abstract
This paper studies the continuous-time distributed optimization of a sum of convex functions over
directed graphs. Contrary to what is known in the consensus literature, where the same dynamics
works for both undirected and directed scenarios, we show that the consensus-based dynamics that
solves the continuous-time distributed optimization problem for undirected graphs fails to converge
when transcribed to the directed setting. This study sets the basis for the design of an alternative
distributed dynamics which we show is guaranteed to converge, on any strongly connected weightbalanced digraph, to the set of minimizers of a sum of convex differentiable functions with globally
Lipschitz gradients. Our technical approach combines notions of invariance and cocoercivity with the
positive definiteness properties of graph matrices to establish the results.

I. I NTRODUCTION
Distributed optimization of a sum of convex functions has applications in a variety of scenarios,
including sensor networks, source localization, and robust estimation, and has been intensively
studied in recent years, see e.g. [1], [2], [3], [4], [5], [6], [11]. Most of these works build on
consensus-based dynamics [7], [8], [9], [10] to design discrete-time algorithms that find the
solution of the optimization problem. A recent exception are the works [12], [13] that deal with
continuous-time strategies on undirected networks. This paper furthers contributes to this body of
work by studying continuous-time algorithms for distributed optimization in directed scenarios.
The unidirectional information flow among agents characteristic of directed networks often
leads to significant technical challenges when establishing convergence and robustness properties
of coordination algorithms. The results of this paper provide one more example in support of
this assertion for the case of continuous-time consensus-based distributed optimization. This is
Bahman Gharesifard and Jorge Cortés are with the Department of Mechanical and Aerospace Engineering, University of
California, San Diego, {bgharesifard,cortes}@ucsd.edu.

November 27, 2024

DRAFT

2

somewhat surprising given that, for consensus, the same dynamics works for both undirected
connected graphs and strongly connected, weight-balanced directed graphs, see e.g., [7], [8].
The contributions of this paper are the following. We first show that the solutions of the
optimization problem of a sum of locally Lipschitz convex functions over a directed graph
(or digraph) correspond to the saddle points of an aggregate objective function that depends
on the graph topology through its Laplacian. This function is convex in its first argument
and linear in the second. Moreover, its gradient is distributed when the graph is undirected.
Our second step is then to study the convergence properties of the saddle-point dynamics and
establish its asymptotic correctness when the original functions are locally Lipschitz (i.e., not
necessarily differentiable) and convex, extending the results available in the literature [13] for
continuously differentiable, strictly convex functions. Next, we consider the optimization problem
over digraphs. We first provide an example of a strongly connected, weight-balanced digraph
where the distributed version of the saddle-point dynamics does not converge. This motivates
us to introduce a generalization of the dynamics that incorporates a design parameter. We show
that, when the original functions are differentiable and convex with globally Lipschitz gradients,
the design parameter can be appropriately chosen so that the resulting dynamics asymptotically
converge to the set of minimizers of the objective function on any strongly connected and weightbalanced digraph. Our technical approach combines notions and tools from set-valued stability
analysis, algebraic graph theory, and convex analysis.
II. P RELIMINARIES
We start with notational conventions. Let R and R≥0 denote the set of reals and nonnegative
reals, respectively. We let || · || denote the Euclidean norm on Rd . We let 1d = (1, . . . , 1)T , 0d =

(0, . . . , 0)T ∈ Rd , and Id denote the identity matrix in Rd×d . For A ∈ Rd1 ×d2 and B ∈ Re1 ×e2 ,

A ⊗ B is their Kronecker product. A function f : X1 × X2 → R, with X1 ⊂ Rd1 , X2 ⊂ Rd2 closed

and convex, is concave-convex if it is concave in its first argument and convex in the second
one. A saddle point (x∗1 , x∗2 ) ∈ X1 × X2 of f satisfies f (x1 , x∗2 ) ≤ f (x∗1 , x∗2 ) ≤ f (x∗1 , x2 ) for all
x1 ∈ X1 and x2 ∈ X2 . A set-valued map f : Rd ⇒ Rd takes elements of Rd to subsets of Rd .
A. Graph theory
We present basic notions from algebraic graph theory [9]. A directed graph, or digraph, is a

pair G = (V, E), where V is the (finite) vertex set and E ⊆ V × V is the edge set. A digraph is
undirected if (v, u) ∈ E anytime (u, v) ∈ E. We refer to an undirected digraph as a graph. A

November 27, 2024

DRAFT

3

path is an ordered sequence of vertices such that any pair of vertices appearing consecutively
is an edge. A digraph is strongly connected if there is a path between any pair of distinct
vertices. For a graph, this notion is referred to as connected. A weighted digraph is a triplet
n×n
G = (V, E, A), where (V, E) is a digraph and A ∈ R≥0
is the adjacency matrix, satisfying

aij > 0 if (vi , vj ) ∈ E and aij = 0, otherwise. The weighted out-degree and in-degree of vi ,
P
P
i ∈ {1, . . . , n}, are respectively, dwout (vi ) = nj=1 aij and dwin (vi ) = nj=1 aji . The weighted out-

degree matrix Dout is diagonal with (Dout )ii = dwout (i), for i ∈ {1, . . . , n}. The Laplacian matrix is

L = Dout − A. Note that L1n = 0. If G is strongly connected, then zero is a simple eigenvalue of

L. G is undirected if L = LT and weight-balanced if dwout (v) = dwin (v), for all v ∈ V. The following

three notions are equivalent: (i) G is weight-balanced, (ii) 1Tn L = 0, and (iii) L + LT is positive

semidefinite, see e.g., [9, Theorem 1.37]. If G is weight-balanced and strongly connected, then

zero is a simple eigenvalue of L + LT . Any undirected graph is weight-balanced.
B. Nonsmooth analysis

We recall some notions from nonsmooth analysis [15]. A function f : Rd → R is locally

Lipschitz at x ∈ Rd if there exists a neighborhood U of x and Cx ∈ R≥0 such that |f (y)−f (z)| ≤

Cx ||y − z||, for y, z ∈ U. f is locally Lipschitz on Rd if it is locally Lipschitz at x for all

x ∈ Rd and globally Lipschitz on Rd if for all y, z ∈ Rd there exists C ∈ R≥0 such that

|f (y) − f (z)| ≤ C||y − z||. Locally Lipschitz functions are differentiable almost everywhere. If

Ωf denotes the set of points where f fails to be differentiable, the generalized gradient of f is
∂f (x) = co{ lim ∇f (xk ) | xk → x, xk ∈
/ Ωf ∪ S},
k→∞

where S is any set of measure zero and co denotes convex hull.
Lemma 2.1: (Continuity of the generalized gradient map): Let f : Rd → R be a locally

Lipschitz function at x ∈ Rd . Then the set-valued map ∂f : Rd ⇒ Rd is upper semicontinuous

and locally bounded at x ∈ Rd and moreover, ∂f (x) is nonempty, compact, and convex.

For f : Rd × Rd → R and z ∈ Rd , we let ∂x f (x, z) denote the generalized gradient of x 7→

f (x, z). Similarly, for x ∈ Rd , we let ∂z f (x, z) denote the generalized gradient of z 7→ f (x, z).

A critical point x ∈ Rd of f satisfies 0 ∈ ∂f (x). A function f : Rd → R is regular at x ∈ R if

for all v ∈ Rd the right directional derivative of f , in the direction of v, exists at x and coincides
with the generalized directional derivative of f at x in the direction of v, see [15] for definitions

of these notions. A convex and locally Lipschitz function at x is regular [15, Proposition 2.3.6].

November 27, 2024

DRAFT

4

Lemma 2.2: (Finite sum of locally Lipschitz functions): Let {f i }ni=1 be locally Lipschitz at x ∈
P
P
Rd . Then ∂( ni=1 f i )(x) ⊆ ni=1 ∂f i (x), and equality holds if f i is regular for i ∈ {1, . . . , n}
P
(here, the summation of sets is the set of points of the form ni=1 gi , with gi ∈ ∂f i (x)).
A locally Lipschitz and convex function f satisfies, for all x, x′ ∈ Rd and ξ ∈ ∂f (x), the

first-order condition of convexity,

f (x′ ) − f (x) ≥ ξ T (x′ − x).

(1)

The notion of cocoercivity [16] plays a key role in our technical approach later. For δ ∈ R>0 , a
locally Lipschitz function f is δ-cocoercive if, for all x, x′ ∈ Rd and gx ∈ ∂f (x), gx′ ∈ ∂f (x′ ),
(x − x′ )T (gx − gx′ ) ≥ δ(gx − gx′ )T (gx − gx′ ).
The next result [16, Lemma 6.7] characterizes cocoercive differentiable convex functions.
Proposition 2.3: (Characterization of cocoercivity): Let f be a differentiable convex function.
Then, ∇f is globally Lipschitz with constant K ∈ R>0 iff f is K1 -cocoercive.
C. Set-valued dynamical systems
Here, we recall some background on set-valued dynamical systems following [17]. A continuoustime set-valued dynamical system on X ⊂ Rd is a differential inclusion
ẋ(t) ∈ Ψ(x(t))

(2)

where t ∈ R≥0 and Ψ : X ⊂ Rd ⇒ Rd is a set-valued map. A solution to this dynamical system
is an absolutely continuous curve x : [0, T ] → X which satisfies (2) almost everywhere. The set
of equilibria of (2) is denoted by Eq(Ψ) = {x ∈ X | 0 ∈ Ψ(x)}.

Lemma 2.4: (Existence of solutions): For Ψ : Rd ⇒ Rd upper semicontinuous with nonempty,

compact, and convex values, there exists a solution to (2) from any initial condition.
The LaSalle Invariance Principle is helpful to establish the asymptotic convergence of systems
of the form (2). A set W ⊂ X is weakly positively invariant under (2) if, for each x ∈ W , there

exists at least one solution of (2) starting from x entirely contained in W . Similarly, W is

strongly positively invariant under (2) if, for each x ∈ W , all solutions of (2) starting from x

are entirely contained in W . Finally, the set-valued Lie derivative of a differentiable function
V : Rd → R with respect to Ψ at x ∈ Rd is LeΨ V (x) = {v T ∇V (x) | v ∈ Ψ(x)}.

Theorem 2.5: (Set-valued LaSalle Invariance Principle): Let W ⊂ X be strongly positively

invariant under (2) and V : X → R a continuously differentiable function. Suppose the evolutions

November 27, 2024

DRAFT

5

of (2) are bounded and max LeΨ V (x) ≤ 0 or LeΨ V (x) = ∅, for all x ∈ W . Let SΨ,V = {x ∈
X | 0 ∈ LeΨ V (x)}. Then any solution x(t), t ∈ R≥0 , starting in W converges to the largest
weakly positively invariant set M contained in S̄Ψ,V ∩ W . When M is a finite collection of

points, then the limit of each solution equals one of them.

III. P ROBLEM STATEMENT AND EQUIVALENT FORMULATIONS
Consider a network composed by n agents v1 , . . . , vn whose communication topology is
described by a strongly connected digraph G. An edge (vi , vj ) represents the fact that vi can

receive information from vj . For each i ∈ {1, . . . , n}, let f i : Rd → R be locally Lipschitz and

convex, and only available to agent vi . The network objective is to solve
minimize

f (x) =

n
X

f i (x),

(3)

i=1

i

d

in a distributed way. Let x ∈ R denote the estimate of agent vi about the value of the solution
to (3) and let xT = ((x1 )T , . . . , (xn )T ) ∈ Rnd . Next, we provide an alternative formulation of (3).

Lemma 3.1: Let L ∈ Rn×n be the Laplacian of G and define L = L ⊗ Id ∈ Rnd×nd . The

problem (3) on Rd is equivalent to the following problem on Rnd ,
minimize

f˜(x) =

n
X

f i (xi ),

subject to Lx = 0nd .

(4)

i=1

Proof: The proof follows by noting that (i) f˜(1n ⊗ x) = f (x) for all x ∈ Rd and (ii) since

G is strongly connected, Lx = 0nd if and only if x = 1n ⊗ x, for some x ∈ Rd .

The formulation (4) is appealing because it brings together the estimates of each agent about
the value of the solution to the original optimization problem. Note that f˜ is locally Lipschitz
and convex. Moreover, from Lemma 2.2, the elements of its generalized gradient are of the form
g̃x = (g 11 , . . . , g nn ) ∈ ∂ f˜(x), where g i i ∈ ∂f i (xi ), for i ∈ {1, . . . , n}. Since f˜ is convex and
x

x

x

the constraints in (4) are linear, the constrained optimization problem is feasible [18].

The next result introduces a function which corresponds to the Lagrangian function associated
to the constrained optimization problem (4) plus an additional quadratic term that vanishes if
the agreement constraint is satisfied. Interestingly, the saddle points of this function correspond
to the solutions of the constrained optimization problem, as we show next.
Proposition 3.2: (Solutions of the distributed optimization problem as saddle points): Let G be

strongly connected and weight-balanced, and define F : Rnd × Rnd → R by
1
F (x, z) = f˜(x) + xT Lz + xT Lx.
2

November 27, 2024

(5)
DRAFT

6

Then F is locally Lipschitz and convex in its first argument and linear in its second, and
(i) if (x∗ , z ∗ ) is a saddle point of F , then so is (x∗ , z ∗ + 1n ⊗ a), for any a ∈ Rd .

(ii) if (x∗ , z ∗ ) is a saddle point of F , then x∗ is a solution of (4).

(iii) if x∗ is a solution of (4), then there exists z ∗ with Lz ∗ ∈ −∂ f˜(x∗ ) such that (x∗ , z ∗ ) is
a saddle point of F .

Proof: First, note that for G weight-balanced, L + LT is positive semi-definite. Since the

sum of convex functions is convex, one deduces that F is convex in its first argument. By

inspection, F is linear in its second argument. The statement (i) is immediate. To show (ii),
using that G is strongly connected, one can see that the saddle points of F are of the form
(x∗ , z ∗ ) with x∗ = 1n ⊗ x∗ , x∗ ∈ Rd , and Lz ∗ ∈ −∂ f˜(x∗ ). The last inclusion implies that there
exist gxi ∗ ∈ ∂f i (x∗ ), i ∈ {1, . . . , n}, such that Lz ∗ = −(gx1∗ , . . . , gxn∗ )T . Noting that
(1Tn ⊗ Id )L = (1Tn ⊗ Id )(L ⊗ Id ) = 1Tn L ⊗ Id = 0d×dn ,
Pn

∗
i
i=1 gx∗ . As a result, using Lemma 2.2, x is a solution
of (4). Finally, (iii) follows by noting x∗ = 1n ⊗ x∗ and the fact that 0 ∈ ∂f (x∗ ) implies that

we deduce 0d = (1Tn ⊗ Id )Lz ∗ = −

there exists z ∗ ∈ Rnd with Lz ∗ ∈ −∂ f˜(x∗ ), yielding that (x∗ , z ∗ ) is a saddle point of F .

IV. C ONTINUOUS - TIME DISTRIBUTED OPTIMIZATION ON UNDIRECTED NETWORKS
Here, we review the continuous-time solution to the optimization problem proposed in [12],
[13] for undirected graphs. If G is undirected, the gradient of F in (5) is distributed over G.

Given Proposition 3.2, it is natural to consider the saddle-point dynamics of F to solve (3),
ẋ + Lx + Lz ∈ −∂ f˜(x),
ż = Lx.

(6a)
(6b)

Note that (6) is a set-valued dynamical system. Using Lemmas 2.1 and 2.4, one can guarantee
the existence of solutions. Moreover, from Proposition 3.2, if (x∗ , z ∗ ) is an equilibrium of (6),
then x∗ is a solution to (4). According to [13], the dynamics (6) leads the network to agree on
a global minimum of f for the case when G is undirected and f is both strictly convex and

the sum of differentiable convex functions. We extend here this result to the case when G is
undirected and f is the sum of locally Lipschitz convex functions. The proof is also useful later

to illustrate the challenges in solving the distributed optimization problem over directed graphs.
Theorem 4.1: (Asymptotic convergence of (6) on graphs): Let G be a connected graph and

consider the optimization problem (3), where each f i , i ∈ {1, . . . , n} is locally Lipschitz and
November 27, 2024

DRAFT

7

convex. Then, the projection onto the first component of any trajectory of (6) asymptotically
converges to the set of solutions to (4). Moreover, if f has a finite number of critical points, the
limit of the projection onto the first component of each trajectory is a solution of (4).
Proof: For convenience, we denote the dynamics (6) by Ψdis-opt : Rnd × Rnd ⇒ Rnd × Rnd .

Let x∗ = 1n ⊗x∗ be a solution of (4). By Proposition 3.2(iii), there exists z ∗ such that (x∗ , z ∗ ) ∈
Eq(Ψdis-opt ). First, note that given any initial condition (x0 , z0 ) ∈ Rnd × Rnd , the set
Wz0 = {(x, z) | (1Tn ⊗ Id )z = (1Tn ⊗ Id)z0 }

(7)

is strongly positively invariant under (6). Consider then the function V : Rnd × Rnd → R≥0 ,
1
1
V (x, z) = (x − x∗ )T (x − x∗ ) + (z − z ∗ )T (z − z ∗ ).
2
2

(8)

The function V is smooth. Let us examine its set-valued Lie derivative. For each ξ ∈ LeΨdis-opt V (x, z),
there exists v = (−Lx − Lz − g̃x , Lx) ∈ Ψdis-opt (x, z), with g̃x ∈ ∂ f˜(x), such that
ξ = v T ∇V (x, z) = −(x − x∗ )T (Lx + Lz + g̃x ) + (z − z ∗ )T Lx.

(9)

Since F is convex in its first argument and Lx + Lz + g̃x ∈ ∂x F (x, z), using the first-order

condition of convexity (1), we deduce (x∗ − x)T (Lx + Lz + g̃x ) ≤ F (x∗ , z) − F (x, z). On the

other hand, the linearity of F in its second argument implies that (z − z ∗ )T Lx = F (x, z) −

F (x, z ∗ ). Therefore, ξ ≤ F (x∗ , z) − F (x∗ , z ∗ ) + F (x∗ , z ∗ ) − F (x, z ∗ ). Since the equilibria

of Ψdis-opt are the saddle points of F , we deduce that ξ ≤ 0. Since ξ is arbitrary, we conclude
max LeΨ V (x, z) ≤ 0. As a by-product, the trajectories of (6) are bounded. Consequently,
dis-opt

all assumptions of the set-valued version of the LaSalle Invariance Principle, cf. Theorem 2.5,

are satisfied. This result then implies that any trajectory of (6) starting from an initial condition
(x0 , z0 ) converges to the largest weakly positively invariant set M in SΨdis-opt ,V ∩ Wz0 . Our final

step consists of characterizing M. Let (x, z) ∈ M. Then F (x∗, z ∗ ) − F (x, z ∗ ) = 0, i.e.,

1
(10)
f˜(x∗ ) − f˜(x) − (z ∗ )T Lx − xT Lx = 0.
2
Define now G : Rnd × Rnd → R by G(x, z) = f˜(x) + z T Lx. Note that G is convex in

its first argument and linear in its second, and that it has the same saddle points as F . As a
result, G(x∗ , z ∗ ) − G(x, z ∗ ) ≤ 0, or equivalently, f˜(x∗ ) − f˜(x) − (z ∗ )T Lx ≤ 0. Combining

this with (10), we have Lx = 0 and −f˜(x) + f˜(x∗ ) = 0, i.e., x is solution to (4). Since M

is weakly positively invariant, there exists at least a solution of (6) starting from (x, z) that
remains in M. This implies that, along the solution, the components of x remain in agreement,

November 27, 2024

DRAFT

8

i.e., x(t) = 1n ⊗ a(t) with a(t) ∈ Rd a solution of (3). Applying 1Tn ⊗ Id on both sides of
P
1n ⊗ ȧ(t) + Lz ∈ −∂ f˜(x(t)), we deduce nȧ(t) ∈ − n ∂f i (a(t)). Lemma A.2 then implies
i=1

that ȧ(t) = 0, i.e., Lz ∈ −∂ f˜(x) and thus (x, z) ∈ Eq(Ψdis-opt ). Finally, if the set of equilibria
is finite, the last statement holds true.

Remark 4.2: (Asymptotic convergence of saddle-point dynamics): The work [20] studies saddlepoint dynamics and guarantees asymptotic convergence to a saddle point when the function’s
Hessian in one argument is positive definite and the function is linear in the other. Such result,
however, cannot be applied to establish Theorem 4.1 because the generality of the hypotheses
on f mean that F might not satisfy these conditions. Instead, our proof shows that a careful
•

study of the invariance properties of the flow yields the desired result.
V. C ONTINUOUS - TIME DISTRIBUTED OPTIMIZATION ON DIRECTED NETWORKS

Here, we consider the optimization problem (3) on digraphs. When G is directed, the gradient

of F defined in (5) is no longer distributed over G because it contains terms that involve LT and

hence requires agents to receive information from its in-neighbors. In fact, the dynamics (6),
which is distributed over G, does no longer correspond to the saddle-point dynamics of F .

Nevertheless, it is natural to study whether (6) enjoys the same convergence properties as in the
undirected setting (as, for instance, is the case in the agreement problem [7], [8]). Surprisingly,
this turns out not to be the case, as shown in Section V-A. This result motivates the introduction
in Section V-B of an alternative provably correct dynamics on weight-balanced directed graphs.
A. Counterexample
Here, we provide an example of a strongly connected, weight-balanced digraph on which (6)
fails to converge. For convenience, we let Sagree = {(1n ⊗ x, 1n ⊗ z) ∈ Rnd × Rnd | x, z ∈ Rd }

denote the set of agreement configurations. Our construction relies on the following result.

Lemma 5.1: (Necessary condition for the convergence of (6) on digraphs): Let G be a strongly

connected digraph and f i = 0, i ∈ {1, . . . , n}. Then Sagree is stable under (6) iff, for any nonzero
√
eigenvalue λ of the Laplacian L, one has 3|Im(λ)| ≤ Re(λ).

−1
Proof: By assumption, the dynamics (6) is linear with matrix ( −1
1 0 ) ⊗ L and has Sagree
√ 
as equilibria. The eigenvalues of the matrix are of the form λ −1
± 23 i , with λ an eigenvalue
2

of L (because the eigenvalues of a Kronecker product are just the product of the eigenvalues
of the corresponding matrices). Since L = L ⊗ Id , each eigenvalue of L is an eigenvalue of L.
√ 
√
± 23 i = 12 (∓ 3Im(λ) − Re(λ)), from which the result follows.
Finally, Re λ −1
2
November 27, 2024

DRAFT

9

It is not difficult to construct examples of convex functions that have zero contribution to the
linearization of (6) around the solution. Therefore, such systems cannot be convergent if they
fail the criterium identified in Lemma 5.1. The next example shows that this criterium can fail
even for strongly connected weight-balanced digraphs.
Example 5.2: Consider the strongly connected, weight-balanced digraph with


0
0.5326 0.1654 0.0004 0.0002


0.0595

0
0.6676
0.0681
0.1230




A = 0.0213 0.0004
0
0.5809 0.3181


0.0248 0.2458

0
0
0.5587


0.5930 0.1394 0.0877 0.1799
0

as adjacency matrix. Note that λ = 0.8833 ± 0.5197i is an eigenvalue of the Laplacian. Since
√
3|Im(λ)| − Re(λ) = 0.0171 > 0, Lemma 5.1 implies that (6) fails to converge.
•
B. Provably correct distributed dynamics on directed graphs
Here, given the result in Section V-A, we introduce an alternative continuous-time distributed
dynamics for strongly connected weight-balanced digraphs. For reasons that will be made clear
later in Remark 5.5, we restrict our attention to the case when the functions f i , i ∈ {1, . . . , n}
are continuously differentiable. Let α ∈ R>0 and consider the dynamics
ẋ + αLx + Lz = −∇f˜(x),
ż = Lx.

(11a)
(11b)

The existence of solutions is guaranteed by Lemmas 2.1 and 2.4. We first show that appropriate
choices of α allow to circumvent the problem raised in Lemma 5.1.
Lemma 5.3: (Sufficient conditions for the convergence of (11) on digraphs with trivial objective

function): Let G be a strongly connected and weight-balanced digraph and f i = 0, i ∈ {1, . . . , n}.
√
If α ≥ 2 2, then Sagree is asymptotically stable under (11).
Proof: When all fi , i ∈ {1, . . . , n}, are identically zero, the dynamics (11) is linear and has

Sagree as equilibria. Consider the coordinate transformation from (x, z) to (x, y) = (x, βx + z),

with β ∈ R>0 to be chosen later. The dynamics can be rewritten as
!
!
!
x
−(α − β)L
−L
ẋ
=A
, where A =
.
ẏ
y
(−β(α − β) + 1)L −βL

November 27, 2024

(12)

DRAFT

10

Consider the candidate Lyapunov function V (x, y) = xT x + y T y. Its Lie derivative is the
quadratic form defined by the matrix
Q = I2nd A + AT I2nd =

−(α − β)(L + LT )

(−β(α − β) + 1)L − LT

−L + (−β(α − β) + 1)LT
−β(L + LT )

!

.

√
Select β now satisfying β 2 − αβ + 2 = 0 (this equation has a real solution if α ≥ 2 2). Then,
!
2
−( β β+2 − β) −1
⊗ (L + LT ).
(13)
Q=
−1
−β
√
−(β 2 +2)± (β 2 +2)2 −4β 2
Each eigenvalue η of Q is of the form η = λ
, where λ is an eigenvalue of
2β
L + LT . Since G is strongly connected and weight-balanced, L + LT is positive semidefinite with

a simple eigenvalue at zero, and hence η ≤ 0. By the LaSalle invariance principle, the solutions

of (11) from any initial condition (x0 , y0 ) ∈ Rnd × Rnd , asymptotically converge to the set

S = {(x, y) | Q(x, y)T = 02nd } ∩ Wz0 . To conclude the result, we need to show that S ⊆ Sagree .
This follows from noting that, for β > 0, Q(x, y)T = 02nd implies that (L + LT )x = 0nd and
(L + LT )y = 0nd , i.e., (x, y) ∈ Sagree .

The reason behind the introduction of the parameter α in (11) comes from the following

observation: if one tries to reproduce the proof of Theorem 4.1 for a digraph, one encounters
indefinite terms of the form (x − x∗ )T (L − LT )(z − z ∗ ) in the Lie derivative of V , invalidating
it as a Lyapunov function. However, the proof of Lemma 5.3 shows that an appropriate choice
of α, together with a suitable change of coordinates, makes the quadratic form defined by the
identity matrix a valid Lyapunov function. We next build on these observations to establish our
main result: the dynamics (11) solves in a distributed way the optimization problem (3) on
strongly connected weight-balanced digraphs.
Theorem 5.4: (Asymptotic convergence of (11) on weight-balanced digraphs): Let G be a

strongly connected, weight-balanced digraph and consider the optimization problem (3), where
each f i , i ∈ {1, . . . , n}, is convex and differentiable with globally Lipschitz continuous gradient.
Let K ∈ R>0 be the Lipschitz constant of ∇f˜ and define h : R>0 → R by


s
2
4
2
4
2
1
r + 3r + 2
Kr 2
r + 3r + 2
h(r) = Λ∗ (L + LT ) −
+
,
(14)
− 4 +
2
r
r
(1 + r 2 )

where Λ∗ (·) denotes the non-zero eigenvalue with smallest absolute value. Then, there exists
β ∗ ∈ R>0 with h(β ∗ ) = 0 such that, for all 0 < β < β ∗ , the projection onto the first component
2

of any trajectory of (11) with α = β β+2 asymptotically converges to the set of solutions of (4).

November 27, 2024

DRAFT

11

Moreover, if f has a finite number of critical points, the limit of the projection onto the first
component of each trajectory is a solution of (4).
Proof: For convenience, we denote the dynamics (11) by Ψα-dis-opt : Rnd ×Rnd → Rnd ×Rnd .

Note that the equilibria of Ψα-dis-opt are precisely the set of saddle points of F in (5). Let

x∗ = 1n ⊗x∗ be a solution of (4). First, note that given any initial condition (x0 , z0 ) ∈ Rnd ×Rnd ,

the set Wz0 defined by (7) is invariant under the evolutions of (11). By Proposition 3.2(i) and (iii),
there exists (x∗ , z ∗ ) ∈ Eq(Ψα-dis-opt ) ∩ Wz0 . Consider the function V : Rnd × Rnd → R≥0 ,
1
1
V (x, z) = (x − x∗ )T (x − x∗ ) + (y(x,z) − y(x∗ ,z∗ ) )T (y(x,z) − y(x∗ ,z∗ ) ),
2
2

where y(x,z) = βx + z and β ∈ R>0 satisfies β 2 − αβ + 2 = 0. This function is quadratic, hence
smooth. Next, we consider its Lie derivative along Ψα-dis-opt on Wz0 . For (x, z) ∈ Wz0 , let
ξ = LΨα-dis-opt V (x, z) = (−αLx − Lz − ∇f˜(x), Lx)T ∇V (x, z)
 
T
1
∗
T
T
=
(x − x ) , (y(x,z) − y(x∗ ,z∗ ) ) A x, y(x,z) − (x − x∗ )T ∇f˜(x)
2
T
1 T T  T
+
x , y(x,z) A x − x∗ , y(x,z) − y(x∗ ,z∗ ) − β(y(x,z) − y(x∗ ,z∗ ) )T ∇f˜(x),
2

where A is given by (12). This equation can be written as
 
T
1
ξ = (x − x∗ )T , (y(x,z) − y(x∗ ,z∗ ) )T Q x − x∗ , y(x,z) − y(x∗ ,z∗ ) − (x − x∗ )T ∇f˜(x)
2

 
T
+ (x − x∗ )T , (y(x,z) − y(x∗ ,z∗ ) )T A x∗ , y(x∗ ,z∗ ) − β(y(x,z) − y(x∗ ,z∗ ) )T ∇f˜(x),

where Q is given by (13). Note that A(x∗ , y(x∗ ,z∗ ) )T = −(Ly(x∗ ,z∗ ) , βLy(x∗ ,z∗ ) )T = (∇f˜(x∗ ), β∇f˜(x∗ ))T .
Thus, after substituting for y(x,z) , we have
T 
T
1
ξ = (x − x∗ )T , (z − z ∗ )T Q̃ x − x∗ , z − z ∗
2
− (1 + β 2 )(x − x∗ )T (∇f˜(x) − ∇f˜(x∗ )) − β(z − z ∗ )T (∇f˜(x) − ∇f˜(x∗ )),

where
2

Q̃ =

−β 3 − ( β β+2 ) − β −(1 + β 2 )
2

−(1 + β )

−β

!

(15)

⊗ (L + LT ).

Each eigenvalue of Q̃ is of the form
η̃ = λ ×

November 27, 2024

−(β 4 + 3β 2 + 2) ±

p

(β 4 + 3β 2 + 2)2 − 4β 2
,
2β

(16)

DRAFT

12

where λ is an eigenvalue of L + LT . Using the cocoercivity of f˜, we can upper bound ξ as,


T 

x − x∗
x − x∗
Q̃11 Q̃12
0

 

1
∗
∗

 , (17)
 Q̃21 Q̃22

ξ≤ 
z
−
z
z
−
z
−βI
nd

 

2
1
∗
∗
2
˜
˜
˜
˜
∇f (x) − ∇f (x )
∇f (x) − ∇f (x )
0 −βInd − K (1 + β )Ind
{z
}
|
Q

where K ∈ R>0 is the Lipschitz constant for the gradient of f˜.

Since (x, z) ∈ Wz0 , we have (1Tn ⊗ Id )(z − z ∗ ) = 0d and hence it is enough to establish

that Q is negative semidefinite on the subspace W = {(v1 , v2 , v3 ) ∈ (Rnd )3 | (1Tn ⊗ Id )v2 = 0d }.

Using the fact that − K1 (1 + β 2 )Ind is invertible, we can express Q as
Q=N

Q̄

0

0 − K1 (1 + β 2 )Ind

!

Kβ 2
N , Q̄ = Q̃ +
(1 + β 2 )
T

0

0

0 Ind

!


Ind 0

, N =
 0 Ind
0 0

0




βK
I .
1+β 2 nd 

Ind

Noting that W is invariant under N T (i.e., N T W = W), all we need to check is that the matrix
 Q̄

0
is negative semidefinite on W. Clearly, − K1 (1 + β 2 )Ind is negative definite.
1
2
0 − (1+β )I
K

nd

On the other hand, on (Rnd )2 , 0 is an eigenvalue of Q̃ with multiplicity 2d and eigenspace
generated by vectors of the form (1n ⊗ a, 0) and (0, 1n ⊗ b), with a, b ∈ Rd . However, on
{(v1 , v2 ) ∈ (Rnd )2 | (1Tn ⊗ Id )v2 = 0d }, 0 is an eigenvalue of Q̃ with multiplicity d and

eigenspace generated by vectors of the form (1n ⊗a, 0). Moreover, on {(v1 , v2 ) ∈ (Rnd )2 | (1Tn ⊗

Kβ 2
Kβ 2
0 0
Id )v2 = 0d }, the eigenvalues of (1+β
are (1+β
2)
2 ) with multiplicity nd − d and 0 with
0 Ind
multiplicity nd. Therefore, using Weyl’s theorem [21, Theorem 4.3.7], we deduce that the nonzero
2

Kβ
eigenvalues of the sum Q̄ are upper bounded by Λ∗ (Q̃) + (1+β
2 ) . From (16) and the definition

of h in (14), we conclude that the nonzero eigenvalues of Q̄ are upper bounded by h(β). It
remains to show that there exists β ∗ ∈ R>0 with h(β ∗ ) = 0 such that for all 0 < β < β ∗

we have h(β) < 0. For r > 0 small enough, h(r) < 0, since h(r) = − 12 Λ∗ (L + LT )r +
O(r 2). Furthermore, limr→∞ h(r) = K > 0. Hence, the existence of β ∗ follows from the Mean

Value Theorem. Therefore we conclude LΨα-dis-opt V (x, z) ≤ 0. As a by-product, the trajectories

of (11) are bounded. Consequently, all assumptions of the LaSalle Invariance Principle are

satisfied and its application yields that any trajectory of (11) starting from an initial condition
(x0 , z0 ) converges to the largest positively
invariantset M in SΨα-dis-opt ,V ∩ Wz0 . Note that if


(x, z) ∈ SΨα-dis-opt ,V ∩ Wz0 , then N T

x−x∗
z−z ∗
∇f˜(x)−∇f˜(x∗ )

∈ ker(Q̄) × {0}. From the discussion

above, we know ker(Q̄) is generated by vectors of the form (1n ⊗ a, 0), and hence this implies
that x = x∗ + 1n ⊗ a, z = z ∗ , and ∇f˜(x) = ∇f˜(x∗ ), from where we deduce that x is
November 27, 2024

DRAFT

13

also a solution to (4). Finally, for (x, z) ∈ M, an argument similar to the one in the proof of
Theorem 4.1 establishes (x, z) ∈ Eq(Ψα-dis-opt ). If the set of equilibria is finite, convergence to
a point is also guaranteed.

Figure 1 illustrates the result of Theorem 5.4 for the network of Example 5.2.
2

45

5

40
3

35

1

30
1
25
20
−1
15
10

−3

5
−1

5

10

15

20

25

30

−5

(a)

5

10

15

20

25

30

(b)

5

10

15

20

25

30

(c)

Fig. 1. Execution of (11) for the network of Example 5.2 with f 1 (x) = ex , f 2 (x) = (x − 3)2 , f 3 (x) = (x + 3)2 , f 4 (x) = x4 ,
f 5 (x) = 4. (a) and (b) show the evolution of the agent’s values in x and z, respectively, and (c) shows the value of the
Lyapunov function. Here, α = 3, x0 = (1, 2, 0.3, 1, 1)T , and z0 = 15 . The equilibrium (x∗ , z ∗ ) is x∗ = −0.2005 · 15 and
z ∗ = (1.1784, 4.3717, −4.1598, 2.2598, 1.3499)T .

Remark 5.5 (Locally Lipschitz objective functions): Our simulations suggests that the convergence result in Theorem 5.4 holds true for any locally Lipschitz objective function. However, our
proof cannot be reproduced for this case because it would rely on the generalized gradient being
globally Lipschitz which, by Proposition A.1, would imply that the function is differentiable. •

Remark 5.6 (Selection of α in (11)): According to Theorem 5.4, the parameter α is deter2

mined by β as α = β β+2 . In turn, one can observe from (14) that the range of suitable values
for β increases with higher network connectivity and smaller variability of the gradient of the
objective function. From a control design viewpoint, it is reasonable to choose the value of β
that yields the smallest α while satisfying the conditions of the theorem statement.

•

Remark 5.7 (Discrete-time counterpart of (6) and (11)): It is worth noticing that the discretization of (6) for undirected graphs (performed in [12] for the case of continuously differentiable,
strictly convex functions) and (11) for weight-balanced digraphs gives rise to different discretetime optimization algorithms from the ones considered in [1], [2], [3], [4], [5], [6].

•

VI. C ONCLUSIONS AND FUTURE WORK
We have studied the distributed optimization of a sum of convex functions over directed
networks using consensus-based dynamics. Somewhat surprisingly, we have established that the
convergence results established in the literature for undirected networks do not carry over to the
November 27, 2024

DRAFT

14

directed scenario. Nevertheless, our analysis has allowed us to introduce a slight generalization
of the saddle-point dynamics of the undirected case which incorporates a design parameter.
We have proved that, for appropriate parameter choices, this dynamics solves the distributed
optimization problem for differentiable convex functions with globally Lipschitz gradients on
strongly connected and weight-balanced digraphs. Our technical approach relies on a careful
combination of notions from stability analysis, algebraic graph theory, and convex analysis.
Future work will focus on the extension of the convergence results to locally Lipschitz functions
in the weight-balanced directed case and to general digraphs, the incorporation of local and
global constraints, the design of distributed algorithms that allow the network to agree on an
optimal value of the design parameter, the discretization of the algorithms, and the study of the
potential connections with dynamic consensus strategies.
R EFERENCES
[1] M. Rabbat and R. Nowak, “Distributed optimization in sensor networks,” in Symposium on Information Processing of
Sensor Networks, (Berkeley, CA), pp. 20–27, Apr. 2004.
[2] A. Nedic and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Transactions on Automatic
Control, vol. 54, no. 1, pp. 48–61, 2009.
[3] P. Wan and M. D. Lemmon, “Event-triggered distributed optimization in sensor networks,” in Symposium on Information
Processing of Sensor Networks, (San Francisco, CA), pp. 49–60, 2009.
[4] A. Nedic, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE
Transactions on Automatic Control, vol. 55, no. 4, pp. 922–938, 2010.
[5] B. Johansson, M. Rabi, and M. Johansson, “A randomized incremental subgradient method for distributed optimization in
networked systems,” SIAM Journal on Control and Optimization, vol. 20, no. 3, pp. 1157–1170, 2009.
[6] M. Zhu and S. Martı́nez, “On distributed convex optimization under inequality and equality constraints,” IEEE Transactions
on Automatic Control, vol. 57, no. 1, pp. 151–164, 2012.
[7] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings
of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
[8] W. Ren and R. W. Beard, Distributed Consensus in Multi-vehicle Cooperative Control. Communications and Control
Engineering, Springer, 2008.
[9] F. Bullo, J. Cortés, and S. Martı́nez, Distributed Control of Robotic Networks. Applied Mathematics Series, Princeton
University Press, 2009. Electronically available at http://coordinationbook.info.
[10] M. Mesbahi and M. Egerstedt, Graph Theoretic Methods in Multiagent Networks. Applied Mathematics Series, Princeton
University Press, 2010.
[11] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization
algorithms,” IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803–812, 1986.
[12] J. Wang and N. Elia, “Control approach to distributed optimization,” in Allerton Conf. on Communications, Control and
Computing, (Monticello, IL), pp. 557–561, Oct. 2010.
[13] J. Wang and N. Elia, “A control perspective for centralized and distributed convex optimization,” in IEEE Conf. on Decision
and Control, (Orlando, Florida), pp. 3800–3805, 2011.

November 27, 2024

DRAFT

15

[14] R. T. Rockafellar, Convex Analysis. Princeton Landmarks in Mathematics and Physics, Princeton, NJ: Princeton University
Press, 1997. Reprint of 1970 edition.
[15] F. H. Clarke, Optimization and Nonsmooth Analysis. Canadian Mathematical Society Series of Monographs and Advanced
Texts, Wiley, 1983.
[16] E. G. Golshtein and N. V. Tretyakov, Modified Lagrangians and Monotone Maps in Optimization. New York: Wiley, 1996.
[17] J. Cortés, “Discontinuous dynamical systems - a tutorial on solutions, nonsmooth analysis, and stability,” IEEE Control
Systems Magazine, vol. 28, no. 3, pp. 36–73, 2008.
[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[19] D. Feijer and F. Paganini, “Stability of primal-dual gradient dynamics and applications to network optimization,”
Automatica, vol. 46, pp. 1974–1981, 2010.
[20] K. Arrow, L. Hurwitz, and H. Uzawa, Studies in Linear and Non-Linear Programming. Stanford, California: Stanford
University Press, 1958.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[22] A. Bacciotti and F. Ceragioli, “Stability and stabilization of discontinuous systems and nonsmooth Lyapunov functions,”
ESAIM: Control, Optimisation & Calculus of Variations, vol. 4, pp. 361–376, 1999.

A PPENDIX
The next result shows that the differentiability hypothesis of Proposition 2.3 cannot be relaxed.
Proposition A.1 (Lipschitz generalized gradient and differentiability): Any locally Lipschitz
function with globally Lipschitz generalized gradient is differentiable.
Proof: Let f : Rd → R be a locally Lipschitz function and has a globally Lipschitz

generalized gradient map [17]. Take x ∈ Rd and let us show that ∂f (x) is a singleton. Since f

is differentiable almost everywhere, there exists a sequence of points {xn }∞
n=1 , where f is
differentiable such that limn→∞ xn = x. Using the set-valued Lipschitz property of ∂f , we
have ∂f (x) ⊂ ∇f (xn ) + K||xn − x||B(0, 1), where K ∈ R>0 is the Lipschitz constant and

B(0, 1) is the ball centered at 0 ∈ Rd of radius one. Hence, any element v ∈ ∂f (x) can be

written as v = ∇f (xn ) + K||xn − x||un , where un is a unit vector in Rd . Now, taking the

limit, v = limn→∞ ∇f (xn ). Hence the generalized gradient is singleton-valued. Differentiability
follows now from the set-valued Lipschitz condition.

Lemma A.2 (Generalized gradient flow from a critical point): Let f : Rd → R be locally

Lipschitz and convex, and let x∗ be a minimizer of f . Then, the only solution of ẋ(t) ∈ −∂f (x(t))
starting from x∗ is x(t) = x∗ , for all t ≥ 0.

Proof: We reason by contradiction. Assume x(t) is not identically x∗ . Since f is monotoni-

cally nonincreasing along the gradient flow, the trajectory must stay in the set of minimizers of f ,
and hence t 7→ f (x(t)) is constant. Let t′ be the smallest time such that −∂f (x∗ ) ∋ v = ẋ(t′ ) 6= 0.

Using [22, Lemma 1], we have 0 = dtd f (x(t)) = v T ξ, for all ξ ∈ ∂f (x∗ ). In particular, for

ξ = −v, we get 0 = −kvk22 , which is a contradiction.
November 27, 2024

DRAFT

