
ORIGINAL RESEARCH article

Front. Phys., 08 October 2024
Sec. Interdisciplinary Physics

When does the mean network capture the topology of a sample of networks?

  • Applied Mathematics, University of Colorado at Boulder, Boulder, CO, United States

The notion of Fréchet mean (also known as “barycenter”) network is the workhorse of most machine learning algorithms that require the estimation of a “location” parameter to analyze network-valued data. In this context, it is critical that the network barycenter inherit the topological structure of the networks in the training dataset. The metric, which measures the proximity between networks, controls the structural properties of the barycenter. This work is significant because it provides for the first time analytical estimates of the sample Fréchet mean for the stochastic blockmodel, which is at the cutting edge of rigorous probabilistic analysis of random networks. We show that the mean network computed with the Hamming distance is unable to capture the topology of the networks in the training sample, whereas the mean network computed using the effective resistance distance recovers the correct partitions and associated edge density. From a practical standpoint, our work informs the choice of metrics in the context where the sample Fréchet mean network is used to characterize the topology of networks for network-valued machine learning.

1 Introduction

There has recently been a flurry of activity around the design of machine learning algorithms that can analyze “network-valued random variables” (e.g., [1–8] and references therein). A prominent question that is central to many such algorithms is the estimation of the mean of a set of networks. To characterize the mean network, we borrow the notion of barycenter from physics, and define the Fréchet mean as the network that minimizes the sum of the squared distances to all the networks in the ensemble. This notion of centrality is well adapted to metric spaces (e.g., [4, 9, 10]), and the Fréchet mean network has become a standard tool for the statistical analysis of network-valued data.

In practice, given a training set of networks, it is important that the topology of the sample Fréchet mean capture the mean topology of the training set. To provide a theoretical answer to this question, we estimate the mean network when the networks are sampled from a stochastic block model. Stochastic block models [11, 12] have great practical importance since they provide tractable models that capture the topology of real networks that exhibit community structure. In addition, the theoretical properties (e.g., degree distribution, eigenvalue distribution, etc.) of this ensemble are well understood. Finally, stochastic block models provide universal approximants to networks and can be used as building blocks to analyze more complex networks [13–15].

In this work, we derive the expression of the sample Fréchet mean of a stochastic block model for two very different distances: the Hamming distance [16] and the effective resistance perturbation distance [17]. The Hamming distance, which counts the number of edges that need to be added or removed to align two networks defined on the same vertex set, is very sensitive to fine-scale fluctuations of the network connectivity. To detect larger-scale changes in connectivity, we use the resistance perturbation distance [17]. This network distance can be tuned to quantify configurational changes that occur on a network at different scales: from the local scale formed by the neighbors of each vertex, to the largest scale that quantifies the connections between clusters, or communities [17]. See ([18–20], and references therein) for recent surveys on network distances.

Our analysis shows that the sample Fréchet mean network computed with the Hamming distance is unable to capture the topology of networks in the sample. In the case of a sparse stochastic block model, the Fréchet mean network is always the empty network. Conversely, the Fréchet mean computed using the effective resistance distance recovers the underlying network topology associated with the generating process: the Fréchet mean discovers the correct partitions and associated edge densities.

1.1 Relation to existing work

To the best of our knowledge, there exists no theoretical derivation of the sample Fréchet mean for any of the classic ensembles of random networks. Nevertheless, our work shares some strong connections with related research questions.

1.1.1 The Fréchet mean network as a location parameter

Several authors have proposed simple models of probability measures defined on spaces of networks, which are parameterized by a location and a scale parameter [5, 21]. These probability measures can be used to assign a likelihood to an observed network by measuring the distance of that network to a central network, which characterizes the location of the distribution. The authors in [5] explore two choices for the distance: the Hamming distance, and a diffusion distance. Our choice of distances is similar to that of [5].

1.1.2 Existing metrics for the Fréchet mean network

The concept of Fréchet mean necessitates a choice of metric (or distance) on the probability space of networks. The metric will influence the characteristics that the mean will inherit from the network ensemble. For instance, if the distance is primarily sensitive to large scale features (e.g., community structure or the existence of highly connected “hubs”), then the mean will capture these large scale features, but may not faithfully reproduce the fine scale connectivity (e.g., the degree of a vertex, or the presence of triangles).

One sometimes needs to compare networks of different sizes; the edit distance, which allows for the creation and removal of vertices, provides an elegant solution to this problem. When the networks are defined on the same vertex set, the edit distance becomes the Hamming distance [22], which can also be interpreted as the entrywise $\ell^1$ norm of the difference between the two adjacency matrices. Replacing the $\ell^1$ norm with the $\ell^2$ norm yields the Frobenius norm, which has also been used to compare networks (modulo an unknown permutation of the vertices, or equivalently by comparing the respective classes in the quotient set induced by the action of the group of permutations [4, 10]). We note that the computation of the sample Fréchet mean network using the Hamming distance is NP-hard (e.g., [23]). For this reason, several alternatives have been proposed (e.g., [3]). Both the Hamming distance and the Frobenius norm are very sensitive to the fine-scale edge connectivity. To probe a larger range of scales, one can compute the mean network using the eigenvalues and eigenvectors of the respective network adjacency matrices [14, 24, 25].

1.2 Content of the paper: our main contributions

Our contributions consist of two results.

1.2.1 The network distance is the Hamming distance

We prove that when the probability space is equipped with the Hamming distance, the sample Fréchet mean network converges in probability to the sample median network (computed using the majority rule) in the limit of large sample size. This result has significant practical consequences. Consider the case where one needs to estimate a “central network” that captures the connectivity structure of a training set of sparse networks. Our work implies that if one uses the Hamming distance, then the sample Fréchet mean will be the empty network.

1.2.2 The network distance is the resistance perturbation distance

We prove that when the probability space is equipped with the resistance perturbation distance, the adjacency matrix of the sample Fréchet mean converges to the sample mean adjacency matrix with high probability, in the limit of large network size. Our theoretical analysis is based on the stochastic block model [12], a model of random networks that exhibit community structure. In practical applications, our work suggests that one should use the effective resistance distance to learn the mean topology of a sample of networks.

1.3 Outline of the paper

In Section 2, we describe the stochastic block model and the Hamming and resistance distances that are defined on this probability space. The reader who is already familiar with network models and distances can skip to Section 3, wherein we detail the main results along with the proofs of the key results. In Section 4, we discuss the implications of our work. The proofs of some technical lemmata are deferred to Section 5.

2 Network ensemble and distances

2.1 The network ensemble

Let $\mathcal{G}$ be the set of all simple labeled networks with vertex set $[n] \stackrel{\text{def}}{=} \{1, \ldots, n\}$, and let $\mathcal{S}$ be the set of $n \times n$ adjacency matrices of networks in $\mathcal{G}$,

$\mathcal{S} = \left\{ A \in \{0,1\}^{n\times n} : a_{ij} = a_{ji}, \text{ and } a_{ii} = 0,\ 1 \le i < j \le n \right\}.$ (1)

Because there is a one-to-one correspondence between a network $G = (V,E)$ and its adjacency matrix $A$, we sometimes (by an abuse of language) refer to an adjacency matrix $A$ as a network. Also, without loss of generality, we assume throughout the paper that the network size $n$ is even.

We define the matrix P that encodes the edge density within each community and across communities. P can be written as the Kronecker product of the following two matrices,

$P = \begin{pmatrix} p & q \\ q & p \end{pmatrix} \otimes J_{n/2},$ (2)

where $J_{n/2}$ is the $n/2 \times n/2$ matrix with all entries equal to 1. We denote by $\mathcal{G}(n,p,q)$ the probability space $\mathcal{S}$ equipped with the probability measure,

$\forall A \in \mathcal{S}, \quad \mathbb{P}(A) = \prod_{\substack{1\le i<j\le n\\ P_{ij}=p}} p^{a_{ij}}\left(1-p\right)^{1-a_{ij}} \prod_{\substack{1\le i<j\le n\\ P_{ij}=q}} q^{a_{ij}}\left(1-q\right)^{1-a_{ij}}.$ (3)

$\mathcal{G}(n,p,q)$ is referred to as a two-community stochastic blockmodel [12]. One can interpret the stochastic blockmodel as follows: the nodes of a network $G \in \mathcal{G}(n,p,q)$ are partitioned into two communities. The first $n/2$ nodes constitute community $C_1$; the second community, $C_2$, comprises the remaining $n/2$ nodes. Edges in the graph are drawn from independent Bernoulli random variables with the following probabilities of success: $p$ for edges within each community, and $q$ for the across-community edges.
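To make the ensemble concrete, the following minimal numpy sketch samples an adjacency matrix from $\mathcal{G}(n,p,q)$; the code is illustrative only (the function name and interface are ours, not the paper's).

```python
import numpy as np

def sample_sbm(n, p, q, rng=None):
    """Sample one adjacency matrix from the two-community SBM G(n, p, q).

    Nodes 0..n/2-1 form community C1, the rest form C2. Edges are independent
    Bernoulli draws: success probability p within a community, q across.
    """
    rng = np.random.default_rng(rng)
    sigma = np.ones(n)
    sigma[n // 2:] = -1                               # community labels +1 / -1
    within = np.equal.outer(sigma, sigma)             # True for within-community pairs
    probs = np.where(within, p, q)                    # the matrix P of Equation 2
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # draw the upper triangle only
    return (upper | upper.T).astype(int)              # symmetric, zero diagonal
```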

2.2 The Hamming distance between networks

Let $A$ and $A'$ be the adjacency matrices of two unweighted networks defined on the same vertex set. The Hamming distance [16] is defined as follows.

Definition 1. The Hamming distance between $A$ and $A'$ is defined as

$d_H\left(A, A'\right) = \frac{1}{2}\left\| A - A' \right\|_1,$ (4)

where the elementwise $\ell^1$ norm of a matrix $A$ is given by $\|A\|_1 = \sum_{1\le i,j\le n} \left|a_{ij}\right|$.

Because the distance $d_H$ is not concerned with the locations of the edges on which the two graphs differ, $d_H(A, A')$ is oblivious to topological differences between $A$ and $A'$. For instance, if $A$ and $A'$ are sampled from $\mathcal{G}(n,p,q)$, then the complete removal of the across-community edges induces the same distance as the removal, or addition, of the same number of edges within either community. In other words, a catastrophic change in the network topology cannot be distinguished from benign fluctuations in the local connectivity within either community. To address this limitation of the Hamming distance, we define the resistance distance [17].

2.3 The resistance (perturbation) distance between networks

For the sake of completeness, we review the concept of effective resistance (e.g., [26, 27]). Let $A$ denote the adjacency matrix of a network $G = (V,E)$, and let $D$ denote the diagonal degree matrix, $d_{ii} = \sum_{j=1}^n a_{ij}$. We consider the combinatorial Laplacian matrix [28] defined by

$L = D - A.$ (5)

We denote by $L^\dagger$ the Moore-Penrose pseudoinverse of $L$. Let $i,j$ be two nodes of the network; the effective resistance between $i$ and $j$ is given by

$R_{ij} = L^\dagger_{ii} + L^\dagger_{jj} - 2 L^\dagger_{ij}.$ (6)

Intuitively, Rij depends on the abundance of paths between i and j. We have the following lower bound that quantifies the burgeoning of connections around the nodes i and j,

$\frac{1}{d_i} + \frac{1}{d_j} \le R_{ij},$ (7)

where $d_i$ and $d_j$ are the degrees of nodes $i$ and $j$, respectively. As shown in [29], this lower bound is attained for a large class of graphs (see also Remark 3).

The resistance-perturbation distance (or resistance distance, for short) is based on comparing the effective resistance matrices $R$ and $R'$ of $G$ and $G'$, respectively. To simplify the discussion, we only consider networks that are connected with high probability. All the results can be extended to disconnected networks, as explained in [17].

Definition 2. Let $G = (V,E)$ and $G' = (V,E')$ be two networks defined on the same vertex set $[n]$. Let $R$ and $R'$ denote the effective resistance matrices of $G$ and $G'$, respectively. We define the resistance-perturbation distance [17] to be

$d_{rp}\left(G, G'\right) = \sqrt{\sum_{1\le i<j\le n} \left( R_{ij} - R'_{ij} \right)^2 }.$ (8)
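A direct numpy implementation of Equations 5, 6, 8 may be useful as a reference; this is a sketch under the assumption that both graphs are connected (it does not implement the extension to disconnected networks of [17]).

```python
import numpy as np

def effective_resistance(A):
    """Effective resistance matrix R (Equation 6) of a connected graph."""
    L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian, Equation 5
    Ldag = np.linalg.pinv(L)                 # Moore-Penrose pseudoinverse of L
    d = np.diag(Ldag)
    return d[:, None] + d[None, :] - 2 * Ldag

def d_rp(A1, A2):
    """Resistance-perturbation distance (Equation 8), graphs on the same vertex set."""
    diff = effective_resistance(A1) - effective_resistance(A2)
    iu = np.triu_indices_from(diff, k=1)
    return np.sqrt(np.sum(diff[iu] ** 2))
```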

3 Main results

We first review the concept of sample Fréchet mean, and then present the main results. We consider the probability space $(\mathcal{S}, \mathbb{P})$ formed by the adjacency matrices of networks sampled from $\mathcal{G}(n,p,q)$. We equip $\mathcal{S}$ with a distance $d$, which is either the Hamming distance or the resistance distance. Let $A^{(k)}, 1 \le k \le N$, be adjacency matrices sampled independently from $\mathcal{G}(n,p,q)$.

3.1 The sample Fréchet mean

The sample Fréchet function evaluated at $B \in \mathcal{S}$ is defined by

$\hat{F}_2(B) = \frac{1}{N} \sum_{k=1}^N d^2\left(B, A^{(k)}\right).$ (9)

The minimization of the Fréchet function $\hat F_2(B)$ gives rise to the concept of sample Fréchet mean [30], or network barycenter [31].

Definition 3. The sample Fréchet mean network is the set of adjacency matrices $\hat\mu_{\mathbb{P}}$ that solve

$\hat\mu_{\mathbb{P}} = \operatorname*{argmin}_{B \in \mathcal{S}} \frac{1}{N}\sum_{k=1}^N d^2\left(B, A^{(k)}\right).$ (10)

Solutions to the minimization problem in Equation 10 always exist, but need not be unique. In Theorem 1 and Theorem 2, we prove that the sample Fréchet mean network of $\mathcal{G}(n,p,q)$ is unique when $d$ is either the Hamming distance or the resistance distance.

A word on notation is in order here. It is customary to denote by $\mu_{\mathbb{P}}$ the population Fréchet mean network of the probability distribution $\mathbb{P}$ (e.g., [31]), since the adjacency matrix $\mu_{\mathbb{P}}$ characterizes the location of the probability distribution $\mathbb{P}$. Because we use hats to denote sample (empirical) estimates, we denote by $\hat\mu_{\mathbb{P}}$ the adjacency matrix of the sample Fréchet mean network.

3.2 The sample Fréchet mean of $\mathcal{G}(n,p,q)$ computed with the Hamming distance

The following theorem shows that the sample Fréchet mean network converges in probability to the sample Fréchet median network, computed using the majority rule, in the limit of large sample size, $N \to \infty$.

Theorem 1. Let $\hat\mu_{\mathbb{P}}$ be the sample Fréchet mean network computed using the Hamming distance. Then,

$\forall \epsilon > 0,\ \exists N_0,\ \forall N \ge N_0, \quad \mathbb{P}\left( d_H\left(\hat\mu_{\mathbb{P}}, \hat{m}_{\mathbb{P}}\right) < \epsilon \right) \ge 1 - \epsilon,$ (11)

where $\hat m_{\mathbb{P}}$ is the adjacency matrix computed using the majority rule,

$\forall\, 1 \le i,j \le n, \quad \left[\hat m_{\mathbb{P}}\right]_{ij} = \begin{cases} 1 & \text{if } \sum_{k=1}^N a^{(k)}_{ij} \ge N/2, \\ 0 & \text{otherwise.} \end{cases}$ (12)

Remark 1. The matrix $\hat m_{\mathbb{P}}$ is the sample Fréchet median network (e.g., [32]), solution to the following minimization problem [21],

$\hat m_{\mathbb{P}} = \operatorname*{argmin}_{B \in \mathcal{S}} \hat F_1(B),$ (13)

where $\hat F_1$ is the Fréchet function associated with the sample Fréchet median, defined by

$\hat F_1(B) = \frac{1}{N}\sum_{k=1}^N d_H\left(A^{(k)}, B\right).$ (14)

Remark 2. The network size $n$ in Theorem 1 is assumed to be constant; the convergence in probability in Theorem 1 happens when the sample size $N \to \infty$. The proof of Theorem 1 involves constants that are sublogarithmic functions of $n$ (see $\alpha$ and $\beta$ in the proof of Lemma 3 in Section 5.2).

One could envision a scenario where the network size $n$ grows with the sample size $N$. In that case, we need $N = \omega(\log n)$ to ensure that Lemma 3 provides a useful bound. This is a very weak constraint, satisfied for instance for $n = \exp\left(N^c\right)$ with $0 < c < 1$. Finally, Theorem 1 holds for any values of the edge densities $p$ and $q$ (whether these depend on $n$ or $N$), as long as they remain bounded away from $1/2$ (to avoid the instability that occurs when estimating $\hat\mu_{\mathbb{P}}$; see Lemma 4 for details).
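The practical consequence of Theorem 1 is easy to demonstrate numerically. The sketch below (reusing the hypothetical sample_sbm sampler from Section 2.1) shows that, in a sparse regime where every entry of $P$ is below $1/2$, the majority-rule median, and hence the Hamming sample Fréchet mean, is the empty network.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, p, q = 200, 50, 0.05, 0.01          # sparse regime: p, q < 1/2
sample = [sample_sbm(n, p, q, rng) for _ in range(N)]
P_hat = np.mean(sample, axis=0)           # entrywise edge frequencies (Equation 17)
m_hat = (P_hat >= 0.5).astype(int)        # majority rule (Equation 12)
print(m_hat.sum())                        # prints 0: the median network is empty
```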

Before deriving the proof of Theorem 1, we present an extension of the Hamming distance to weighted networks. We recall that the sample Fréchet mean network computed using the Hamming distance has to be an unweighted network, since the Hamming distance is only defined for unweighted networks. This theoretical observation notwithstanding, the proof of Theorem 1 becomes much simpler if we introduce an extension of the Hamming distance to weighted networks; in truth, we extend a slightly different formulation of the Hamming distance.

Let $A, B \in \mathcal{S}$ be two unweighted adjacency matrices. Because $d_H(A,B)$ counts the number of (unweighted) edges on which the graphs differ, we have

$d_H(A, B) = \sum_{1\le i<j\le n} a_{ij} + \sum_{1\le i<j\le n} b_{ij} - 2\sum_{1\le i<j\le n} a_{ij} b_{ij}.$ (15)

Now, assume that $A$ and $B$ are two weighted adjacency matrices, with $a_{ij}, b_{ij} \in [0,1]$. A natural extension of Equation 15 to matrices with entries in $[0,1]$ is therefore given by

$\delta(A, B) = \sum_{1\le i<j\le n} a_{ij} + \sum_{1\le i<j\le n} b_{ij} - 2\sum_{1\le i<j\le n} a_{ij} b_{ij}.$ (16)

The function δ, defined on the space of weighted adjacency matrices with weights in [0,1], satisfies all the properties of a distance, except for the triangle inequality.

We now introduce the sample probability matrix $\hat P$ and the sample correlation $\hat\rho$. Let $A^{(k)}, 1\le k\le N$, be adjacency matrices sampled independently from $\mathcal{G}(n,p,q)$. We define

$\hat P_{ij} \stackrel{\text{def}}{=} \hat{\mathbb{E}}\left[a_{ij}\right] \stackrel{\text{def}}{=} \frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij},$ (17)

and

$\hat\rho_{ij,i'j'} \stackrel{\text{def}}{=} \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right] \stackrel{\text{def}}{=} \frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij}\, a^{(k)}_{i'j'}.$ (18)

We can combine the definitions of $\delta$ and $\hat P$ to derive the following expression for the Fréchet function $\hat F_1$ of the sample median, defined by Equation 14,

$\hat F_1(B) = \delta\left(B, \hat P\right).$ (19)

The proof of this simple identity is very similar to the proof of Lemma 1 and is omitted for brevity. We are now ready to present the proof of Theorem 1.
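For readers who want to check Equation 19 numerically, the following sketch compares the two sides on a random sample (it reuses sample, P_hat, and n from the simulation sketch in Section 3.2; all names are ours, not the paper's).

```python
import numpy as np

def delta(A, B):
    """The extension of the Hamming distance to weighted matrices (Equation 16)."""
    iu = np.triu_indices_from(A, k=1)
    return A[iu].sum() + B[iu].sum() - 2 * (A[iu] * B[iu]).sum()

B = sample[0]                                           # an arbitrary test network
iu = np.triu_indices(n, k=1)
F1 = np.mean([(A[iu] != B[iu]).sum() for A in sample])  # F1(B), Equation 14
assert np.isclose(F1, delta(B, P_hat))                  # Equation 19: F1(B) = delta(B, P_hat)
```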

Proof of Theorem 1. The proof relies on the observation (formalized in Lemma 1) that the Fréchet function $\hat F_2(B)$ can be expressed as the sum of a dominant term and a residual. The residual becomes vanishingly small in the limit of large sample size (see Lemma 3) and can be neglected. We show in Lemma 2 that the dominant term is minimized by the sample Fréchet median network $\hat m_{\mathbb{P}}$ [defined by Equation 12]. We start with the decomposition of $\hat F_2(B)$ into a dominant term and a residual.

Lemma 1. Let $B \in \mathcal{S}$. We denote by $E_B$ the set of edges of the network with adjacency matrix $B$, and by $\bar E_B$ the set of “nonedges.” Then

$\hat F_2(B) = \delta^2\left(B,\hat P\right) - \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n} \left( \hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right) + 4 \sum_{[i,j]\in E_B}\ \sum_{[i',j']\in \bar E_B} \left( \hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right),$ (20)

where P̂ is defined by Equation 17, and ρ̂ is defined by Equation 18.

Proof. The proof of lemma 1 is provided in Section 5.1.

To call attention to the distinct roles played by the terms in Equation 20, we define the dominant term of $\hat F_2(B)$,

$\hat F(B) \stackrel{\text{def}}{=} \delta^2\left(B, \hat P\right) - \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right),$ (21)

and the residual $\zeta_N$ is defined by

$\zeta_N(B) = 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right),$ (22)

so that $\hat F_2(B) = \hat F(B) + \zeta_N(B)$.

The next step of the proof of Theorem 1 involves showing that the sample median network $\hat m_{\mathbb{P}}$ [see Equation 12], which is the minimizer of $\hat F_1(B)$ [see Equation 14], is also the minimizer of $\hat F(B)$.

Lemma 2. $\hat m_{\mathbb{P}}$ satisfies: $\forall B \in \mathcal{S},\ \hat F\left(\hat m_{\mathbb{P}}\right) \le \hat F(B)$.

Proof of lemma 2. We have

$\hat F(B) = \delta^2\left(B,\hat P\right) - \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right).$ (23)

Because $\hat m_{\mathbb{P}}$ is the minimizer of $\hat F_1(B) = \delta(B,\hat P)$ [see Equation 19], $\hat m_{\mathbb{P}}$ is also the minimizer of $\delta^2(B,\hat P)$. Finally, since $\sum_{1\le i<j\le n}\sum_{1\le i'<j'\le n}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}[\rho_{ij,i'j'}]\right)$ does not depend on $B$, $\hat m_{\mathbb{P}}$ is the minimizer of $\hat F(B)$.

We now turn our attention to the residual, and we confirm in the next lemma that $\zeta_N(B) = O_P\left(1/\sqrt{N}\right)$; to wit, $\sqrt{N}\,\zeta_N(B)$ is bounded with high probability.

Lemma 3. $\forall \epsilon > 0,\ \exists c > 0,\ \forall N \ge 1$,

$\mathbb{P}_{A^{(k)} \sim \mathcal{G}(n,p,q)}\left( \left|\zeta_N(B)\right| < \frac{c}{\sqrt{N}} \right) > 1 - \epsilon.$ (24)

Proof. The proof of lemma 3 is provided in Section 5.2.

The last technical lemma needed to complete the proof of Theorem 1 is a variance inequality [31] for $\hat F$. We assume that the entries of $P$ are uniformly bounded away from $1/2$ (this technical condition on $P$ prevents the instability that occurs when estimating $\hat\mu_{\mathbb{P}}$ for $p_{ij} = 1/2$).

Lemma 4. We assume that there exists $\eta > 0$ such that $\forall\, 1\le i<j\le n,\ \left|p_{ij} - 1/2\right| > \eta$. Then, $\exists \alpha > 0$ such that

$\forall B \in \mathcal{S}, \quad \alpha \left\| B - \hat m_{\mathbb{P}} \right\|_1^2 \le \hat F(B) - \hat F\left(\hat m_{\mathbb{P}}\right),$ (25)

with high probability.

Proof. The proof of lemma 4 is provided in Section 5.3.

We are now in a position to combine the lemmata and complete the proof of Theorem 1.

Let $\hat\mu_{\mathbb{P}}$ be the sample Fréchet mean network, and let $\hat m_{\mathbb{P}}$ be the sample Fréchet median network. By definition, $\hat\mu_{\mathbb{P}}$ is the minimizer of $\hat F_2$, and thus

$\hat F\left(\hat\mu_{\mathbb{P}}\right) = \hat F_2\left(\hat\mu_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right) \le \hat F_2\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right).$ (26)

Now, by definition of F̂ in Equation 21, we have

$\hat F_2\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right) = \hat F\left(\hat m_{\mathbb{P}}\right) + \zeta_N\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right),$ (27)

and therefore,

$0 \le \hat F\left(\hat\mu_{\mathbb{P}}\right) - \hat F\left(\hat m_{\mathbb{P}}\right) \le \zeta_N\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right).$ (28)

This last inequality, combined with Equation 24, proves that $\hat F(\hat\mu_{\mathbb{P}}) - \hat F(\hat m_{\mathbb{P}})$ converges to zero for large $N$. We can say more; using the variance inequality Equation 25, we prove that $d_H\left(\hat\mu_{\mathbb{P}}, \hat m_{\mathbb{P}}\right) = \frac{1}{2}\left\|\hat\mu_{\mathbb{P}} - \hat m_{\mathbb{P}}\right\|_1$ converges in probability to zero for large $N$.

Let $\epsilon > 0$; from Lemma 4, there exists $\alpha > 0$ such that

$\mathbb{P}_{A^{(k)} \sim \mathcal{G}(n,p,q)}\left( \alpha\left\|\hat\mu_{\mathbb{P}} - \hat m_{\mathbb{P}}\right\|_1^2 \le \hat F\left(\hat\mu_{\mathbb{P}}\right) - \hat F\left(\hat m_{\mathbb{P}}\right)\right) > 1 - \epsilon.$ (29)

The term $\zeta_N\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right)$ is controlled using Lemma 3,

$\exists C,\ \forall N \ge 1, \quad \mathbb{P}\left( \zeta_N\left(\hat m_{\mathbb{P}}\right) - \zeta_N\left(\hat\mu_{\mathbb{P}}\right) < \frac{C}{\sqrt{N}}\right) \ge 1 - \epsilon.$ (30)

Combining Equations 28–30, we get

$\forall N \ge 1, \quad \mathbb{P}\left( \left\|\hat\mu_{\mathbb{P}} - \hat m_{\mathbb{P}}\right\|_1^2 < \frac{C}{\alpha\sqrt{N}}\right) > 1 - \epsilon.$ (31)

We conclude that $\exists N_1$ such that

$\forall N \ge N_1, \quad \mathbb{P}\left( \left\|\hat\mu_{\mathbb{P}} - \hat m_{\mathbb{P}}\right\|_1 < \epsilon\right) > 1 - \epsilon,$ (32)

which completes the proof of the theorem.

3.3 The sample Fréchet mean of $\mathcal{G}(n,p,q)$ computed with the resistance distance

Here we equip the probability space $(\mathcal{S}, \mathbb{P})$ with the resistance metric defined by Equation 8. Let $A^{(k)}, 1\le k\le N$, be adjacency matrices sampled independently from $\mathcal{G}(n,p,q)$, and let $R^{(k)}$ be their effective resistance matrices. Because the resistance metric relies on the comparison of connectivity at multiple scales, we expect the sample Fréchet mean network to recover the topology induced by the communities.

In the following, we need to ensure that the effective resistances are always well defined for networks sampled from $\mathcal{G}(n,p,q)$, and we therefore require a very mild condition on the edge density. We assume that $p = \omega(\log n/n)$ and $q = \omega(\log n/n)$. For instance, this condition is satisfied if $p = a_1\left(\log^{c_1} n\right)/n$ and $q = a_2\left(\log^{c_2} n\right)/n$, with $a_1, a_2 > 0$ and $c_1, c_2 > 1$.

The next theorem proves that the sample Fréchet mean converges toward the expected adjacency matrix P (see Section 2) in the limit of large networks.

Theorem 2. Let $\hat\mu_{\mathbb{P}}$ be the sample Fréchet mean computed using the effective resistance distance. Then

$\hat\mu_{\mathbb{P}} = \mathbb{E}[A] = P,$ (33)

in the limit of large network size n, with high probability.

Proof of theorem 2. The proof combines three elements. We first observe that the effective resistance of the sample Fréchet mean is the sample mean effective resistance.

Lemma 5. Let $\hat\mu_{\mathbb{P}}$ be the sample Fréchet mean computed using the resistance distance. Then

$\hat R_{ij} \stackrel{\text{def}}{=} R_{ij}\left(\hat\mu_{\mathbb{P}}\right) = \frac{1}{N}\sum_{k=1}^N R^{(k)}_{ij}.$ (34)

Proof of lemma 5. The proof relies on the observation that the Fréchet function in Equation 10 is a quadratic function of $\hat R_{ij} = R_{ij}\left(\hat\mu_{\mathbb{P}}\right)$. Indeed, we have

$\frac{1}{N}\sum_{k=1}^N\ \sum_{1\le i<j\le n}\left(\hat R_{ij} - R^{(k)}_{ij}\right)^2 = \sum_{1\le i<j\le n} \frac{1}{N}\sum_{k=1}^N\left(\hat R_{ij} - R^{(k)}_{ij}\right)^2,$ (35)

where we have used the definition of the effective resistance distance given by Equation 8. The minimum of Equation 35 is attained at the entrywise sample mean, which yields Equation 34.
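In code, Lemma 5 says that the resistance matrix of the Fréchet mean is obtained by an entrywise average; a one-line sketch (reusing effective_resistance from Section 2.3 and sample from the earlier simulation sketch):

```python
import numpy as np

# Entrywise sample mean of the effective resistance matrices (Equation 34).
R_hat = np.mean([effective_resistance(A) for A in sample], axis=0)
```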

The second element of the proof of Theorem 2 is a concentration result for the effective resistance $R_{ij}$ of networks in $\mathcal{G}(n,p,q)$, when the network size $n$ becomes large. Our technique of proof is different from that of Theorem 1: there, we relied on laws of large numbers (for large sample size $N$) to compute the minimum of the Fréchet function $\hat F_2$.

In contrast, the proof of Theorem 2 replaces the law of large numbers with a concentration result for the effective resistance $R_{ij}$ of $\mathcal{G}(n,p,q)$ for large network size $n$. Our estimates are independent of the sample size $N$; they only become sharper as the graph size $n \to \infty$. Others have derived similar results (e.g., [29, 33–36]).

In the next lemma, we prove that $(1/N)\sum_{k=1}^N R^{(k)}_{ij}$ concentrates around $R^*_{ij}$ in the limit of large network size $n$.

Lemma 6. Let $G = (V,E)$ be a graph sampled from $\mathcal{G}(n,p,q)$, and let $i,j$ be two nodes in $V$. Then the effective resistance $R_{ij}$ between $i$ and $j$ is given by

$R_{ij} = R^*_{ij}\left(1 + o(1)\right) \quad\text{with high probability},$ (36)

where

$R^*_{ij} = \begin{cases} \dfrac{4}{n(p+q)} & \text{if } i \text{ and } j \text{ are in the same community}, \\[2ex] \dfrac{4}{n(p+q)} + \dfrac{p-q}{p+q}\,\dfrac{4}{n^2 q} & \text{if } i \text{ and } j \text{ are in different communities}. \end{cases}$ (37)

Before deriving the proof of lemma 6 we make a few remarks to help guide the reader’s intuition.

Remark 3. We justify Equation 37 with a simple circuit argument. We first analyze the case where $i$ and $j$ belong to the same community, say $C_1$. In this case, we can neglect the other community, $C_2$, because of the bottleneck created by the across-community edges. Consequently, $C_1$ is approximately an Erdős-Rényi network wherein the effective resistance $R_{ij}$ concentrates around $4/(n(p+q))$ [29], and we obtain the first term in Equation 37.

On the other hand, when the vertices $i$ and $j$ are in distinct communities, a simple circuit argument shows that

$R_{ij} \approx \frac{2}{n(p+q)} + \frac{1}{k} + \frac{2}{n(p+q)},$ (38)

where $k$ is the number of across-community edges, creating a bottleneck with effective resistance $1/k$ between the two communities [37]; each term $2/(n(p+q))$ accounts for the effective resistance from node $i$ (respectively $j$) to a node incident to an across-community edge. Because the number of across-community edges, $k$, is a binomial random variable, it concentrates around its mean, $qn^2/4$. Finally, $1/k$ is a binomial reciprocal whose mean is given by $4/(qn^2) + o\left(1/n^3\right)$ [38], and we recover the second term of Equation 37.
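Equation 37 is straightforward to check by simulation; the sketch below (reusing the hypothetical sample_sbm and effective_resistance helpers from the earlier sections) compares the simulated effective resistance with $R^*_{ij}$ for a within-community pair and an across-community pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 1000, 0.05, 0.01
R = effective_resistance(sample_sbm(n, p, q, rng))
within = 4 / (n * (p + q))                            # first case of Equation 37
across = within + (p - q) / (p + q) * 4 / (n**2 * q)  # second case of Equation 37
print(R[0, 1], within)                                # nodes 0, 1 are both in C1
print(R[0, -1], across)                               # nodes 0, n-1 are in C1, C2
```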

Our proof of lemma 6 requires that we introduce another operator on the graph: the normalized Laplacian matrix (e.g., [28]). Let $A$ be the adjacency matrix of a network $(V,E)$, and let $D$ be the diagonal matrix of degrees, $d_i = \sum_{j=1}^n a_{ij}$. We normalize $A$ in a symmetric manner, and we define

$\hat A = D^{-1/2}\, A\, D^{-1/2},$ (39)

where $D^{-1/2}$ is the diagonal matrix with entries $1/\sqrt{d_i}$. The normalized Laplacian matrix is defined by

$\mathcal{L} = I - \hat A,$ (40)

where $I$ is the identity matrix. $\mathcal{L}$ is positive semi-definite [28], and we will consider its Moore-Penrose pseudoinverse, $\mathcal{L}^\dagger$.

Proof of lemma 6. The lemma relies on the characterization of $R$ in terms of $\mathcal{L}^\dagger$ [28],

$R_{ij} = \left\langle u_i - u_j,\ \mathcal{L}^\dagger\left(u_i - u_j\right)\right\rangle,$ (41)

where $u_i = \left(1/\sqrt{d_i}\right) e_i$, and $e_i$ is the $i$th vector of the canonical basis. Let $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge -1$ be the eigenvalues of $\hat A$, and let $\Pi_1, \ldots, \Pi_n$ be the corresponding orthogonal projectors,

$\hat A = \sum_{m=1}^n \lambda_m \Pi_m,$ (42)

where $\Pi_1 = \tau^{-1}\, d^{1/2}\left(d^{1/2}\right)^T$, with $d^{1/2} = \left[\sqrt{d_1}, \ldots, \sqrt{d_n}\right]^T$ and $\tau = \sum_{i=1}^n d_i$. Because $\Pi_1$ is also the orthogonal projection on the null space of $\mathcal{L}$, we have

$\mathcal{L}^\dagger = \left(\mathcal{L} + \Pi_1\right)^{-1} - \Pi_1 = \left(I - \hat A + \Pi_1\right)^{-1} - \Pi_1 = I - \Pi_1 + \frac{\lambda_2}{1-\lambda_2}\Pi_2 + Q,$ (43)

where

$Q = \sum_{m=3}^n \frac{\lambda_m}{1-\lambda_m}\Pi_m.$ (44)

Substituting Equation 43 into Equation 41, we get

$R_{ij} = \left\langle u_i - u_j, \left(I - \Pi_1\right)\left(u_i-u_j\right)\right\rangle + \frac{\lambda_2}{1-\lambda_2}\left\langle u_i-u_j,\ \Pi_2\left(u_i-u_j\right)\right\rangle + \left\langle u_i-u_j,\ Q\left(u_i-u_j\right)\right\rangle.$ (45)

The first (and dominant) term of Equation 45 is

$\left\langle u_i-u_j, \left(I-\Pi_1\right)\left(u_i-u_j\right)\right\rangle = \left\langle u_i-u_j,\ u_i-u_j\right\rangle = \frac{1}{d_i} + \frac{1}{d_j}.$ (46)

Let us examine the second term of Equation 45. Löwe and Terveer [39] provide the following estimate for λ2,

$\lambda_2 = \frac{p-q}{p+q} + \omega_n, \quad\text{where}\quad \omega_n = O\left(\sqrt{\frac{2\log n}{n(p+q)}}\right).$ (47)

The corresponding eigenvector $z$ is given, with probability $1 - o(1)$, by [40],

$z = \frac{1}{\sqrt{n}}\,\sigma\,\left(1 + o(1)\right),$ (48)

where the “sign” vector $\sigma$, which encodes the community assignment, is given by

$\sigma_i = \begin{cases} 1 & \text{if } 1 \le i \le n/2, \\ -1 & \text{if } n/2+1 \le i \le n. \end{cases}$ (49)

We derive from Equation 48 the following approximation to $\left\langle u_i, \Pi_2\, u_j\right\rangle$,

$\left\langle u_i, \Pi_2\, u_j\right\rangle = \frac{\sigma_i \sigma_j}{n\sqrt{d_i d_j}}\left(1 + o(1)\right).$ (50)

We therefore have

$\left\langle u_i - u_j,\ \Pi_2\left(u_i-u_j\right)\right\rangle = \frac{1}{n}\left(\frac{\sigma_i}{\sqrt{d_i}} - \frac{\sigma_j}{\sqrt{d_j}}\right)^2 \left(1+o(1)\right).$ (51)

The degree $d_i$ of node $i$ is a binomial random variable, which concentrates around its mean, $p(n/2-1) + qn/2 \approx n(p+q)/2$, for large network size $n$. Also, $1/d_i$ is a binomial reciprocal that also concentrates around its mean, which is given by $2/((p+q)n) + o\left(1/n^2\right)$ [38]. We conclude that in the limit of large network size,

$\left\langle u_i-u_j,\ \Pi_2\left(u_i-u_j\right)\right\rangle = \frac{2\left(\sigma_i - \sigma_j\right)^2}{n^2(p+q)}\left(1+o(1)\right).$ (52)

Combining Equations 47, 52 yields

$\frac{\lambda_2}{1-\lambda_2}\left\langle u_i-u_j,\ \Pi_2\left(u_i-u_j\right)\right\rangle = \frac{p-q}{2q}\,\frac{2\left(\sigma_i-\sigma_j\right)^2}{n^2(p+q)}\left(1+o(1)\right) = \frac{p-q}{p+q}\,\frac{4}{n^2 q}\,\frac{1-\sigma_i\sigma_j}{2}\left(1+o(1)\right).$ (53)

We note that

$\frac{4}{n(p+q)} + \frac{p-q}{p+q}\,\frac{4}{n^2 q}\,\frac{1-\sigma_i\sigma_j}{2} = \begin{cases} \dfrac{4}{n(p+q)} & \text{if } i \text{ and } j \text{ are in the same community}, \\[2ex] \dfrac{4}{n(p+q)} + \dfrac{p-q}{p+q}\,\dfrac{4}{n^2 q} & \text{if } i \text{ and } j \text{ are in different communities}, \end{cases}$ (54)

which confirms that $\frac{\lambda_2}{1-\lambda_2}\left\langle u_i-u_j,\ \Pi_2\left(u_i-u_j\right)\right\rangle$ provides the correction in Equation 37 created by the bottleneck between the communities. Finally, we show in Section 5.4 that the last term in the expansion of $R_{ij}$ in Equation 45 can be neglected,

$\left\langle u_i-u_j,\ Q\left(u_i-u_j\right)\right\rangle \le \frac{8\sqrt{2}}{(np)^{3/2}} \ll \frac{1}{d_i}+\frac{1}{d_j} \quad\text{almost surely}.$ (55)

This concludes the proof of the lemma.

Remark 4. Lemma 6 can be extended to a stochastic block model of any geometry for which we can derive an analytic expression for the dominant eigenvalues; see (e.g., [39, 41]) for equal-size communities, and (e.g., [42]) for the more general case of inhomogeneous random networks.

We can apply Lemma 6 to derive an approximation to the sample mean effective resistance.

Corollary 1. Let $A^{(k)}, 1\le k\le N$, be adjacency matrices sampled independently from $\mathcal{G}(n,p,q)$, and let $R^{(k)}, 1\le k\le N$, be the respective effective resistance matrices. Then

$\frac{1}{N}\sum_{k=1}^N R^{(k)}_{ij} = R^*_{ij}\left(1 + o(1)\right) \quad\text{with high probability},$ (56)

where $R^*_{ij}$ is given by Equation 37.

Lastly, the final ingredient of the proof of Theorem 2 is Lemma 7, which shows that the matrix $R^*$ given by Equation 37 is the effective resistance matrix of the expected adjacency matrix of $(\mathcal{S}, \mathbb{P})$: $R^* = R\left(\mathbb{E}[A]\right)$.

Lemma 7. Let R be the n×n effective resistance matrix of a network with adjacency matrix A. If

$R = \frac{4}{n(p+q)}\,J + \frac{p-q}{p+q}\,\frac{4}{n^2 q}\,K,$ (57)

where $J = J_n$, and $K$ is the $n\times n$ matrix associated with the cross-community edges,

$K = \begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix} \otimes J_{n/2}.$ (58)

Then A=P, where P is given by Equation 2.

Proof of lemma 7. The proof is elementary and relies on the following three identities. First, we recover $L^\dagger$, the pseudoinverse of the combinatorial Laplacian $L = D - A$, from $R$,

$L^\dagger = -\frac{1}{2}\left(I - \frac{1}{n}J\right) R \left(I - \frac{1}{n}J\right).$ (59)

We can then recover $L$ from $L^\dagger$; for every $\alpha \ne 0$, we have

$L = \left(L^\dagger + \frac{\alpha}{n}J\right)^{-1} - \frac{1}{\alpha n}J.$ (60)

Finally, $A = \operatorname{diag}(L) - L$.
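The three identities translate directly into a short round-trip sketch: compute $R$ from a network, then recover the adjacency matrix (this assumes a connected graph; alpha = 1 is an arbitrary choice, and the helper names are ours).

```python
import numpy as np

def adjacency_from_resistance(R):
    """Recover A from the effective resistance matrix R (Equations 59, 60)."""
    n = R.shape[0]
    J = np.ones((n, n))
    C = np.eye(n) - J / n                    # centering projector I - J/n
    Ldag = -0.5 * C @ R @ C                  # Equation 59
    L = np.linalg.inv(Ldag + J / n) - J / n  # Equation 60 with alpha = 1
    return np.diag(np.diag(L)) - L           # A = diag(L) - L

A = sample_sbm(100, 0.3, 0.1, rng=2)         # reuses the sampler from Section 2.1
R = effective_resistance(A)                  # reuses the helper from Section 2.3
assert np.allclose(adjacency_from_resistance(R), A, atol=1e-6)
```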

This concludes the proof of theorem 2.

4 Discussion of our results

This paper provides analytical estimates of the sample Fréchet mean network when the sample is generated from a stochastic block model. We derived the expression of the sample Fréchet mean when the probability space $\mathcal{G}(n,p,q)$ is equipped with two very different distances: the Hamming distance and the resistance distance. This work answers the question raised by Lunagómez et al. [5]: what is the “mean” network (rather than how do we estimate the success probabilities of an inhomogeneous random network), and do we want the “mean” itself to be a network?

We show that, with the Hamming distance, the sample Fréchet mean network is an unweighted network whose topology is usually very different from the average topology of the sample. Specifically, in the regime where $\min_{ij} p_{ij} < 1/2$ (e.g., networks with $o(n^2)$ but $\omega(n)$ edges), the sample Fréchet mean is the empty network, and is therefore uninformative. In contrast, the resistance distance leads to a sample Fréchet mean that recovers the correct topology induced by the community structure; the edge density of the sample Fréchet mean network is the expected edge density of the random network ensemble. The effective resistance distance is thus able to capture the large scale (the community structure) as well as the mesoscale, which spans scales from the global down to the local scale (the degree of a vertex).

This work is significant because it provides, for the first time, analytical estimates of the sample Fréchet mean for the stochastic blockmodel, which is at the cutting edge of rigorous probabilistic analysis of random networks [12]. The technique of proof used to compute the sample Fréchet mean for the Hamming distance can be extended to the large class of inhomogeneous random networks [43]. It should also be possible to extend our computation of the Fréchet mean with the resistance distance to stochastic block models with $K$ communities of arbitrary sizes and varying edge densities.

From a practical standpoint, our work informs the choice of distance in the context where the sample Fréchet mean network is used to characterize the topology of networks for network-valued machine learning (e.g., detecting change points in sequences of networks [2, 8], computing Fréchet regression [6], or clustering network datasets [7]). Future work includes the analysis of the sample Fréchet mean when the distance is based on the eigenvalues of the normalized Laplacian [20].

5 Additional proofs

5.1 Proof of lemma 1

We start with a simple result that provides an expression for the square of the Hamming distance. Let $A, B \in \mathcal{S}$, let $E_B$ denote the set of edges of $B$, and let $\bar E_B$ denote the set of “nonedges” of $B$. We denote by $\left|E_B\right|$ the number of edges in $B$. Then, the square of the Hamming distance is given by

$d_H^2(A,B) = \left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B} a_{ij} - \sum_{[i,j]\in E_B} a_{ij}\right) - 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in \bar E_B} a_{ij}\, a_{i'j'} + \left(\sum_{1\le i<j\le n} a_{ij}\right)^2.$ (61)

The proof of Equation 61 is elementary, and is omitted for brevity. We now provide the proof of lemma 1.

Proof of lemma 1. Applying Equation 61 to each network $G^{(k)}$, we get

$\hat F_2(B) = \left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B}\frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij} - \sum_{[i,j]\in E_B}\frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij}\right) - 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij}\, a^{(k)}_{i'j'} + \frac{1}{N}\sum_{k=1}^N\left(\sum_{1\le i<j\le n} a^{(k)}_{ij}\right)^2.$ (62)

Using the expressions for the sample mean (Equation 17) and correlation (Equation 18), and observing that

$\frac{1}{N}\sum_{k=1}^N\left(\sum_{1\le i<j\le n} a^{(k)}_{ij}\right)^2 = \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\frac{1}{N}\sum_{k=1}^N a^{(k)}_{ij}\, a^{(k)}_{i'j'} = \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right],$ (63)

we get

$\hat F_2(B) = \left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B}\hat P_{ij} - \sum_{[i,j]\in E_B}\hat P_{ij}\right) - 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right] + \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right].$ (64)

Also, we have

$\left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B}\hat P_{ij} - \sum_{[i,j]\in E_B}\hat P_{ij}\right) = \left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij}\right)\left(\left|E_B\right| + 2\sum_{[i,j]\in\bar E_B}\hat P_{ij}\right) + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat P_{ij}\hat P_{i'j'}.$

Whence

$\left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B}\hat P_{ij} - \sum_{[i,j]\in E_B}\hat P_{ij}\right) = \left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij}\right)\left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij} + 2\sum_{1\le i<j\le n}\hat P_{ij}\right) + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat P_{ij}\hat P_{i'j'} = \left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij}\right)^2 + 2\sum_{1\le i<j\le n}\hat P_{ij}\left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij}\right) + \left(\sum_{1\le i<j\le n}\hat P_{ij}\right)^2 + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat P_{ij}\hat P_{i'j'} - \left(\sum_{1\le i<j\le n}\hat P_{ij}\right)^2.$

Completing the square yields

$\left|E_B\right|^2 + 2\left|E_B\right|\left(\sum_{[i,j]\in\bar E_B}\hat P_{ij} - \sum_{[i,j]\in E_B}\hat P_{ij}\right) = \left(\left|E_B\right| - 2\sum_{[i,j]\in E_B}\hat P_{ij} + \sum_{1\le i<j\le n}\hat P_{ij}\right)^2 + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat P_{ij}\hat P_{i'j'} - \left(\sum_{1\le i<j\le n}\hat P_{ij}\right)^2 = \left(\sum_{[i,j]\in E_B}\left(1 - 2\hat P_{ij}\right) + \sum_{1\le i<j\le n}\hat P_{ij}\right)^2 + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\hat P_{ij}\hat P_{i'j'} - \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\hat P_{ij}\hat P_{i'j'}.$ (65)

We can then substitute Equations 63, 65 into Equation 64, and we get the result advertised in the lemma,

$\hat F_2(B) = \left(\sum_{[i,j]\in E_B}\left(1-2\hat P_{ij}\right) + \sum_{1\le i<j\le n}\hat P_{ij}\right)^2 - \sum_{1\le i<j\le n}\ \sum_{1\le i'<j'\le n}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right) + 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right),$ (66)

where we recognize the first term as $\delta^2\left(B,\hat P\right)$.

5.2 Proof of lemma 3

Proof of lemma 3. We recall that the residual $\zeta_N(B)$ is a sum of terms of two types,

$\zeta_N(B) = 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\left(\hat P_{ij}\hat P_{i'j'} - \hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right]\right).$ (67)

The sample mean $\hat P_{ij}$ (Equation 17) is the normalized sum of $N$ independent Bernoulli random variables, and it concentrates around its mean $p_{ij}$. The deviation of $\hat P_{ij}$ from $p_{ij}$ is bounded using Hoeffding's inequality,

$\forall\, 1\le i<j\le n,\ \forall N\ge1, \quad \mathbb{P}_{A^{(k)} \sim \mathcal{G}(n,p,q)}\left( \left|\hat P_{ij} - p_{ij}\right| \ge \delta \right) \le \exp\left(-2N\delta^2\right).$ (68)

Let $\epsilon > 0$, and let $\alpha \stackrel{\text{def}}{=} \log\left(2n/\sqrt{\epsilon}\right)$; a union bound yields

$\forall N \ge 1, \quad \mathbb{P}_{A^{(k)} \sim \mathcal{G}(n,p,q)}\left( \forall\, 1\le i<j\le n,\ \left|\hat P_{ij} - p_{ij}\right| \le \sqrt{\frac{\alpha}{N}} \right) > 1 - \epsilon/4.$ (69)

The sample correlation $\hat\rho_{ij,i'j'}$ (Equation 18) is evaluated in Equation 67 for $[i,j]\in E_B$ and $[i',j']\in\bar E_B$. In this case, the edges $[i,j]$ and $[i',j']$ are always distinct; thus $a^{(k)}_{ij}$ and $a^{(k)}_{i'j'}$ are independent, and $a^{(k)}_{ij}\, a^{(k)}_{i'j'}$ is a Bernoulli random variable with parameter $p_{ij}\, p_{i'j'}$. We conclude that $\hat\rho_{ij,i'j'}$ is the normalized sum of $N$ independent Bernoulli random variables, and thus concentrates around its mean, $p_{ij}\, p_{i'j'}$.

Let $\epsilon > 0$, and let $\beta \stackrel{\text{def}}{=} \log\left(n^2/\sqrt{2\epsilon}\right)$; Hoeffding's inequality and a union bound yield

$\forall N \ge 1, \quad \mathbb{P}\left( \forall\,1\le i<j\le n,\ \forall\,1\le i'<j'\le n,\ [i,j]\ne[i',j'],\ \left|\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right] - p_{ij}p_{i'j'}\right| \le \sqrt{\frac{\beta}{N}} \right) > 1 - \epsilon/2.$ (70)

Combining Equations 69, 70 yields

$\forall\epsilon>0,\ \exists\alpha,\beta,\ \forall N\ge1,\ \forall\,1\le i<j\le n,\ \forall\,1\le i'<j'\le n,\ [i,j]\ne[i',j'], \quad \left|\hat P_{ij} - p_{ij}\right| \le \sqrt{\frac{\alpha}{N}},\quad \left|\hat P_{i'j'} - p_{i'j'}\right| \le \sqrt{\frac{\alpha}{N}},\quad\text{and}\quad \left|\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right] - p_{ij}p_{i'j'}\right| \le \sqrt{\frac{\beta}{N}},$ (71)

with probability $1-\epsilon$. Lastly, combining Equations 67, 71, we get the advertised result,

$\forall\epsilon>0,\ \exists c>0,\ \forall N\ge1, \quad \mathbb{P}\left( \left|\zeta_N(B)\right| \le 4\sum_{[i,j]\in E_B}\ \sum_{[i',j']\in\bar E_B}\left(\left|\hat P_{ij}\hat P_{i'j'} - p_{ij}p_{i'j'}\right| + \left|\hat{\mathbb{E}}\left[\rho_{ij,i'j'}\right] - p_{ij}p_{i'j'}\right|\right) \le \frac{c}{\sqrt{N}}\right) > 1-\epsilon.$ (72)

5.3 Proof of lemma 4

We first provide some inequalities (the proofs of which are omitted) that relate $\delta$ to the matrix norm $\|\cdot\|_1$.

Lemma 8. Let $A$, $B$, and $C$ be weighted adjacency matrices, with $a_{ij}, b_{ij}, c_{ij} \in [0,1]$. We have

$\frac{1}{2}\left\|A - B\right\|_1 \le \delta(A,B), \quad\text{and}\quad \frac{1}{2}\left\|A - C\right\|_1 \le \delta(A,B) + \delta(B,C).$ (73)

Proof of lemma 4. Let $B \in \mathcal{S}$. From the definition of $\hat F$ (see Equation 21), we have

$\hat F(B) - \hat F\left(\hat m_{\mathbb{P}}\right) = \delta^2\left(B,\hat P\right) - \delta^2\left(\hat m_{\mathbb{P}},\hat P\right) = \left(\delta\left(B,\hat P\right) - \delta\left(\hat m_{\mathbb{P}},\hat P\right)\right)\left(\delta\left(B,\hat P\right) + \delta\left(\hat m_{\mathbb{P}},\hat P\right)\right).$ (74)

Because of Lemma 8, we have

$\delta\left(B,\hat P\right) + \delta\left(\hat m_{\mathbb{P}},\hat P\right) \ge \frac{1}{2}\left\|B - \hat m_{\mathbb{P}}\right\|_1.$ (75)

Also,

$\delta\left(B,\hat P\right) - \delta\left(\hat m_{\mathbb{P}},\hat P\right) = \sum_{1\le i<j\le n}\left(b_{ij} - \left[\hat m_{\mathbb{P}}\right]_{ij}\right) - 2\sum_{1\le i<j\le n}\hat p_{ij}\left(b_{ij} - \left[\hat m_{\mathbb{P}}\right]_{ij}\right) = \sum_{1\le i<j\le n}\left(1 - 2\hat p_{ij}\right)\left(b_{ij} - \left[\hat m_{\mathbb{P}}\right]_{ij}\right).$ (76)

The entries of $\hat m_{\mathbb{P}}$ are equal to 1 only along $E\left(\hat m_{\mathbb{P}}\right)$, and 0 along $\bar E\left(\hat m_{\mathbb{P}}\right)$. Therefore,

$\delta\left(B,\hat P\right) - \delta\left(\hat m_{\mathbb{P}},\hat P\right) = \sum_{[i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right)} b_{ij}\left(1-2\hat p_{ij}\right) + \sum_{[i,j]\in E\left(\hat m_{\mathbb{P}}\right)}\left(1-b_{ij}\right)\left(2\hat p_{ij}-1\right).$ (77)

Let $\epsilon > 0$. Because of the concentration of $\hat p_{ij} = \hat P_{ij}$ around $p_{ij}$, $\exists N_0$, $\forall N \ge N_0$,

$\mathbb{P}\left( \forall\,1\le i<j\le n,\ \left|\hat P_{ij} - p_{ij}\right| < \epsilon/2 \right) > 1 - \epsilon.$ (78)

We recall our assumption that $\left|p_{ij} - 1/2\right| > \eta$ for all $1\le i<j\le n$; therefore, we get that for all $0 < \epsilon < 2\eta$,

$\mathbb{P}\left( \forall\,1\le i<j\le n,\ \left|2\hat P_{ij} - 1\right| > 2\eta - \epsilon \right) > 1 - \epsilon.$ (79)

Because m̂P is constructed using the majority rule, we have

$\left|2\hat P_{ij} - 1\right| = \begin{cases} 2\hat p_{ij} - 1 & \text{if } [i,j]\in E\left(\hat m_{\mathbb{P}}\right), \\ 1 - 2\hat p_{ij} & \text{if } [i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right). \end{cases}$ (80)

Substituting the expression of $\left|2\hat P_{ij} - 1\right|$ into Equation 79 yields the following lower bounds, which hold with probability $1-\epsilon$,

$2\hat p_{ij} - 1 > 2\eta - \epsilon \ \text{ if } [i,j]\in E\left(\hat m_{\mathbb{P}}\right), \qquad 1 - 2\hat p_{ij} > 2\eta - \epsilon \ \text{ if } [i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right).$ (81)

Inserting the inequalities given by Equation 81 into Equation 77 gives the following lower bound, which holds with probability $1-\epsilon$,

$\delta\left(B,\hat P\right) - \delta\left(\hat m_{\mathbb{P}},\hat P\right) \ge \left(2\eta - \epsilon\right)\left(\sum_{[i,j]\in E\left(\hat m_{\mathbb{P}}\right)}\left(1 - b_{ij}\right) + \sum_{[i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right)} b_{ij}\right).$ (82)

We bring the proof to an end by observing that

$\left\|B - \hat m_{\mathbb{P}}\right\|_1 = \sum_{1\le i<j\le n}\left|b_{ij} - \left[\hat m_{\mathbb{P}}\right]_{ij}\right| = \sum_{[i,j]\in E\left(\hat m_{\mathbb{P}}\right)}\left|b_{ij} - 1\right| + \sum_{[i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right)}\left|b_{ij}\right| = \sum_{[i,j]\in E\left(\hat m_{\mathbb{P}}\right)}\left(1 - b_{ij}\right) + \sum_{[i,j]\in\bar E\left(\hat m_{\mathbb{P}}\right)} b_{ij},$ (83)

whence we conclude that

$\delta\left(B,\hat P\right) - \delta\left(\hat m_{\mathbb{P}},\hat P\right) \ge \left(2\eta - \epsilon\right)\left\|B - \hat m_{\mathbb{P}}\right\|_1,$ (84)

with probability $1-\epsilon$. Finally, combining Equations 75, 84, and letting $\alpha \stackrel{\text{def}}{=} \left(2\eta - \epsilon\right)/2$, we get the inequality advertised in Lemma 4,

$\alpha\left\|B - \hat m_{\mathbb{P}}\right\|_1^2 \le \hat F(B) - \hat F\left(\hat m_{\mathbb{P}}\right),$ (85)

with probability $1-\epsilon$.

5.4 Proof of Equation 55

We show that $\left\langle u_i-u_j,\ Q\left(u_i-u_j\right)\right\rangle \le \frac{8\sqrt{2}}{(np)^{3/2}}$ almost surely, which is negligible compared to $\frac{1}{d_i}+\frac{1}{d_j}$.

Proof. Let $i,j \in [n]$. We have

$\left\langle u_i-u_j,\ Q\left(u_i-u_j\right)\right\rangle = \left\langle u_i-u_j,\ \sum_{m=3}^n \frac{\lambda_m}{1-\lambda_m}\Pi_m\left(u_i-u_j\right)\right\rangle$ (86)
$\le \left\|\sum_{m=3}^n \frac{\lambda_m}{1-\lambda_m}\Pi_m\right\|_2\left\|u_i-u_j\right\|^2 = \left(\frac{1}{d_i}+\frac{1}{d_j}\right)\left\|\sum_{m=3}^n \frac{\lambda_m}{1-\lambda_m}\Pi_m\right\|_2.$ (87)

Now

$\left\|\sum_{m=3}^n \frac{\lambda_m}{1-\lambda_m}\Pi_m\right\|_2 = \max_{3\le m\le n}\frac{\left|\lambda_m\right|}{1-\lambda_m}\left\|\Pi_m\right\|_2 \le \max_{3\le m\le n} 2\left|\lambda_m\right|\left\|\Pi_m\right\|_2 \le 2\max_{3\le m\le n}\left|\lambda_m\right|,$ (88)

because the $\Pi_m$ are mutually orthogonal projectors such that $\sum_{m=1}^n \Pi_m = I$. Using the following concentration result (e.g., Theorem 3.6 in [44]),

$\max_{3\le m\le n}\left|\lambda_m\right| \le \sqrt{\frac{8}{np}} \quad\text{almost surely},$ (89)

we conclude that

$\left\langle u_i-u_j,\ Q\left(u_i-u_j\right)\right\rangle \le \left(\frac{1}{d_i}+\frac{1}{d_j}\right) 2\max_{3\le m\le n}\left|\lambda_m\right| \le \frac{8\sqrt{2}}{(np)^{3/2}} \quad\text{almost surely}.$ (90)

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

FM: Conceptualization, Data curation, Formal Analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing–original draft, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. FM was supported by the National Science Foundation (CCF/CIF 1815971).

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Dubey P, Müller H-G. Fréchet change-point detection. Ann Stat (2020) 48:3312–35. doi:10.1214/19-aos1930

2. Ghoshdastidar D, Gutzeit M, Carpentier A, Von Luxburg U. Two-sample hypothesis testing for inhomogeneous random graphs. Ann Stat (2020) 48:2208–29. doi:10.1214/19-aos1884

3. Ginestet CE, Li J, Balachandran P, Rosenberg S, Kolaczyk ED. Hypothesis testing for network data in functional neuroimaging. Ann Appl Stat (2017) 11:725–50. doi:10.1214/16-aoas1015

4. Kolaczyk ED, Lin L, Rosenberg S, Walters J, Xu J. Averages of unlabeled networks: geometric characterization and asymptotic behavior. Ann Stat (2020) 48:514–38. doi:10.1214/19-aos1820

5. Lunagómez S, Olhede SC, Wolfe PJ. Modeling network populations via graph distances. J Am Stat Assoc (2021) 116:2023–40. doi:10.1080/01621459.2020.1763803

6. Petersen A, Müller H-G. Fréchet regression for random objects with Euclidean predictors. Ann Stat (2019) 47:691–719. doi:10.1214/17-aos1624

7. Xu H. Gromov-Wasserstein factorization models for graph clustering. Proc AAAI Conf Artif Intelligence (2020) 34(04):6478–85. doi:10.1609/aaai.v34i04.6120

8. Zambon D, Alippi C, Livi L. Change-point methods on a sequence of graphs. IEEE Trans Signal Process (2019) 67:6327–41. doi:10.1109/tsp.2019.2953596

9. Chowdhury S, Mémoli F. The metric space of networks (2018). arXiv preprint arXiv:1804.02820.

10. Jain BJ. Statistical graph space analysis. Pattern Recognition (2016) 60:802–12. doi:10.1016/j.patcog.2016.06.023

11. Snijders TA. Statistical models for social networks. Annu Rev Sociol (2011) 37:131–53. doi:10.1146/annurev.soc.012809.102709

12. Abbe E. Community detection and stochastic block models: recent developments. J Machine Learn Res (2018) 18:1–86.

13. Airoldi EM, Costa TB, Chan SH. Stochastic blockmodel approximation of a graphon: theory and consistent estimation. In: Advances in Neural Information Processing Systems (2013). p. 692–700.

14. Ferguson D, Meyer FG. Theoretical analysis and computation of the sample Fréchet mean of sets of large graphs for various metrics. Inf Inference (2023) 12:1347–404. doi:10.1093/imaiai/iaad002

15. Olhede SC, Wolfe PJ. Network histograms and universality of blockmodel approximation. Proc Natl Acad Sci (2014) 111:14722–7. doi:10.1073/pnas.1400374111

16. Mitzenmacher M, Upfal E. Probability and computing: randomization and probabilistic techniques in algorithms and data analysis. Cambridge, United Kingdom: Cambridge University Press (2017).

17. Monnig ND, Meyer FG. The resistance perturbation distance: a metric for the analysis of dynamic networks. Discrete Appl Mathematics (2018) 236:347–86. doi:10.1016/j.dam.2017.10.007

18. Akoglu L, Tong H, Koutra D. Graph based anomaly detection and description: a survey. Data Mining Knowledge Discov (2015) 29:626–88. doi:10.1007/s10618-014-0365-y

19. Donnat C, Holmes S. Tracking network dynamics: a survey using graph distances. Ann Appl Stat (2018) 12:971–1012. doi:10.1214/18-aoas1176

20. Wills P, Meyer FG. Metrics for graph comparison: a practitioner's guide. PLoS ONE (2020) 15(2):e0228728–54. doi:10.1371/journal.pone.0228728

21. Banks D, Carley K. Metric inference for social networks. J Classification (1994) 11:121–49. doi:10.1007/bf01201026

22. Han F, Han X, Liu H, Caffo B. Sparse median graphs estimation in a high-dimensional semiparametric model. Ann Appl Stat (2016) 10:1397–426. doi:10.1214/16-aoas940

23. Chen J, Hermelin D, Sorge M. On computing centroids according to the ℓp-norms of Hamming distance vectors. In: 27th Annual European Symposium on Algorithms, 144 (2019). p. 1–28.

24. Ferrer M, Serratosa F, Sanfeliu A. Synthesis of median spectral graph. In: Pattern Recognition and Image Analysis: Second Iberian Conference (2005). p. 139–46.

25. White D, Wilson RC. Spectral generative models for graphs. In: 14th International Conference on Image Analysis and Processing (ICIAP 2007). IEEE (2007). p. 35–42.

26. Doyle PG, Snell JL. Random walks and electric networks. Mathematical Association of America (1984).

27. Klein D, Randić M. Resistance distance. J Math Chem (1993) 12:81–95. doi:10.1007/bf01164627

28. Bapat RB. Graphs and matrices, 27. Springer (2010).

29. Von Luxburg U, Radl A, Hein M. Hitting and commute times in large random neighborhood graphs. J Machine Learn Res (2014) 15:1751–98.

30. Fréchet M. Les espaces abstraits et leur utilité en statistique théorique et même en statistique appliquée. J de la Société Française de Statistique (1947) 88:410–21.

31. Sturm K-T. Probability measures on metric spaces of nonpositive curvature. In: Heat Kernels and Analysis on Manifolds, Graphs, and Metric Spaces (2003) 338:357.

32. Jiang X, Munger A, Bunke H. On median graphs: properties, algorithms, and applications. IEEE Trans Pattern Anal Machine Intelligence (2001) 23:1144–51. doi:10.1109/34.954604

33. Löwe M, Terveer S. A central limit theorem for the mean starting hitting time for a random walk on a random graph. J Theor Probab (2023) 36:779–810. doi:10.1007/s10959-022-01195-9

34. Sylvester J. Random walk hitting times and effective resistance in sparsely connected Erdős-Rényi random graphs. J Graph Theor (2021) 96:44–84. doi:10.1002/jgt.22551

35. Ottolini A, Steinerberger S. Concentration of hitting times in Erdős-Rényi graphs. J Graph Theor (2023) 107:245–62. doi:10.1002/jgt.23119

36. Wills P, Meyer FG. Change point detection in a dynamic stochastic blockmodel. In: Complex Networks and Their Applications VIII (2020). p. 211–22.

37. Levin DA, Peres Y, Wilmer EL. Markov chains and mixing times. American Mathematical Society (2009).

38. Rempała G. Asymptotic factorial powers expansions for binomial and negative binomial reciprocals. Proc Am Math Soc (2004) 132:261–72. doi:10.1090/s0002-9939-03-07254-x

39. Löwe M, Terveer S. Hitting times for random walks on the stochastic block model (2024). arXiv preprint arXiv:2401.07896.

40. Deng S, Ling S, Strohmer T. Strong consistency, graph Laplacians, and the stochastic block model. J Machine Learn Res (2021) 22:1–44.

41. Avrachenkov K, Cottatellucci L, Kadavankandy A. Spectral properties of random matrices for stochastic block model. In: International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (2015). p. 537–44.

42. Chakrabarty A, Chakraborty S, Hazra RS. Eigenvalues outside the bulk of inhomogeneous Erdős–Rényi random graphs. J Stat Phys (2020) 181:1746–80. doi:10.1007/s10955-020-02644-7

43. Kovalenko I. Theory of random graphs. Cybernetics (1971) 7:575–9. doi:10.1007/bf01071028

44. Chung F, Lu L, Vu V. The spectra of random graphs with given expected degrees. Internet Math (2003) 1:257–75. doi:10.1080/15427951.2004.10129089

Keywords: network-valued data, network barycenter, network topology, statistical network analysis, Fréchet mean, network distance

Citation: Meyer FG (2024) When does the mean network capture the topology of a sample of networks?. Front. Phys. 12:1455988. doi: 10.3389/fphy.2024.1455988

Received: 27 June 2024; Accepted: 21 August 2024;
Published: 08 October 2024.

Edited by:

Víctor M. Eguíluz, Spanish National Research Council (CSIC), Spain

Reviewed by:

Renaud Lambiotte, University of Oxford, United Kingdom
Mingao Yuan, North Dakota State University, United States

Copyright © 2024 Meyer. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: François G. Meyer, fmeyer@colorado.edu
