On the Construction of Sparse Matrices From Expander Graphs

Bah, Bubacarr; Tanner, Jared

doi:10.3389/fams.2018.00039

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 04 September 2018

Sec. Mathematics of Computation and Data Science

Volume 4 - 2018 | https://doi.org/10.3389/fams.2018.00039

This article is part of the Research TopicRecent Developments in Signal Approximation and ReconstructionView all 6 articles

On the Construction of Sparse Matrices From Expander Graphs

Bubacarr Bah^1,2^*

Jared Tanner³

¹The Research Centre, African Institute for Mathematical Sciences, Muizenberg, South Africa
²Department of Mathematical Sciences, University of Stellenbosch, Stellenbosch, South Africa
³Mathematics Institute, University of Oxford, Oxford, United Kingdom

We revisit the asymptotic analysis of probabilistic construction of adjacency matrices of expander graphs proposed in Bah and Tanner [1]. With better bounds we derived a new reduced sample complexity for d, the number of non-zeros per column of these matrices (or equivalently the left-degree of the underlying expander graph). Precisely $d = O ({log}_{s} (N / s))$ ; as opposed to the standard $d = O (log (N / s))$ , where N is the number of columns of the matrix (also the cardinality of set of left vertices of the expander graph) or the ambient dimension of the signals that can be sensed by such matrices. This gives insights into why using such sensing matrices with small d performed well in numerical compressed sensing experiments. Furthermore, we derive quantitative sampling theorems for our constructions which show our construction outperforming the existing state-of-the-art. We also used our results to compare performance of sparse recovery algorithms where these matrices are used for linear sketching.

1. Introduction

Sparse binary matrices, say A ∈ {0, 1}^n×N, with n ≪ N are widely used in applications including graph sketching [2, 3], network tomography [4, 5], data streaming [6, 7], breaking privacy of databases via aggregate queries [8], compressed imaging of intensity patterns [9], and more generally combinatorial compressed sensing [10–15], linear sketching [7], and group testing [16, 17]. In all these areas we are interested in the case where n ≪ N, in which case A is used as an efficient encoder of sparse signals x ∈ ℝ^N with sparsity s ≪ n, where they are known to preserve ℓ¹ distance of sparse vectors [18]. Conditions that guarantee that a given encoder, A, also referred to as a sensing matrix in compressed sensing, typically include the the nullspace, coherence, and the restricted isometry conditions, see [19] and references there in. The goal is for A to satisfy one or more of these conditions with the minimum possible n, the number of measurements. For uniform guarantees over all A, it has been established that n has to be Ω(s²), but that with high probability on the draw of random A, n can be $O (s log N / n)$ for A with entries drawn from a sub-gaussian distribution, see [19] for a review of such results. Matrices with entries drawn from a Bernoulli distribution fall in the family of sub-gaussian but these are dense as opposed the the sparse binary matrices considered here. For computational advantages, such as faster application and smaller storage, it is advantageous to use sparse A in application [1, 14, 18].

Herein we consider the n achievable when A is an adjacency matrix of an expander graph [18], expander graph will be defined in the next section. Since the construction of such matrices can be construed as either a linear algebra problem or equivalently a graph theory one (in this manuscript we will focus more on the linear algebra discourse). There has been significant research on expander graphs in pure mathematics and theoretical computer science, see [20] and references therein. Both deterministic and probabilistic constructions of expander graphs have been suggested. The best known deterministic constructions achieve $n = O (s^{1 + α})$ for some α > 0 [21]. On the other hand random constructions, first proven in Bassalygo and Pinsker [22], achieve the optimal $n = O (s log (N / s))$ , precisely $n = O (s d)$ , with $d = O (log (N / s))$ , where d is the left degree of the expander graph but also the number of ones in each column of A, to be defined in the next section. However, to the best of our knowledge, it was [1] that proposed a probabilistic construction that is not only optimal but also more suitable to making quantitative statements where such matrices are applied.

This work follows the probabilistic construction proposed in Bah and Tanner[1] but with careful computation of the bounds, is able to achieve $n = O (s log (N / s))$ with $d = O (\frac{log (N / s)}{log s})$ . We retain the complexity of n but got a smaller complexity for d, which is novel. Related results with a similar d were derived in Indyk and Razenshteyn [23] and Bah et al.[24] but for structure sparse signals in the framework of model-based compressing sensing or sketching. In that framework, one has second order information about x beyond simple sparsity, which is first order information about x. It is thus expected and established that it is possible to get a small n and hence a smaller d. Arguably, such a small complexity for d justifies in hindsight fixing d to a small number in simulations with such A as in Bah and Tanner [1], Bah et al. [24], and Mendoza-Smith and Tanner [14], just to mention a few.

The results derived here are asymptotic, though finite dimensional bounds follow directly. We focus on for what ratios of the problem dimensions (s, n, N) does these results hold. There is almost a standard way of interrogating such a question, i.e., phase transitions, probably introduced to the compressed sensing literature by Donoho and Tanner [25]. In other words, we derive sampling theorems numerically depicted by phase transition plots about problem size spaces for which our construction holds. This is similar to what was done in Bah and Tanner [1] but for comparison purposes we include phase transition plots from probabilistic constructions by Buhrman et al. [26] and Berinde [27]. The plots show improvement over these earlier works. Furthermore, we show implications of our results for compressed sensing by using our results with the phase transition framework to compare the performance of selected combinatorial compressed sensing algorithms as is done in Bah and Tanner [1] and Mendoza-Smith and Tanner [14].

The manuscript is organized as follows. Section 1 gives the introduction; while section 2 sets the notation and defines some useful terms. The main results are stated in section 3 and the details of the construction is given in section 4. This is followed by a discussion in section 5 about our results, comparing them to existing results and using the results to compare the performance of some combinatorial compressed sensing algorithms. In section 6 we state the remaining proofs of theorems, lemmas, corollaries, and propositions used in this manuscript. After this section is the conclusion in section 7. We include an Appendix in Supplementary Materials, where we summarized key relevant materials from Bah and Tanner [1], and showed the derivation of some bounds used in the proofs.

2. Preliminaries

2.1. Notation

Scalars will be denoted by lowercase letters (e.g., k), vectors by lowercase boldface letters (e.g., x), sets by uppercase calligraphic letters (e.g., $S$ ) and matrices by uppercase boldface letters (e.g., A). The cardinality of a set $S$ is denoted by $| S |$ and [N]: = {1, …, N}. Given $S \subseteq [N]$ , its complement is denoted by $S^{c} : = [N] \ S$ and $x_{S}$ is the restriction of x ∈ ℝ^N to $S$ , i.e., ${(x_{S})}_{i} = x_{i}$ if $i \in S$ and 0 otherwise. For a matrix A, the restriction of A to the columns indexed by $S$ is denoted by $A_{S}$ . For a graph, $Γ (S)$ denotes the set of neighbors of $S$ , that is the nodes that are connected to the nodes in $S$ , and e_ij = (x_i, y_j) represents an edge connecting node x_i to node y_j. The ℓ_p norm of a vector x ∈ ℝ^N is defined as $‖ x ‖_{p} : = {(\sum_{i = 1}^{N} x_{i}^{p})}^{1 / p}$ .

2.2. Definitions

Below we give formal definitions that will be used in this manuscript.

Definition 2.1 (ℓ_p-norm restricted isometry property). A matrix A satisfies the ℓ_p-norm restricted isometry property (RIP-p) of order s and constant δ_s < 1 if it satisfies the following inequality.

\begin{array}{l} (1 - δ_{s}) ‖ x ‖_{p}^{p} \leq ‖ A x ‖_{p}^{p} \leq (1 + δ_{s}) ‖ x ‖_{p}^{p}, \forall s –sparse x . & (1) \end{array}

The most popular case is RIP-2 and was first proposed in Cand‘es et al. [28]. Typically when RIP is mentioned without qualification, it means RIP₂. In the discourse of this work though RIP-1 is the most relevant. The RIP says that A is a near-isometry and it is a sufficient condition to guarantee exact sparse recovery in the noiseless setting (i.e., y = Ax); or recovery up to some error bound, also referred to as optimality condition, in the noisy setting (i.e., y = Ax + e, where e is the bounded noise vector). we define optimality condition more precisely below.

Definition 2.2 (Optimality condition). Given y = Ax + e and $\hat{x} = Δ (A x + e)$ for a reconstruction algorithm Δ, the optimal error guarantee is

\begin{array}{l} ‖ \hat{x} - x ‖_{p} \leq C_{1} σ_{s} {(x)}_{q} + C_{2} ‖ x ‖_{p}, & (2) \end{array}

where C₁, C₂ > 0 depend only on the RIP constant (RIC), i.e., δ_s, and not the problem size, 1 ≤ q ≤ p ≤ 2, and σ_s(x)_q denote the error of the best s-term approximation in the ℓ_q-norm, that is

\begin{array}{l} σ_{s} {(x)}_{q} : = min_{s - sparse z} ‖ z - x ‖_{q} . & (3) \end{array}

Equation (2) is also referred to as the ℓ_p/ℓ_q optimality condition (or error guarantee). Ideally, we would like ℓ₂/ℓ₂, but the best provable is ℓ₂/ℓ₁ [28], weaker than this is the ℓ₁/ℓ₁ [18], which is what is possible with the A considered in this work.

To aid translation between the terminology of graph theory and linear algebra we define the set of neighbors in both notation.

Definition 2.3 (Definition 1.4 in Bah and Tanner [1]). Consider a bipartite graph $G = ([N], [n], E)$ where $E$ is the set of edges and e_ij = (x_i, y_j) is the edge that connects vertex x_i to vertex y_j. For a given set of left vertices $S \subset [N]$ its set of neighbors is $Γ (S) = {y_{j} | x_{i} \in S and e_{i j} \in E}$ . In terms of the adjacency matrix, A, of $G = ([N], [n], E)$ the set of neighbors of $A_{S}$ for $| S | = s$ , denoted by A_s, is the set of rows with at least one non-zero.

Definition 2.4 (Expander graph). Let $G = ([N], [n], E)$ be a left-regular bipartite graph with N left vertexes, n right vertexes, a set of edges $E$ and left degree d. If, for any ϵ ∈ (0, 1/2) and any $S \subset [N]$ of size $| S | \leq s$ , we have that $| Γ (S) | \geq (1 - ϵ) d | S |$ , then G is referred to as a (s, d, ϵ)-expander graph.

The ϵ is referred to as the expansion coefficient of the graph. A (s, d, ϵ)-expander graph, also called an unbalanced expander graph [18] or a lossless expander graph [29], is a highly connected bipartite graph. We denote the ensemble of n × N binary matrices with d ones per column by $B (N, n; d)$ , or just $B$ to simplify notation. We also will denote the ensemble of n × N adjacency matrices of (s, d, ϵ)-expander graphs as 𝔼(N, n; s, d, ϵ) or simply 𝔼.

3. Results

Before stating the main results of the paper, we summarise here how we improve upon existing results in Bah and Tanner [1], which is generally true of standard random construction of such matrices. Essentially in the random constructions we want one minus the probability in (5) to go to zero as the problem size (s, n, N) → ∞. In Theorem 4.1 (which was Theorem 1.6 in Bah and Tanner [1]) for this to happen we require $d = O (log (N / s))$ based on the bound (7). In this work we derived a tighter bound than (7), which is (12), that requires d to be smaller than the standard bound by a factor of log(s).

The main result of this work is formalized in Theorem 3.1, which is an asymptotic result, where the dimensions grow while their ratios remain bounded. This is also referred to as the proportional growth asymptotics [30, 31]. We point out that it has been observed in practice that applying these matrices (including the simulations we do in this manuscript), which necessitate having finite dimensions, produces results that agree with the theory. However, the theoretical analysis used in this work cannot be applied to the finite setting.

Theorem 3.1. Consider $ϵ \in (0, \frac{1}{2})$ and let d, s, n, N ∈ ℕ, a random draw of an n×N matrix A from $B$ , i.e., for each column of A uniformly assign ones in d out of n positions, as (s, n, N) → ∞ while s/n ∈ (0, 1) and n/N ∈ (0, 1), with probability approaching 1 exponentially, the matrix A ∈ 𝔼 with

\begin{array}{l} d = O (\frac{log (N / s)}{ϵ log s}), and n = O (\frac{s log (N / s)}{ϵ^{2}}) . & (4) \end{array}

The proof of this theorem is found in section 6.1. It is worth emphasizing that the complexity of d is novel and it is the main contribution of this work.

Furthermore, in the proportional growth asymptotics, i.e., as (s, n, N) → ∞ while s/n → ρ and n/N → δ with ρ, δ ∈ (0, 1), for completeness, we derived a phase transition function (curve) in δρ-space below which Theorem 3.1 holds, but above which Theorem 3.1 fails to hold. This is formalized in the following lemma.

Lemma 3.1. Fix $ϵ \in (0, \frac{1}{2})$ and let d, s, n, N ∈ ℕ, as (s, n, N) → ∞ while s/n → ρ ∈ (0, 1) and n/N → δ ∈ (0, 1) then for ρ < (1−γ)ρ_BT(δ; d, ϵ) and γ > 0, a random draw of A from $B$ implies A ∈ 𝔼 with probability approaching 1 exponentially.

The proof of this lemma is given in section 6.2. The phase transition function ρ_BT(δ; d, ϵ) turned out to be significantly higher that those derived from existing probabilistic constructions, hence our results are significant improvement over earlier works. This will be graphically demonstrated with some numerical simulations in section 5.

4. Construction

The standard probabilistic construction is for each column of A to uniformly assign ones in d out of n positions; while the standard approach to derive the probability bounds is to randomly selected s columns of A indexed by $S$ and compute the probability that |A_s| < (1 − ϵ)ds, then do a union bound over all sets $S$ of size s. Our work in Bah and Tanner [1] computed smaller bounds than previous works based on a dyadic splitting of $S$ and derived the following bound. We changed the notation and format of Theorem 1.6 in Bah and Tanner [1] slightly to be consistent with the notation and format in this manuscript.

Theorem 4.1 (Theorem 1.6, Bah and Tanner [1]). Consider d, s, n, N ∈ ℕ, fix $S$ with $| S | \leq s$ , let an n×N matrix A be drawn from $B$ , then

\begin{array}{l} P r o b (| A_{s} | \leq a_{s}) < p_{n} (s, d) \cdot e^{[n \cdot Ψ_{n} (a_{s}, \dots, a_{1})]} & (5) \end{array}

with a₁: = d, and the functions defined as

\begin{array}{l} p_{n} (s, d) = \frac{2}{25 \sqrt{2 π s^{3} d^{3}}}, a n d & (6) \end{array}

\begin{array}{l} Ψ_{n} (a_{s}, \dots, a_{1}) = \frac{1}{n} [3 s log (5 d) + \sum_{i \in Ω} \frac{s}{2 i} ψ_{i}], Ω = {2^{j}}_{j = 0}^{{log}_{2} (s) - 1}, & (7) \end{array}

where ψ_i is given by the following expression

\begin{array}{l} (n - a_{i}) \cdot H (\frac{a_{2 i} - a_{i}}{n - a_{i}}) + a_{i} \cdot H (\frac{a_{2 i} - a_{i}}{a_{i}}) - n \cdot H (\frac{a_{i}}{n}) & (8) \end{array}

and $H (\cdot)$ is the Shannon entropy in base e logarithms, and the index set $Ω = {2^{j}}_{j = 0}^{{log}_{2} (s) - 1}$ .

a) If no restriction is imposed on a_s then the a_i for i > 1 take on their expected value â_i given by

\begin{array}{l} â_{2 i} = â_{i} (2 - \frac{â_{i}}{n}), f o r i \in {2^{j}}_{j = 0}^{{log}_{2} (s) - 1} . & (9) \end{array}

b) If a_s is restricted to be less than â_s, then the a_i for i > 1 are the unique solutions to the following polynomial system

\begin{array}{l} a_{2 i}^{3} - 2 a_{i} a_{2 i}^{2} + 2 a_{i}^{2} a_{2 i} - a_{i}^{2} a_{4 i} = 0, f o r i \in {2^{j}}_{j = 0}^{{log}_{2} (s) - 2}, & (10) \end{array}

with a_2i ≥ a_i for each i.

In this work, based on the same approach as in Bah and Tanner [1], we derive new expressions for the p_n(s, d) and Ψ_n(a_s, …, a₁) in Theorem 4.1, i.e., (6) and (7) respectively, and provide a simpler bound for the improved expression of Ψ_n(a_s, …, a₁).

Lemma 4.1. Theorem 4.1 holds with the functions

\begin{array}{l} p_{n} (s, d) = 2^{- 3} s^{9 / 2} e^{1 / 4}, & (11) \end{array}

\begin{array}{l} Ψ_{n} (a_{s}, \dots, a_{1}) = \frac{1}{n} [\frac{3 \log 2}{2} \log_{2}^{2} s + (\log_{2} s - \frac{3}{2}) \log a_{s} \\ + \sum_{i \in Ω} \frac{s}{2 i} ψ_{i}], f o r Ω = {2^{j}}_{j = 0}^{\log_{2} (s) - 1} . & (12) \end{array}

The proof of the lemma is given in section 6.3. Asymptotically, the argument of the exponential term in the bound of the probability in (5) of Theorem 4.1, i.e., Ψ_n(a_s, …, a₁) in (12) is more important than the polynomial p_n(s, d) in (11) since the exponential factor will dominate the polynomial factor. The significance of the lemma is that Ψ_n(a_s, …, a₁) in (12) is smaller than Ψ_n(a_s, …, a₁) in (7) since $\frac{3 log 2}{2} {log}_{2}^{2} s + ({log}_{2} s - \frac{3}{2}) log a_{s}$ in (12) is asymptotically smaller than 3slog(5d) in (7), because the former is $O (polylog s)$ while the latter is $O (s)$ , since we consider $a_{s} = O (s)$ .

Recall that we are interested in computing Prob(|A_s| ≤ a_s) when a_s = (1 − ϵ)ds. This means having to solve the polynomial equation (10) to compute as small a bound of Ψ_n((1 − ϵ)ds, …, d) as possible. We derive an asymptotic solution to (10) for a_s = (1 − ϵ)ds and use that solution to get the following bounds.

Theorem 4.2. Consider d, s, n, N ∈ ℕ, fix $S$ with $| S | \leq s$ , for η > 0, β ≥ 1, and $ϵ \in (0, \frac{1}{2})$ , let an n × N matrix A be drawn from $B$ , then

\begin{array}{l} P r o b (| A_{s} | \leq (1 - ϵ) d s) < p_{n} (s, d, ϵ) \cdot exp [n \cdot Ψ_{n} (s, d, ϵ)] & (13) \end{array}

where

\begin{array}{l} p_{n} (s, d, ϵ) = \frac{\sqrt[4]{e} \cdot s^{{log}_{2} (1 - ϵ) + 3}}{\sqrt{2^{6} {(1 - ϵ)}^{3} d^{3}}}, & (14) \end{array}

\begin{array}{l} Ψ_{n} (s, d, ϵ) \leq - \frac{1}{2 n} [η β^{- 1} (β - 1) (1 - ϵ) d s \log_{2} (s / 2) \\ - (5 \log 2) \log_{2}^{2} s - 2 \log d \log_{2} s] . & (15) \end{array}

The proof of this theorem is also found in section 6.5. Since Theorem 4.2 holds for a fixed $S$ of size at most s, if we want this to hold for all $S$ of size at most s, we do a union bound over all $S$ of size at most s. This leads to the following probability bound.

Theorem 4.3. Consider d, s, n, N ∈ ℕ, and all $S$ with $| S | \leq s$ , for τ > 0, and $ϵ \in (0, \frac{1}{2})$ , let an n × N matrix A be drawn from $B$ , then

\begin{array}{l} P r o b (| A_{s} | \leq (1 - ϵ) d s) < p_{N} (s, d, ϵ) \cdot e^{[N \cdot Ψ_{N} (s, d, ϵ)]}, & (16) \end{array}

where

\begin{array}{l} p_{N} (s, d, ϵ) = \frac{5 \cdot \sqrt[4]{e} \cdot s^{{log}_{2} (1 - ϵ) + 5 / 2}}{\sqrt{2^{10} {(1 - ϵ)}^{3} d^{3} π (1 - s / N)}}, & (17) \end{array}

\begin{array}{l} Ψ_{N} (s, d, ϵ) \leq - \frac{s}{N} log (\frac{s}{N}) + \frac{s}{N} \\ - \frac{τ (1 - ϵ) d}{2 log 2} \frac{s}{N} log (\frac{s}{2}) + o (N) . & (18) \end{array}

Proof: Applying the union bound over all $S$ of size at most s to (13) leads to the following.

\begin{array}{l} Prob (| A_{s} | \leq (1 - ϵ) d s) < (\binom{N}{s}) p_{n} (s, d, ϵ) e^{[n \cdot Ψ_{n} (s, d, ϵ)]} . & (19) \end{array}

Then we used the upper bound of (143) to bound the combinatorial term $(\binom{N}{s})$ in (19). After some algebraic manipulations, we separated the the polynomial term, given in (17), from the exponential terms whose exponent is

\begin{array}{l} Ψ_{N} (s, d, ϵ) : = H (\frac{s}{N}) + \frac{n}{N} Ψ_{n} (s, d, ϵ) . & (20) \end{array}

We upper bound Ψ_N(s, d, ϵ) in (18) by upper bounding $H (\frac{s}{N})$ with $- \frac{s}{N} log (\frac{s}{N}) + \frac{s}{N}$ and the upper bound of Ψ_n(s, d, ϵ) in (15). The o(N) decays to zeros with N and its a result of dividing the polylogarithmic terms of s in (15), and τ = ηβ⁻¹(β − 1) in (15). This concludes the proof.□

The next corollary easily follows from Theorem 4.3 and it is equivalent to Theorem 3.1. Its statement is that if the conditions therein holds, then the probability that the cardinality of the set of neighbors of any $S$ with $| S | \leq s$ is less than (1 − ϵ)ds goes to zero as dimensions of A grows. On the other hand, the probability that the cardinality of the set of neighbors of any $S$ with $| S | \leq s$ is greater than (1 − ϵ)ds goes to one as dimensions of A grows. Implying that A is the adjacency matrix of a (s, d, ϵ)-expander graph.

Corollary 4.1. Given d, s, n, N ∈ ℕ, and $ϵ \in (0, \frac{1}{2})$ , for $d \geq \frac{c_{d} log (N / s)}{ϵ log s}$ and $n \geq \frac{c_{n} s log (N / s)}{ϵ^{2}}$ , with c_d, c_n > 0. Let an n × N matrix A be drawn from $B$ , in the proportional growth asymptotics

\begin{array}{l} P r o b (| A_{s} | \leq (1 - ϵ) d s) \to 0 . & (21) \end{array}

Proof: It suffice to focus on the exponent of (16), more precisely on the bound of Ψ_N(s, d, ϵ) in (18), i.e.,

\begin{array}{l} - \frac{s}{N} log (\frac{s}{N}) + \frac{s}{N} - \frac{τ (1 - ϵ) d}{2 log 2} \frac{s}{N} log (\frac{s}{2}) + o (N) . & (22) \end{array}

We can ignore the o(N) term as this goes to zero as N grows, and show that the remaining sum is negative. The remaining sum is

\begin{array}{l} - \frac{s}{N} log (\frac{s}{N}) + \frac{s}{N} - \frac{τ (1 - ϵ) d}{2 log 2} \frac{s}{N} log (\frac{s}{2}) \\ = \frac{s}{N} [- log (\frac{s}{N}) + 1 - \frac{τ (1 - ϵ) d}{2 log 2} log (\frac{s}{2})] . & (23) \end{array}

Hence, we can further focus on the sum in the square brackets, and find conditions on d that will make it negative. We require

\begin{array}{l} - log (\frac{s}{N}) + 1 - \frac{τ (1 - ϵ) d}{2 log 2} log (\frac{s}{2}) < 0, \\ \Rightarrow d > \frac{2 log 2 (log (N / s) + 1)}{τ (1 - ϵ) log (s / 2)} & (24) \end{array}

\begin{array}{l} \Leftrightarrow d \geq \frac{c_{d} log (N / s)}{ϵ log s}, \exists c_{d} > 0 . & (25) \end{array}

Recall τ = ηβ⁻¹(β−1) and β is a function of ϵ, with β(ϵ) ≈ 1+ϵ. Therefore, τ is a function of ϵ, and $τ (ϵ) = O (ϵ)$ , hence there exists a c_d > 0 for (25) to hold.

With regards to the complexity of n, we go back to the right hand side (RHS) of (24) and we substitute $\frac{C_{d} log (N / s)}{ϵ log (s / 2)}$ with C_d > 0 for d in the RHS of (24) to get the following.

\begin{array}{l} - log (\frac{s}{N}) + 1 - \frac{C_{d} τ (1 - ϵ) log (N / s)}{2 ϵ log 2} < 0 . & (26) \end{array}

Now we assume $n = \frac{C_{n} s log (N / s)}{ϵ^{2}}$ with C_n > 0 for n and substitute this in (26) to get the following.

\begin{array}{l} - log (\frac{s}{N}) + 1 - \frac{C_{d} τ (1 - ϵ) ϵ n}{2 C_{n} s log 2} < 0, \\ \Rightarrow n > \frac{2 C_{n} s log 2 (log (N / s) + 1)}{C_{d} τ ϵ (1 - ϵ)} & (27) \end{array}

\begin{array}{l} \Leftrightarrow n \geq \frac{c_{n} s log (N / s)}{ϵ^{2}}, \exists c_{n} > 0 . & (28) \end{array}

Again since $τ (ϵ) = O (ϵ)$ , hence there exists a c_n > 0 for (28) to hold. The bound of n in (28) agrees with our earlier assumption, thus concluding the proof.□

5. Discussion

5.1. Comparison to Other Constructions

In addition to being the first probabilistic construction of adjacency matrices of expander graphs with such a small degree, quantitatively our results compares favorably to existing probabilistic constructions.We use the standard tool of phase transitions to compare our construction to the construction proposed in Berinde [27] and those proposed in Buhrmanet al. [26]. The phase transition curve ρ_BT(δ; d, ϵ) we derived in Lemma 3.1 is the ρ that solves the following equation.

\begin{array}{l} - ρ log (δ ρ) + ρ - \frac{τ (1 - ϵ) d ρ log (δ ρ)}{2 log 2} - \frac{τ ϵ (1 - ϵ) d}{2 c_{n} log 2} \\ + \frac{τ (1 - ϵ) d ρ}{2} = 0, & (29) \end{array}

where c_n > 0 is as in (28). Equation (29) comes from taking the limit, in the proportional growth asymptotics, of the bound in (18), setting that to zero and simplifying. Similarly, for any $S$ with $| S | \leq s$ , Berinde [27] derived the following bound on the set of neigbors of $S$ , i.e., A_s.

\begin{array}{l} Prob (| A_{s} | \leq (1 - ϵ) d s) < (\binom{N}{s}) (\binom{d s}{ϵ d s}) {(\frac{d s}{n})}^{ϵ d s} . & (30) \end{array}

We then express the bound in (30) as the product of a polynomial term and an exponential term as we did for the right hand side of (16). A bound of the exponent is carefully derived as in the derivations above. We set the limit, in the proportional growth asymptotics, of this bound to zero and simplify to get the following.

\begin{array}{l} (ϵ d - 1) log ρ - log δ + (1 + ϵ d) - ϵ d log (ϵ / d) = 0 . & (31) \end{array}

We refer to the ρ that solves (31) as the phase transition for the construction proposed by Berinde [27] and denote this ρ (the phase transition function) as ρ_BI(δ; d, ϵ). Another probabilistic construction was proposed by Burhman et al. [26]. In conforminty with the notation used in this manuscript their bound is equivalent to the following, also stated in a similar form by Indyk and Razenshteyn [23].

\begin{array}{l} Prob (| A_{s} | \leq (1 - ϵ) d s) < (\begin{matrix} N \\ s \end{matrix}) {(\frac{ν ϵ n}{d s})}^{- ϵ d s}, & (32) \end{array}

where ν > 0. We again express the bound in (32) as the product of a polynomial term and an exponential term. A bound of the exponent is carefully derived as in the derivations above. We set the limit, in the proportional growth asymptotics, of this bound to zero and simplify to get the following.

\begin{array}{l} (ϵ d - 1) log ρ - log δ + 1 - ϵ d log (ν ϵ / d) = 0 . & (33) \end{array}

Similarly, we refer to the ρ that solves (33) as the phase transition for the construction proposed by Burhman et al. [26] and denote this ρ as ρ_BM(δ; d, ϵ). We compute numerical solutions to (29), (31), and (33) to derive the phase transitions, for the existence (probabilistic construction) of expanders, ρ_BT(δ; d, ϵ), ρ_BI(δ; d, ϵ), and ρ_BM(δ; d, ϵ) respectively. These are plotted in the left panel of Figure 1. It is clear that our construction has a much higher phase transition than the others, as expected from our previous results, see Figure 9 in Bah and Tanner [1]. Recall that the phase transition curves in these plots depict construction of adjacency matrices of (s, d, ϵ)-expanders with high probability for ratios of s, n, and N (since ρ: = s/n, and δ: = n/N) below the curve; and the failure to construct adjacency matrices of (s, d, ϵ)-expanders with high probability for ratios of s, n, and N above the curve. Essentially, the larger the area under the curve the better.

FIGURE 1

Figure 1. Plots of phase transitions for probabilistic construction of expanders, with fixed d = 2⁵, ϵ = 1/6, and δ ∈ [10⁻⁶, 1] on a logarithmically spaced grid of 100 points. (Left) A comparison of ρ_BT(δ; d, ϵ), ρ_BI(δ; d, ϵ), and ρ_BM(δ; d, ϵ), where c_n = 2. (Right) A comparison of ρ_BT(δ; d, ϵ) denoted as ρ_{B_T₂}(δ; d, ϵ) to our previous ρ_BT(δ; d, ϵ) denoted as ρ_{B_T₁}(δ; d, ϵ) in Bah and Tanner [1] with different values of N, (i.e., 2¹⁰, 2¹², and 2²⁰) and c_n = 2/3.

Remark 5.1. It is easy to see that ρ_BI(δ; d, ϵ) is a special case of ρ_BM(δ; d, ϵ) since the two phase transitions will coincide, or equivalently (31) and (33) will be the same, when ν = e⁻¹. One could argue that Berinde's derivation in [27] suffers from over counting.

Given that this work is an improvement of our work in Bah and Tanner [1] in terms of simplicity in computing ρ_BT(δ; d, ϵ), for completeness we compare our new phase transition ρ_BT(δ; d, ϵ) denoted as ρ_{B_T₂}(δ; d, ϵ) to our previous ρ_BT(δ; d, ϵ) denoted as ρ_{B_T₁}(δ; d, ϵ) in the right panel of Figure 1. Each computation of ρ_{B_T₁}(δ; d, ϵ) requires the specification of N, which is not needed in the computation of ρ_{B_T₂}(δ; d, ϵ), hence the simplification. However, the simplification led to a lower phase transition as expected, which is confirmed by the plots in the right panel of Figure 1.

Remark 5.2. These simulations also inform us about the size of c_n. See from the plots of ρ_BT(δ; d, ϵ) and ρ_{B_T₂}(δ; d, ϵ) that the smaller the value of c_n the higher the phase transition but since ρ_{B_T₂}(δ; d, ϵ) has to be a lower bound of ρ_{B_T₁}(δ; d, ϵ), for values of c_n much smaller than 2/3, the lower bound will fail to hold. This informed the choice of c_n = 2 in the plot of ρ_BT(δ; d, ϵ) in the left panel of Figure 1.

5.2. Implications for Combinatorial Compressed Sensing

When the sensing matrices are restricted to the sparse binary matrices considered in this manuscript, compressed sensing is usually referred to as combinatorial compressed sensing a term introduced in Berinde et al. [18] and used extensively in Mendoza-Smith and Tanner [14] and Mendoza-Smith et al. [15]. In this setting, compressed sensing is more-or-less equivalent to linear sketching. The implications of our results on combinatorial compressed sensing are two-fold. One is on the ℓ₁-norm RIP, we donate as RIP-1; while the second is in the comparison of performance of recovery algorithms for combinatorial compressed sensing.

5.2.1. RIP-1

As can be seen from (2), the recovery errors in compressed sensing depend on the RIC, i.e., δ_s. The following lemma deduced from Theorem 1 of Berinde et al.[18] shows that a scaled A drawn from 𝔼 have RIP with δ_s = 2ϵ.

Lemma 5.1. Consider $ϵ \in (0, \frac{1}{2})$ and let A be drawn from 𝔼, then Φ = A/d satisfies the following RIP-1 condition

\begin{array}{l} (1 - 2 ϵ) ‖ x ‖_{1} \leq ‖ Φ x ‖_{1} \leq ‖ x ‖_{1}, \forall s - s p a r s e x . & (34) \end{array}

The interested reader is referred to the proof of Theorem 1 in Berinde et al. [18] for the proof of this lemma. Key to the holding of Lemma 5.1 is the existence of (s, d, ϵ)-expander graphs, hence one can draw corollaries from our results on this.

Corollary 5.1. Consider $ϵ \in (0, \frac{1}{2})$ and let d, s, n, N ∈ ℕ. In the proportional growth asymptotics with a random draw of an n × N matrix A from $B$ , the matrix Φ: = A/d has RIP-1 with probability approaching 1 exponentially, if

\begin{array}{l} d = O (\frac{log (N / s)}{ϵ log s}), and n = O (\frac{s log (N / s)}{ϵ^{2}}) . & (35) \end{array}

Proof: Note that the upper bound of (34) holds trivially for any Φ = A/d where A has d ones per column, i.e., $A \in B$ . But for the lower bound of (34) to hold for any Φ = A/d, we need A to be an (s, d, ϵ)-expander matrix, i.e., A ∈ 𝔼. Note that the event |A_s| ≥ (1 − ϵ)ds is equal to the event ||A $S$ x||₁ ≥ (1 − 2ϵ)d||x|₁, which is equivalent to $‖ Φ_{S} x ‖_{1} \geq (1 - 2 ϵ) ‖ x |_{1}$ , for a fixed $S$ , with

$| S | \leq s$ . For A to be in 𝔼, we need expansion for all sets $S$ , with $| S | \leq s$ , i.e., A∈𝔼. The key thing to remember is that

\begin{array}{l} Prob (‖ Φ X ‖_{1} \geq (1 - 2 є) ‖ X |_{1}) \\ = Prob (| A_{s} | \geq (1 - є) d s), for all {S : | S | \leq s} . & (36) \end{array}

The probability in (36) going to 1 exponentially in the proportional growth asymptotics, i.e., the existence of A∈𝔼 with parameters as given in (35), is what is stated in Theorem 3.1. Therefore, the rest of the proof follows from the proof of Theorem 3.1, hence concluding the proof of the corollary□.

Notably, Lemma 5.1 holds with Φ having much smaller number of non-zeros per column due to our construction. More over, we can derive sampling theorems for which Lemma 5.1 holds as thus.

Corollary 5.2. Fix $ϵ \in (0, \frac{1}{2})$ and let d, s, n, N ∈ ℕ. In the proportional growth asymptotics, for any ρ < (1−γ)ρ_BT(δ; d, ϵ) and γ > 0, a random draw of A from $B$ implies Φ: = A/d has RIP-1 with probability approaching 1 exponentially.

Proof. The proof of this corollary follows from the proof of Corollary 5.1 above, and it is related to the proof of Lemma 3.1 as the proof of Corollary 5.1 is to the proof of Theorem 3.1. The details of the proof is thus skipped.□

5.2.2. Performance of Algorithms

We wish to compare the performance of selected combinatorial compressed sensing algorithms in terms of the possible problem sizes (s, n, N) that these algorithms can reconstruct sparse/compressible signals/vectors up to their respective error guarantees. The comparison is typically done in the framework of phase transitions, which depict a boundary curve where ratios of problems sizes above this curve are recovered with probability approaching 0 exponentially; while problems sizes below the curve are recovered with probability approaching 1 exponentially. The list of combinatorial compressed sensing algorithms includes Expander Matching Pursuit (EMP) [32], Sparse Matching Pursuit [33], Sequential Sparse Matching Pursuit (SSMP) [13], Left Degree Dependent Signal Recovery (LDDSR) [11], Expander Recovery (ER) [12], Expander Iterative Hard-Thresholding (EIHT) [19, Section 13.4], and Expander ℓ₀-decoding (ELD) with both serial and parallel versions [14]. For reason similar to those used in Bah and Tanner [1] and Mendoza-Smith and Tanner [14], we selected out of this list four of the algorithms: (i) SSMP, (ii) ER, (ii) EIHT, (iv) ELD. Descriptions of these algorithms is skipped here but the interested reader is referred to the original papers or their summarized details in Bah and Tanner [1] and Mendoza-Smith and Tanner [14]. We were also curious as to how ℓ₁-minimization's performance compares to these selected combinatorial compressed sensing algorithms, since ℓ₁-minimization (ℓ₁-min) can be used to solve the combinatorial problem solved by these algorithms, see [18, Theorem 3].

The phase transitions are based on conditions on the RIC of the sensing matrices used. Consequent to Lemma 5.1, this becomes conditions on the expansion coefficient (i.e., ϵ) of the underlying (s, d, ϵ)-expander graphs of the sparse sensing matrices used. Where this condition on ϵ is not explicitly given it is easily deducible from the recovery guarantees given for each algorithms. The conditions are summarized in the table below.

The theoretical values are what will be found in the reference given in the table; while the computational values are what we used in our numerical experiments to compute the phase transition curves of the algorithms. The value for e was set to be 10⁻¹⁵, to make the ϵ_k as large as possible under the given condition. With these values we computed the phase transitions in Figure 2.

FIGURE 2

Figure 2. Phase transitions plots of algorithms with fixed d = 2⁵, ϵ as in the fourth column of Table 1 with e = 10⁻¹⁵, c_n = 2 and δ ∈ [10⁻⁶, 1] on a logarithmically spaced grid of 100 points. (Left) k = 3s for ρ_SSMP(δ; d, ϵ). (Right) k = ⌈(2+e)s⌉ for ρ_SSMP(δ; d, ϵ).

TABLE 1

Table 1. Recovery conditions on ϵ.

The two figures are the same except for the different sparsity value used. The performance of the algorithms in this framework are thus ranked as follows: ELD, ER, ℓ₁-min, EIHT, and SSMP.

Remark 5.3. We point out that there are many way to compare performance of algorithms, this is just one way. For instance, we can compare runtime complexities or actual computational runtimes as in Mendoza-Smith and Tanner [14]; phase transitions of different probabilities, here the probability of recovery is 1 but this could be set to something else, like 1/2 in the simulations in Mendoza-Smith and Tanner [14]; one could also compare number of iterations and iteration cost as was also done in Mendoza-Smith and Tanner [14]. We further point out that such comaprisons could be conducted with real-world data simulations, as was done in Mendoza-Smith and Tanner [14].

6. Proofs of Theorems, Lemmas, and Propositions

6.1. Theorem 3.1

The proof of the theorem follows trivially from Corollary 4.1. Based on (21) of Corollary 4.1 we deduce that A ∈ 𝔼 with probability approaching 1 exponentially with $d \geq \frac{c_{d} log (N / s)}{ϵ log s}$ and $n \geq \frac{c_{n} s log (N / s)}{ϵ^{2}}$ , with c_d, c_n > 0, hence concluding the proof.□

6.2. Lemma 3.1

The phase transition curve ρ_BT(δ; d, ϵ) is based the bound of the exponent of (16), which is

\begin{array}{l} - \frac{s}{N} log (\frac{s}{N}) + \frac{s}{N} - \frac{τ (1 - ϵ) d}{2 log 2} \frac{s}{N} log (\frac{s}{2}) + o (N) . & (37) \end{array}

In the propotional growth asymptotics (s, n, N) → ∞ while s/n → ρ ∈ (0, 1) and n/N → δ ∈ (0, 1). This implies that o(N) → 0 and (37) becomes

\begin{array}{l} - ρ log (δ ρ) + ρ - \frac{τ (1 - ϵ) d ρ log (δ ρ)}{2 log 2} \\ - \frac{τ ϵ (1 - ϵ) d}{2 c_{n} log 2} + \frac{τ (1 - ϵ) d ρ}{2}, & (38) \end{array}

where c_n > 0 is as in (28). If (38) is negative then as the problem size grows we have

\begin{array}{l} Prob (| A_{s} | \leq (1 - ϵ) d s) \to 0 . & (39) \end{array}

Therefore, setting (38) to zero and solving for ρ gives us a critical ρ below which (38) is negative and positive above it. The critical ρ is the phase transition ρ, i.e., ρ_BT(δ; d, ϵ), where below ρ_BT(δ; d, ϵ) is parameterized by the γ in the lemma. This concludes the proof.□

6.3. Lemma 4.1

By the dyadic splitting proposed in Bah and Tanner [1], we let $A_{s} = A_{⌈ \frac{s}{2} ⌉}^{1} \cup A_{⌊ \frac{s}{2} ⌋}^{2}$ such that $| A_{s} | = | A_{⌈ \frac{s}{2} ⌉}^{1} \cup A_{⌊ \frac{s}{2} ⌋}^{2} |$ and therefore

\begin{array}{l} Prob (| A_{s} | \leq a_{s}) = Prob (| A_{⌈ \frac{s}{2} ⌉}^{1} \cup A_{⌊ \frac{s}{2} ⌋}^{2} | \leq a_{s}) & (40) \end{array}

\begin{array}{l} = \sum_{l_{s} = d}^{a_{s}} Prob (| A_{⌈ \frac{s}{2} ⌉}^{1} \cup A_{⌊ \frac{s}{2} ⌋}^{2} | = l_{s}) & (41) \end{array}

In (41) we sum over all possible events, i.e., all possible sizes of l_s. In line with the splitting technique, we simplify the probability to the product of the probabilities of the cardinalities of $| A_{⌈ \frac{s}{2} ⌉}^{1} |$ and $| A_{⌊ \frac{s}{2} ⌋}^{2} |$ and their intersection. Using the definition of P_n(·) in Lemma 8.2 (Appendix), thus leads to the following.

\begin{array}{l} Prob (| A_{s} | \leq a_{s}) \\ = \sum_{l_{s} = 2 d}^{a_{s}} \sum_{l_{⌈ \frac{s}{2} ⌉}^{1} = d}^{a_{⌈ \frac{s}{2} ⌉}} \sum_{l_{⌊ \frac{s}{2} ⌋}^{2} = d}^{a_{⌊ \frac{s}{2} ⌋}} P_{n} (l_{s}, l_{⌈ \frac{s}{2} ⌉}^{1}, l_{⌊ \frac{s}{2} ⌋}^{2}) \\ \times Prob (| A_{⌈ \frac{s}{2} ⌉}^{1} | = l_{⌈ \frac{s}{2} ⌉}^{1}) Prob (| A_{⌊ \frac{s}{2} ⌋}^{2} | = l_{⌊ \frac{s}{2} ⌋}^{2}) . & (42) \end{array}

In a slight abuse of notation we write $\sum_{\underset{_{j \in [m]}}{_{l^{j}}}}$ to denote applying the sum m times. We also drop the limits of the summation indices henceforth. Now we use Lemma 8.1 in Appendix to simplify (42) as follows.

\begin{array}{l} \underset{_{j_{1} \in [q_{0}]}}{\sum_{l_{Q_{0}}^{j_{1}}}} \underset{_{j_{2} \in [q_{1}]}}{\sum_{l_{Q_{1}}^{j_{2}}}} \underset{_{j_{3} \in [r_{1}]}}{\sum_{l_{R_{1}}^{j_{3}}}} P_{n} (l_{Q_{0}}^{j_{1}}, l_{⌈ \frac{Q_{0}}{2} ⌉}^{2 j_{1} - 1}, l_{⌊ \frac{Q_{0}}{2} ⌋}^{2 j_{1}}) \times \\ \begin{matrix} {\prod^{​}}_{j_{2} = 1}^{q_{1}} Prob (| A_{Q_{1}}^{j_{2}} | = l_{Q_{1}}^{j_{2}}) {\prod^{​}}_{q_{1} + r_{1}}^{j_{3} = q_{1} + 1} Prob (| A_{R_{1}}^{j_{3}} | = l_{R_{1}}^{j_{3}}) \end{matrix} & (43) \end{array}

Now we proceed with the splitting—note (43) stopped only at the first level. At the next level, the second, we will have q₂ sets with Q₂ columns and r₂ sets with R₂ columns which leads to the following expression.

\begin{array}{l} \begin{array}{l} \underset{_{j_{1} \in [q_{0}]}}{\sum_{l_{Q_{0}}^{j_{1}}}} \underset{_{j_{2} \in [q_{1}]}}{\sum_{l_{Q_{1}}^{j_{2}}}} \underset{_{j_{3} \in [r_{1}]}}{\sum_{l_{R_{1}}^{j_{3}}}} P_{n} (l_{Q_{0}}^{j_{1}}, l_{⌈ \frac{Q_{0}}{2} ⌉}^{2 j_{1} - 1}, l_{⌊ \frac{Q_{0}}{2} ⌋}^{2 j_{1}}) \\ [\underset{_{j_{4} \in [q_{2}]}}{\sum_{l_{Q_{2}}^{j_{4}}}} \underset{_{j_{5} \in [r_{2}]}}{\sum_{l_{R_{2}}^{j_{5}}}} P_{n} (l_{Q_{1}}^{j_{2}}, l_{⌈ \frac{Q_{1}}{2} ⌉}^{2 j_{2} - 1}, l_{⌊ \frac{Q_{1}}{2} ⌋}^{2 j_{2}}) \times P_{n} (l_{R_{1}}^{j_{3}}, l_{⌈ \frac{R_{1}}{2} ⌉}^{2 j_{3} - 1}, l_{⌊ \frac{R_{1}}{2} ⌋}^{2 j_{3}}) \end{array} \\ \begin{matrix} \begin{array}{l} {\prod^{​}}^{​}_{j_{4} = 1}^{q_{2}} Prob (| A_{Q_{2}}^{j_{4}} | = l_{Q_{2}}^{j_{4}}) {\prod^{​}}^{​}_{j_{5} = q_{2} + 1}^{q_{2} + r_{2}} Prob (| A_{R_{2}}^{j_{5}} | = l_{R_{2}}^{j_{5}})] . \end{array} \end{matrix} & (44) \end{array}

We continue this splitting of each instance of Prob(·) for ⌈log₂s⌉−1 levels until reaching sets with single columns where, by construction, the probability that the single column has d non-zeros is one. Note that at this point we drop the subscripts j_i, as they are no longer needed. This process gives a complicated product of nested sums of P_n(·) which we express as

\begin{array}{l} \sum_{l_{Q_{0}}} \sum_{l_{Q_{1}}} \sum_{l_{R_{1}}} P_{n} (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋}) \\ [\sum_{l_{Q_{2}}} \sum_{l_{R_{2}}} P_{n} (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋}) P_{n} (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋}) \\ [\times \dots [\sum_{l_{Q_{⌈ {log}_{2} s ⌉ - 1}}} P_{n} (l_{4}, l_{2}, l_{2}) P_{n} (l_{3}, l_{2}, d) P_{n} (l_{2}, d, d)] \dots] . & (45) \end{array}

Using the expression for P_n(·) in (133) of Lemma 8.2 (Appendix) we bound (45) by bounding each P_n(·) as in (134) with a product of a polynomial, π(·), and an exponential with exponent ψ_n(·).

\begin{array}{l} \sum_{l_{Q_{0}}} \sum_{l_{Q_{1}}} \sum_{l_{R_{1}}} π (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋}) e^{ψ_{n} (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋})} \times \\ [\sum_{l_{Q_{2}}} \sum_{l_{R_{2}}} π (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋}) \times \\ e^{ψ_{n} (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋})} π (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋}) \cdot e^{ψ_{n} (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋})} \times \\ [\dots \times [\sum_{l_{Q_{⌈ {log}_{2} s ⌉ - 1}}} π (l_{4}, l_{2}, l_{2}) \times e^{ψ_{n} (l_{4}, l_{2}, l_{2})} π (l_{3}, l_{2}, d) \times \\ e^{ψ_{n} (l_{3}, l_{2}, d)} π (l_{2}, d, d) \cdot e^{ψ_{n} (l_{2}, d, d)}] \dots] . & (46) \end{array}

Using Lemma 8.4 in Appendix we maximize the ψ_n(·) and hence the exponentials in (46). We maximize each ψ_n(·) by choosing l_(·) to be a_(·). Then (46) will be upper bounded by the following.

\begin{array}{l} \sum_{l_{Q_{0}}} \sum_{l_{Q_{1}}} \sum_{l_{R_{1}}} π (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋}) e^{ψ_{n} (a_{Q_{0}}, a_{⌈ \frac{Q_{0}}{2} ⌉}, a_{⌊ \frac{Q_{0}}{2} ⌋})} \times \\ [\sum_{l_{Q_{2}}} \sum_{l_{R_{2}}} π (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋}) \times e^{ψ_{n} (a_{Q_{1}}, a_{⌈ \frac{Q_{1}}{2} ⌉}, a_{⌊ \frac{Q_{1}}{2} ⌋})} \\ π (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋}) \cdot e^{ψ_{n} (a_{R_{1}}, a_{⌈ \frac{R_{1}}{2} ⌉}, a_{⌊ \frac{R_{1}}{2} ⌋})} [\dots \times \\ [\sum_{l_{Q_{⌈ {log}_{2} s ⌉ - 1}}} π (l_{4}, l_{2}, l_{2}) \times e^{ψ_{n} (a_{4}, a_{2}, a_{2})} π (l_{3}, l_{2}, d) e^{ψ_{n} (a_{3}, a_{2}, d)} \\ π (l_{2}, d, d) \cdot e^{ψ_{n} (a_{2}, d, d)}] \dots] . & (47) \end{array}

We then factor the product of exponentials. This product becomes an exponential where exponent is the summation of the ψ_n(·), we will denote this exponent as ${\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d)$ . Then (47) simplifies to the following.

\begin{array}{l} e^{{\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d)} \cdot \sum_{l_{Q_{0}}} \sum_{l_{Q_{1}}} \sum_{l_{R_{1}}} π (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋}) \times \\ [\sum_{l_{Q_{2}}} \sum_{l_{R_{2}}} π (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋}) \times π (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋}) \times [ \\ \dots \times [\sum_{l_{Q_{⌈ {log}_{2} s ⌉ - 1}}} π (l_{4}, l_{2}, l_{2}) π (l_{3}, l_{2}, d) π (l_{2}, d, d)] \dots] & (48) \end{array}

We denote the factor multiplying the exponential term by Π(l_s, …, l₂, d), therefore we have the following bound.

\begin{array}{l} Prob (| A_{s} | \leq a_{s}) \leq Π (l_{s}, \dots, l_{2}, d) \cdot e^{({\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d))}, & (49) \end{array}

where ${\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d)$ is exactly Ψ_n(a_s, …, a₂, d) given by (78) in [1, Proof of Theorem 1.6]. Consequently, we state the bound of ${\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d)$ and skip the proof, which is as thus.

\begin{array}{l} {\tilde{Ψ}}_{n} (a_{s}, \dots, a_{2}, d) \leq \sum_{i \in Ω} \frac{s}{2 i} ψ_{i}, for Ω = {2^{j}}_{j = 0}^{{log}_{2} (s) - 1}, & (50) \end{array}

where ψ_i is given by (8). The upper bound of Π(l_s, …, l₂, d) is given by the following proposition.

Proposition 6.1. Given l_s ≤ a_s, l_⌈s/2⌉ ≤ a_⌈s/2⌉, ⋯ , l₂ ≤ a₂, we have

\begin{array}{l} Π (l_{s}, \dots, l_{2}, d) \leq 2^{- 3} \cdot e^{\frac{1}{4}} \cdot s^{\frac{3}{2} {log}_{2} s + \frac{9}{2}} \cdot {(a_{s})}^{{log}_{2} s - \frac{3}{2}} . & (51) \end{array}

The proof of the proposition is found in section 6.4. Taking log of right hand side of (51) and then exponentiating the results yields

\begin{array}{l} Π (l_{s}, \dots, l_{2}, d) \\ \leq \exp [\frac{1}{4} - 3 log 2 + (\frac{9}{2} + \frac{3}{2} {log}_{2} s) log s \\ + ({log}_{2} s - \frac{3}{2}) log a_{s}] & (52) \end{array}

\begin{array}{l} = 2^{- 3} s^{9 / 2} e^{1 / 4} \cdot e^{[\frac{3}{2} {log}_{2} s log s + ({log}_{2} s - \frac{3}{2}) log a_{s}]} & (53) \end{array}

\begin{array}{l} = 2^{- 3} s^{9 / 2} e^{1 / 4} \cdot e^{[\frac{3 log 2}{2} {log}_{2}^{2} s + ({log}_{2} s - \frac{3}{2}) log a_{s}]} . & (54) \end{array}

Combining (54) and (50) gives the following bound for (49)

\begin{array}{l} Prob (| A_{s} | \leq a_{s}) \leq 2^{- 3} s^{9 / 2} e^{1 / 4} \cdot \exp [\frac{\log 8}{2} \log_{2}^{2} s \\ + (\log_{2} s - \frac{3}{2}) \log a_{s} + {\tilde{Ψ}}_{n} (a_{s}, \dots, d)] & (55) \end{array}

It follows therefore that $p_{n} (s, d) = 2^{- 3} s^{9 / 2} e^{1 / 4}$ as in (11) and the exponent in (55) is n · Ψ_n(a_s, …, a₁), which implies (12). This concludes the proof of the lemma.□

6.4. Proposition 6.1

By definition, from (48), we have

\begin{array}{l} \prod (l_{s}, \dots, l_{2}, d) : = \sum_{l_{Q_{0}}} \sum_{l_{Q_{1}}} \sum_{l_{R_{1}}} π (l_{Q_{0}}, l_{⌈ \frac{Q_{0}}{2} ⌉}, l_{⌊ \frac{Q_{0}}{2} ⌋}) \times \\ [\sum_{l_{Q_{2}}} \sum_{l_{R_{2}}} π (l_{Q_{1}}, l_{⌈ \frac{Q_{1}}{2} ⌉}, l_{⌊ \frac{Q_{1}}{2} ⌋}) \times π (l_{R_{1}}, l_{⌈ \frac{R_{1}}{2} ⌉}, l_{⌊ \frac{R_{1}}{2} ⌋}) \cdot [\dots \\ \times [\sum_{l_{Q_{⌈ \log_{2} s ⌉ - 1}}} π (l_{4}, l_{2}, l_{2}) π (l_{3}, l_{2}, d) π (l_{2}, d, d)] \dots] . & (56) \end{array}

From (138) we see that π(·) is maximized when all the three arguments are the same and using Corollary 8.1 we take largest possible arguments that are equal in the range of the summation. Before we write out the resulting bound for Π(l_s, …, l₂, d), we simplify notation by denoting π(x, x, x) as π(x), and noting that Q_{⌈_log₂s⌉−1} = 2. Therefore, the bound becomes the following.

\begin{array}{l} Π (l_{s}, \dots, l_{2}, d) \leq \underset{l_{Q_{0}}}{\sum^{}} \underset{l_{Q_{1}}}{\sum^{}} \underset{l_{R_{1}}}{\sum^{}} π (l_{Q_{0}}) [\underset{l_{Q_{2}}}{\sum^{}} \underset{l_{R_{2}}}{\sum^{}} π (l_{Q_{1}}) \\ \begin{array}{l} \times π (l_{R_{1}}) [\dots [\underset{l_{2}}{\sum^{}} π (l_{4}) π (l_{3}) π (l_{2})] \dots] . \end{array} & (57) \end{array}

Properly aligning the π(·) with their relevant summations simplifies the right hand side (RHS) of (57) to the following.

\begin{array}{l} \begin{array}{l} \begin{array}{l} [\underset{l_{Q_{0}}}{\sum^{}} π (l_{Q_{0}})] \cdot [\underset{l_{Q_{1}}}{\sum^{}} π (l_{Q_{1}})] \cdot [\underset{l_{R_{1}}}{\sum^{}} π (l_{R_{1}})] \times \dots \\ \times [\underset{l_{4}}{\sum^{}} π (l_{4})] \cdot [\underset{l_{3}}{\sum^{}} π (l_{3})] \cdot [\underset{l_{2}}{\sum^{}} π (l_{2})] . \end{array} \end{array} & (58) \end{array}

From (138) we have π (y) defined as

\begin{array}{l} π (y, y, y) = {(\frac{5}{4})}^{2} \sqrt{\frac{2 π y (n - y)}{n}} < {(\frac{5}{4})}^{2} \sqrt{2 π y} . & (59) \end{array}

We use the RHS of (59) to upper bound each term in (58), leading to the following bound.

\begin{array}{l} [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{Q_{1}}}{\sum^{}} \sqrt{l_{Q_{0}}}] \cdot [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{Q_{1}}}{\sum^{}} \sqrt{l_{Q_{1}}}] . \\ [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{R_{1}}}{\sum^{}} \sqrt{l_{R_{1}}}] \times \dots \times [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{4}}{\sum^{}} \sqrt{l_{4}}] \times \\ [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{3}}{\sum^{}} \sqrt{l_{3}}] \cdot [\sqrt{2 π} {(\frac{5}{4})}^{2} \underset{l_{2}}{\sum^{}} \sqrt{l_{2}}] . & (60) \end{array}

For each Q_i and R_i, i = 1, …, ⌈log₂ s⌉−2, which means we have ⌈log₂ s⌉ − 2 pairs plus one Q₀, hence (60) simplifies to the following.

\begin{array}{l} {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ⌈ {log}_{2} s ⌉ - 3} [\sum_{l_{Q_{0}}} \sqrt{l_{Q_{0}}}] \cdot [\sum_{l_{Q_{1}}} \sqrt{l_{Q_{1}}}] \times \dots \\ \times [\sum_{l_{3}} \sqrt{l_{3}}] \cdot [\sum_{l_{2}} \sqrt{l_{2}}] & (61) \end{array}

\begin{array}{l} \leq {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ⌈ {log}_{2} s ⌉ - 3} \cdot (q_{0} \sqrt{a_{Q_{0}}}) \cdot (q_{1} \sqrt{a_{Q_{1}}}) \dots \\ \times (q_{⌈ {log}_{2} s ⌉ - 2} \sqrt{a_{3}}) \cdot (r_{⌈ {log}_{2} s ⌉ - 2} \sqrt{a_{2}}) & (62) \end{array}

\begin{array}{l} = {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ⌈ {log}_{2} s ⌉ - 3} \cdot (q_{0} q_{1} r_{1} \dots q_{⌈ {log}_{2} s ⌉ - 2} r_{⌈ {log}_{2} s ⌉ - 2}) \cdot \\ {(a_{Q_{0}} a_{Q_{1}} a_{R_{1}} \dots a_{3} a_{2})}^{1 / 2} . & (63) \end{array}

From (61) to (62) we upper each sum by taking the largest possible value of l_(·), which is a_(·), and multiplied it with the total number terms in the summation given by Lemma 8.1 in Appendix. We did upper bound the following two terms of (63).

\begin{array}{l} q_{0} q_{1} r_{1} q_{2} r_{2} q_{3} r_{3} \dots q_{⌈ {log}_{2} s ⌉ - 2} r_{⌈ {log}_{2} s ⌉ - 2} \leq s^{{log}_{2} s - 1}, & (64) \end{array}

\begin{array}{l} {(a_{Q_{0}} a_{Q_{1}} a_{R_{1}} a_{Q_{2}} a_{R_{2}} a_{Q_{3}} a_{R_{3}} \dots a_{3} a_{2})}^{1 / 2} \leq 2^{- 1} \cdot e^{\frac{1}{4}} \cdot {(a_{s})}^{{log}_{2} s - \frac{3}{2}} \times \\ s^{\frac{1}{2} {log}_{2} s + \frac{3}{2}} & (65) \end{array}

Details of the derivation of the bounds (64) and (65) is in the Appendix. Using these bounds from (63) we have the following upper bound for Π(l_s, …, l₂, d).

\begin{array}{l} Π (l_{s}, \dots, l_{2}, d) \leq {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ⌈ {log}_{2} s ⌉ - 3} \cdot (s^{{log}_{2} s - 1}) \times \\ (2^{- 1} \cdot e^{\frac{1}{4}} \cdot {(a_{s})}^{{log}_{2} s - \frac{3}{2}} \cdot s^{\frac{1}{2} {log}_{2} s + \frac{3}{2}}) & (66) \end{array}

\begin{array}{l} \leq 2^{- 3} \cdot e^{\frac{1}{4}} \cdot {(a_{s})}^{{log}_{2} s - \frac{3}{2}} \cdot s^{\frac{3}{2} {log}_{2} s + \frac{9}{2}} . & (67) \end{array}

From (66) to (67) we used the following upper bound.

\begin{array}{l} {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ⌈ {log}_{2} s ⌉ - 3} \leq {[\sqrt{2 π} {(\frac{5}{4})}^{2}]}^{2 ({log}_{2} s + 1) - 3} \\ \leq 4^{2 {log}_{2} s - 1} = 2^{4 {log}_{2} s - 2} = 2^{- 2} s^{4} . & (68) \end{array}

The bound (67) coincides with (51), hence concluding the proof.□

6.5. Theorem 4.2

The following lemma is a key input in this proof.

Lemma 6.1. Let 0 < α ≤ 1, and ε_n > 0 such that ε_n → 0 as n → ∞. Then for a_s < â_s,

\begin{array}{l} a_{2 i} = 2 a_{i} + c a_{i}^{2}, for c = - β n^{- 1}, & (69) \end{array}

where

\begin{array}{l} β = \frac{1 + \sqrt{1 - 4 {(1 - ε_{n})}^{2} (1 - e^{- α d}) e^{- α d}}}{2 (1 - ε_{n}) (1 - e^{- α d})} . & (70) \end{array}

The proof of the lemma is found in section 6.6. Recall from Theorem 4.1 that

\begin{array}{l} Ψ_{n} (a_{s}, \dots, a_{1}) = \frac{1}{n} [\frac{3 \log 2}{2} \log_{2}^{2} s + (\log_{2} s - \frac{3}{2}) \log a_{s} + \\ \sum_{i \in Ω} \frac{s}{2 i} ψ_{i}], f o r Ω = {2^{j}}_{j = 0}^{\log_{2} (s) - 1}, & (71) \end{array}

where ψ_i is given by the following expression

\begin{array}{l} (n - a_{i}) \cdot H (\frac{a_{2 i} - a_{i}}{n - a_{i}}) + a_{i} \cdot H (\frac{a_{2 i} - a_{i}}{a_{i}}) - n \cdot H (\frac{a_{i}}{n}) & (72) \end{array}

We use Lemma 6.1 to upper bound ψ_i in (72) away from zero from above as n → 0. We formalize this bound in the following proposition.

Proposition 6.2. Let η > 0 and β > 1 as defined in Lemma 6.1. Then

\begin{array}{l} ψ_{i} \leq - a_{i} η (β - 1) β^{- 1} {(1 - β \frac{a_{i}}{n})}^{- 1} . & (73) \end{array}

The proof of Proposition 6.2 is found in section 6.7. Using the bound of ψ_i in Proposition 6.2, we upper Ψ(a_s, …, a₁) as follows.

\begin{array}{l} Ψ (a_{s}, \dots, a_{1}) \leq \frac{1}{n} [\frac{3 \log 2}{2} \log_{2}^{2} s + (\log_{2} s - \frac{3}{2}) \log a_{s} \\ - \sum_{i \in Ω} \frac{s}{2 i} \cdot \frac{a_{i} η (β - 1)}{β (1 - β \frac{a_{i}}{n})}], & (74) \end{array}

\begin{array}{l} \leq - \frac{η (β - 1)}{β} \sum_{i \in Ω} (\frac{s}{2 i} \cdot \frac{a_{i}}{n}) + \frac{1}{n} [\frac{3 \log 2}{2} \log_{2}^{2} s + \\ (\log_{2} s - \frac{3}{2}) \log a_{s}] . & (75) \end{array}

Then setting a_s = (1 − ϵ)ds and substituting in (75), the factor multiplying $\frac{1}{n}$ becomes

\begin{array}{l} \frac{3 log 2}{2} {log}_{2}^{2} s + ({log}_{2} s - \frac{3}{2}) log [(1 - ϵ) d s] & (76) \end{array}

\begin{array}{l} = \frac{3 log 2}{2} {log}_{2}^{2} s + log (1 - ϵ) {log}_{2} s + log d {log}_{2} s + \\ {log}_{2} s log s - \frac{3}{2} log [(1 - ϵ) d] - \frac{3}{2} log s & (77) \end{array}

\begin{array}{l} = \frac{5 log 2}{2} {log}_{2}^{2} s + ({log}_{2} (1 - ϵ) - 3 / 2) log s + log d {log}_{2} s + \\ log [{(1 - ϵ)}^{- 3 / 2} d^{- 3 / 2}] & (78) \end{array}

\begin{array}{l} = \frac{5 log 2}{2} {log}_{2}^{2} s + log d {log}_{2} s + log [{(1 - ϵ)}^{- 3 / 2} d^{- 3 / 2}] \\ + log s^{{log}_{2} (1 - ϵ) - 3 / 2} . & (79) \end{array}

The last two terms of (79) become polynomial in s, d and ϵ, when exponentiated hence they are incorporated into p_n(s, d, ϵ) in (14), which means

\begin{array}{l} p_{n} (s, d, ϵ) = p_{n} (s, d) \cdot \exp [\log [{(1 - ϵ)}^{- 3 / 2} d^{- 3 / 2}] + \\ \log s^{\log_{2} (1 - ϵ) - 3 / 2}] & (80) \end{array}

\begin{array}{l} = 2^{- 3} s^{9 / 2} e^{1 / 4} \cdot {(1 - ϵ)}^{- 3 / 2} d^{- 3 / 2} s^{{log}_{2} (1 - ϵ) - 3 / 2} & (81) \end{array}

\begin{array}{l} = \frac{\sqrt[4]{e} \cdot s^{{log}_{2} (1 - ϵ) + 3}}{\sqrt{2^{6} {(1 - ϵ)}^{3} d^{3}}}, & (82) \end{array}

which is (14). The first two terms of (79) will grow faster than a polynomial in s, d and ϵ when exponentiated, hence they replace in (75), the factor multiplying $\frac{1}{n}$ . Therefore, (79) is modified as thus

\begin{array}{l} - \frac{η (β - 1)}{β} \sum_{i \in Ω} (\frac{s}{2 i} \cdot \frac{a_{i}}{n}) + \frac{1}{n} [\frac{5 log 2}{2} {log}_{2}^{2} s + log d {log}_{2} s] \\ = : Ψ_{n} (s, d, ϵ) . & (83) \end{array}

The factor $\sum_{i \in Ω} (\frac{s}{2 i} \cdot \frac{a_{i}}{n})$ in (83) is lower bounded as follows, see proof in section 6.8.

\begin{array}{l} \sum_{i \in Ω} (\frac{s}{2 i} \cdot \frac{a_{i}}{n}) \geq \frac{{log}_{2} (s / 2)}{2 n} (1 - ϵ) d s . & (84) \end{array}

Using this bound in (83) gives (15), thus concluding the proof.□

6.6. Lemma 6.1

Recall that we have a formula for the expected values of the a_i as

\begin{array}{l} {\hat{a}}_{2 i} = {\hat{a}}_{i} (2 - \frac{{\hat{a}}_{i}}{n}) for i \in {2^{j}}_{j = 0}^{{log}_{2} (s) - 1}, & (85) \end{array}

which follow a relatively simple formulas, and then the coupled system of cubics as

\begin{array}{l} a_{2 i}^{3} - 2 a_{i} a_{2 i}^{2} + 2 a_{i}^{2} a_{2 i} - a_{i}^{2} a_{4 i} = 0 for i \in {2^{j}}_{j = 0}^{{log}_{2} (s) - 2}, & (86) \end{array}

for when the final a_s is constrained to be less than â_s. To simplify the notation of the indexing in (86), observe that if i = 2^j for a fixed j, then 2i = 2^j+1 and 4i = 2^j+2. Therefore, it suffice to use the index a_j, a_j+1, and a_j+2 rather than a_i, a_2i, and a_4i. Moving the second two terms in (86) to the right and dividing the quadratic multiples we get the relation

\begin{array}{l} \frac{a_{j + 2} - 2 a_{j + 1}}{a_{j + 1}^{2}} = \frac{a_{j + 1} - 2 a_{j}}{a_{j}^{2}}, & (87) \end{array}

which is the same expression on the right and left, but with j increased by one on the left. This implies that the fraction is independent of j, so

\begin{array}{l} \frac{a_{j + 1} - 2 a_{j}}{a_{j}^{2}} = c, \Rightarrow a_{j + 1} = a_{j} (2 + c a_{j}), & (88) \end{array}

for some constant c independent of j (though not necessarily of n). This is in fact the relation (85), if we set c to be equal to −1/n. One can then wonder what is the behavior of c if we fix the final a_s. Moreover, (88) is equivalent to

\begin{array}{l} c a_{j + 1} + 1 = {(c a_{j} + 1)}^{2}, & (89) \end{array}

which inductively leads to

\begin{array}{l} c a_{l} + 1 = {(c a_{0} + 1)}^{2^{l}}, l > 0, & (90) \end{array}

so that one has a relation of the l^th stage in terms of the first stage. Note this does not require the a_s to be fixed, (90) is how one simply computes all a_l for l > 0 once one has a₀ and c. The point is that c to match the a_s one has to select c appropriately. So the way we calculate c is by knowing a₀ and a_s, then solving (90) for l = s. Unfortunately there is not an easy way to solve for c in (90) so we need to do some asymptotic approximation. Let's assume that a_l is close to â_l. So we do an asymptotic expansion in terms of the difference from â_l.

To simplify things a bit lets insert a₀ = d (since a₀ is a₁ in our standard notation) and then we insert what we know for â_l. For â_l we have c = −n⁻¹, see (85). We then have from (90) that

\begin{array}{l} a_{l} = c^{- 1} {(c d + 1)}^{2^{l}} - c^{- 1} and â_{l} = - n {(- d / n + 1)}^{2^{l}} + n . & (91) \end{array}

So if we write a_l = (1 − ε_n)â_l and consider the case of ε_n → 0 as n → 0. The point of this is that instead of working with â_l we can now work in terms of ε_n. Setting a_l = (1 − ε_n)â_l gives

\begin{array}{l} c^{- 1} {(c d + 1)}^{2^{l}} - c^{- 1} = - n (1 - ε_{n}) [{(- d / n + 1)}^{2^{l}} - 1] . & (92) \end{array}

We now solve for c as a function of ε_n and d. As ε_n goes to zero we should have c converging to −n⁻¹.

Let αn = 2^l, for 0 < α ≤ 1 and c = −β(ε_n, d)/n, then, dropping the argument of β(·, ·), (92) becomes

\begin{array}{l} \frac{n}{β} {(1 - \frac{β d}{n})}^{α n} - \frac{n}{β} = n (1 - ε_{n}) {(1 - \frac{d}{n})}^{α n} - n (1 - ϵ) . & (93) \end{array}

Multiplying through by β/n and performing a change of variables of k = αn, (93) becomes

\begin{array}{l} {(1 - \frac{α β d}{k})}^{k} - 1 = β (1 - ε_{n}) {(1 - \frac{α d}{k})}^{k} - β (1 - ε_{n}) . & (94) \end{array}

The left hand side of (94) simplifies to

\begin{array}{l} e^{- α β d} - 1 - \frac{e^{- α β d} α^{2} β^{2} d^{2}}{2 k} + O (k^{- 2}) . & (95) \end{array}

The right hand side of (94) simplifies to

\begin{array}{l} β (1 - ε_{n}) e^{- α d} - β (1 - ε_{n}) - \frac{β (1 - ε_{n}) e^{- α d} α^{2} d^{2}}{2 k} + O (k^{- 2}) . & (96) \end{array}

Matching powers of k in (95) and (96) for k⁰ and k⁻¹ yields the following.

\begin{array}{l} e^{- α β d} - 1 = β (1 - ε_{n}) e^{- α d} - β (1 - ε_{n}), and & (97) \end{array}

\begin{array}{l} α^{2} β^{2} d^{2} e^{- α β d} = β (1 - ε_{n}) α^{2} d^{2} e^{- α d} . & (98) \end{array}

Both of which respectively simplify to the following.

\begin{array}{l} e^{- α β d} - 1 = β (1 - ε_{n}) (e^{- α d} - 1), and & (99) \end{array}

\begin{array}{l} β e^{- α β d} = (1 - ε_{n}) e^{- α d} . & (100) \end{array}

Multiply (99) by β and subtract the two equations, (99) and (100), to get

\begin{array}{l} (1 - ε_{n}) (1 - e^{- α d}) β^{2} - β + (1 - ε_{n}) e^{- α d} = 0 . & (101) \end{array}

This yields

\begin{array}{l} β = \frac{1 \pm \sqrt{1 - 4 {(1 - ε_{n})}^{2} (1 - e^{- α d}) e^{- α d}}}{2 (1 - ε_{n}) (1 - e^{- α d})} . & (102) \end{array}

To be consistent with what c ought to be as ε_n → 0, we choose

\begin{array}{l} β (ε_{n}, d) = \frac{1 + \sqrt{1 - 4 {(1 - ε_{n})}^{2} (1 - e^{- α d}) e^{- α d}}}{2 (1 - ε_{n}) (1 - e^{- α d})}, & (103) \end{array}

as required – concluding the proof.□

6.7. Proposition 6.2

We use Lemma 6.1 to express ψ_i in (72) as follows

\begin{array}{l} ψ_{i} = - n [H (\frac{a_{i}}{n}) - H (\frac{a_{i} + c a_{i}^{2}}{n - a_{i}})] \\ + a_{i} [H (\frac{a_{i} + c a_{i}^{2}}{a_{i}}) - H (\frac{a_{i} + c a_{i}^{2}}{n - a_{i}})] & (104) \end{array}

\begin{array}{l} = - n [H (\frac{a_{i}}{n}) - H (\frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}})] \\ + a_{i} [H (1 + c a_{i}) - H (\frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}})] . & (105) \end{array}

Note that for regimes of small s/n considered

\begin{array}{l} - \frac{β}{n} = c \leq - \frac{1}{n}, \Rightarrow c a_{i} \leq - \frac{a_{i}}{n}, and 1 + c a_{i} \leq 1 - \frac{a_{i}}{n} . & (106) \end{array}

We need the following expressions for the Shannon entropy and it's first and second derivatives

\begin{array}{l} H (z) = - z log z - (1 - z) log (1 - z), & (107) \end{array}

\begin{array}{l} H^{'} (z) = log (\frac{1 - z}{z}), and & (108) \end{array}

\begin{array}{l} H^{″} (z) = - \frac{1}{z (1 - z)} . & (109) \end{array}

But also $H (z) = H (1 - z)$ due to the symmetry about z = 1/2. Similarly, $H^{″} (z)$ is symmetric about z = 1/2; while $H^{'} (z)$ is anti-symmetric, i.e., $H^{'} (z) = - H^{'} (1 - z)$ . Using the symmetry of $H (z)$ we rewrite ψ_i in (105) as follows.

\begin{array}{l} ψ_{i} = - n [H (\frac{a_{i}}{n}) - H (\frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}})] \\ + a_{i} [H (- c a_{i}) - H (\frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}})] . & (110) \end{array}

From (106), we deduce the following ordering

\begin{array}{l} \frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}} \leq \frac{a_{i}}{n} \leq - c a_{i} \leq 1 / 2 . & (111) \end{array}

The last equality in (111) follows from that we assume 2βa_i ≤ n, where 1 < β < 2 but very close to 1. This is a valid constraint on the cardinality of the supports a_i. To simplify notation, let $x_{1} = \frac{a_{i}}{n} \cdot \frac{1 + c a_{i}}{1 - \frac{a_{i}}{n}}$ , $x_{2} = \frac{a_{i}}{n}$ , and x₃ = −ca_i, which implies that x₁ ≤ x₂ ≤ x₃ ≤ 1/2. Therefore, from (110), we have

\begin{array}{l} ψ_{i} = - n [H (x_{2}) - H (x_{1})] + a_{i} [H (x_{3}) - H (x_{1})] & (112) \end{array}

\begin{array}{l} = - n [ℋ (x_{2}) - ℋ (x_{1})] + a_{i} [ℋ (x_{3}) - ℋ (x_{2}) \\ + ℋ (x_{2}) - ℋ (x_{1})] & (113) \end{array}

\begin{array}{l} = - (n - a_{i}) [H (x_{2}) - H (x_{1})] + a_{i} [H (x_{3}) - H (x_{2})] & (114) \end{array}

\begin{array}{l} \leq - (n - a_{i}) (x_{2} - x_{1}) H^{'} (x_{2}) + a_{i} (x_{3} - x_{2}) H^{'} (x_{2}) & (115) \end{array}

\begin{array}{l} = [a_{i} (x_{3} - x_{2}) - (n - a_{i}) (x_{2} - x_{1})] H^{'} (x_{2}) . & (116) \end{array}

Observe that the expression in the square brackets on the right hand side of (116) is zero, which implies that

\begin{array}{l} a_{i} (x_{3} - x_{2}) = (n - a_{i}) (x_{2} - x_{1}) . & (117) \end{array}

This is very easy to check by substituting the values of x₁, x₂, and x₃. So instead of bound (115), we alternatively upper bound (114) as follows

\begin{array}{l} ψ_{i} \leq - (n - a_{i}) (x_{2} - x_{1}) H^{'} (ξ_{21}) + a_{i} (x_{3} - x_{2}) H^{'} (ξ_{32}), & (118) \end{array}

where ξ₂₁ ∈ (x₁, x₂), and ξ₃₂ ∈ (x₂, x₃), which implies

\begin{array}{l} ξ_{21} < ξ_{32}, and H^{'} (ξ_{21}) > H^{'} (ξ_{32}) . & (119) \end{array}

Using relation (117), bound (118) simplifies to the following.

\begin{array}{l} ψ_{i} & \leq - a_{i} (x_{3} - x_{2}) H^{'} (ξ_{21}) + a_{i} (x_{3} - x_{2}) H^{'} (ξ_{32}) & (120) \end{array}

\begin{array}{l} = - a_{i} (x_{3} - x_{2}) [H^{'} (ξ_{21}) - H^{'} (ξ_{32})] & (121) \end{array}

\begin{array}{l} \leq - a_{i} (x_{3} - x_{2}) (ξ_{21} - ξ_{32}) H^{″} (ξ_{31}), & (122) \end{array}

for ξ₃₁ ∈ (x₁, x₃). Since ξ₂₁ < ξ₃₂, we rewrite bound (122) as follows.

\begin{array}{l} ψ_{i} & \leq a_{i} (x_{3} - x_{2}) (ξ_{32} - ξ_{21}) H^{″} (ξ_{31}) & (123) \end{array}

\begin{array}{l} \leq a_{i} η (x_{3} - x_{2}) H^{″} (x_{3}), & (124) \end{array}

where η = ξ₃₂ − ξ₂₁ > 0, and the last bound is due to the fact that x₃ > ξ₃₁.

Going back to our normal notation, we rewrite bound (124) as follows.

\begin{array}{l} ψ_{i} & \leq a_{i} η (- c a_{i} - \frac{a_{i}}{n}) H^{″} (- c a_{i}) & (125) \end{array}

\begin{array}{l} = a_{i} η \frac{a_{i}}{n} (β - 1) \frac{- 1}{β \frac{a_{i}}{n} (1 - β \frac{a_{i}}{n})} & (126) \end{array}

\begin{array}{l} = - \frac{a_{i} η (β - 1)}{β (1 - β \frac{a_{i}}{n})}, & (127) \end{array}

This concludes the proof.□

6.8. Inequality 84

The series bound (84) is derived as follows.

\begin{array}{l} \sum_{i \in Ω} (\frac{s}{2 i} \cdot \frac{a_{i}}{n}) = (\frac{s}{2} \cdot \frac{a_{1}}{n}) + (\frac{s}{4} \cdot \frac{a_{2}}{n}) + \dots + (\frac{s}{s} \cdot \frac{a_{s / 2}}{n}) & (128) \end{array}

\begin{array}{l} \geq (\frac{s}{2 n} \cdot \frac{(1 - ϵ) d s}{2^{{log}_{2} s}}) + (\frac{s}{4 n} \cdot \frac{(1 - ϵ) d s}{2^{{log}_{2} (s / 2)}}) + \dots \\ + (\frac{s}{s n} \cdot \frac{(1 - ϵ) d s}{2}) & (129) \end{array}

\begin{array}{l} = [(\frac{s}{2 n} \cdot \frac{1}{s}) + (\frac{s}{4 n} \cdot \frac{2}{s}) + \dots \\ + (\frac{s}{s n} \cdot \frac{1}{2})] (1 - ϵ) d s & (130) \end{array}

\begin{array}{l} = (\frac{1}{2 n} + \frac{1}{2 n} + \dots + \frac{1}{2 n}) (1 - ϵ) d s \\ = \frac{{log}_{2} (s / 2)}{2 n} (1 - ϵ) d s . & (131) \end{array}

That is the required bounds, hence concluding the proof.□

7. Conclusion

We considered the construction of sparse matrices that are invaluable for dimensionality reduction with application in diverse fields. These sparse matrices are more efficient computationally compared to their dense counterparts also used for the purpose of dimensionality reduction. Our construction is probabilistic based on the dyadic splitting method we introduced in Bah and Tanner [1]. By better approximation of the bounds we achieve a novel result, which is a reduced complexity of the sparsity per column of these matrices. Precisely, a complexity that is a state-of-the-art divided by logs, where s is the intrinsic dimension of the problem.

Our approach is one of a few that gives quantitative sampling theorems for existence of such sparse matrices. Moreover, using the phase transition framework comparison, our construction is better than existing probabilistic constructions. We are also able to compare performance of combinatorial compressed sensing algorithms by comparing their phase transition curves. This is one perspective in algorithm comparison amongst a couple of others like runtime and iteration complexities.

Evidently, our results holds true for the construction of expander graphs, which is a graph theory problem and is of interest to communities in theoretical computer science and pure mathematics.

Author Contributions

JT contributed to conception of the idea. Both JT and BB worked on the derivation of the initial results, which was substantially improved by BB. BB wrote the first draft of the manuscript. JT did some revisions and gave feedback.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

BB acknowledges the support from the funding by the German Federal Ministry of Education and Research (BMBF) for the German Research Chair at AIMS South Africa, funding for which is administered by Alexander von Humboldt Foundation (AvH). JT acknowledges support from The Alan Turing Institute under the EPSRC grant EP/N510129/1.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fams.2018.00039/full#supplementary-material

References

1. Bah B, Tanner J. Vanishingly sparse matrices and expander graphs, with application to compressed sensing. IEEE Trans Inform Theor. (2013) 59:7491–508. doi: 10.1109/TIT.2013.2274267

CrossRef Full Text | Google Scholar

2. Ahn KJ, Guha S, McGregor A. Graph sketches: sparsification, spanners, and subgraphs. In: Proceedings of the 31st Symposium on Principles of Database Systems. Scottsdale, AZ: ACM (2012). p. 5–14.

Google Scholar

3. Gilbert AC, Levchenko K. Compressing network graphs. In: Proceedings of the LinkKDD Workshop at the 10th ACM Conference on KDD. Seatle, WA (2004).

Google Scholar

4. Vardi Y. Network tomography: estimating source-destination traffic intensities from link data. J Am Stat Assoc. (1996) 91:365–77.

Google Scholar

5. Castro R, Coates M, Liang G, Nowak R, Yu B. Network tomography: recent developments. Stat Sci. (2004) 19:499–517. doi: 10.1214/088342304000000422

CrossRef Full Text | Google Scholar

6. Muthukrishnan S. Data Streams: Algorithms and Applications. Now Publishers Inc (2005).

Google Scholar

7. Indyk P. Sketching, Streaming and Sublinear-Space Algorithms. Graduate course notest (2007).

Google Scholar

8. Dwork C, McSherry F, Talwar K. The price of privacy and the limits of LP decoding. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing. San Diego, CA: ACM (2007). p. 85–94.

Google Scholar

9. Donoho DL, Johnstone IM, Hoch JC, Stern AS. Maximum entropy and the nearly black object. J R Stat Soc Ser B (1992) 54:41–81.

Google Scholar

10. Donoho DL. Compressed sensing. IEEE Trans Inform Theor. (2006) 52:1289–306. doi: 10.1109/TIT.2006.871582

CrossRef Full Text | Google Scholar

11. Xu W, Hassibi B. Efficient compressive sensing with deterministic guarantees using expander graphs. In: Information Theory Workshop, 2007. ITW'07. Tahoe City, CA: IEEE (2007). p. 414–9.

Google Scholar

12. Jafarpour S, Xu W, Hassibi B, Calderbank R. Efficient and robust compressed sensing using optimized expander graphs. IEEE Trans Inform Theor. (2009) 55:4299–308. doi: 10.1109/TIT.2009.2025528

CrossRef Full Text | Google Scholar

13. Berinde R, Indyk P. Sequential sparse matching pursuit. In: 47th Annual Allerton Conference on Communication, Control, and Computing. IEEE (2009). p. 36–43.

Google Scholar

14. Mendoza-Smith R, Tanner J. Expander ℓ₀-decoding. Appl. Comput. Harmon. Anal. (2017) 45:642–67. doi: 10.1016/j.acha.2017.03.001

CrossRef Full Text | Google Scholar

15. Mendoza-Smith R, Tanner J, Wechsung F. A robust parallel algorithm for combinatorial compressed sensing. arXiv [preprint] arXiv:170409012 (2017).

Google Scholar

16. Du DZ, Hwang FK. Combinatorial Group Testing and Its Applications. World Scientific (2000).

17. Gilbert AC, Iwen MA, Strauss MJ. Group testing and sparse signal recovery. In: Signals, Systems and Computers, 2008 42nd Asilomar Conference on. Pacific Grove, CA: IEEE (2008). p. 1059–63.

Google Scholar

18. Berinde R, Gilbert AC, Indyk P, Karloff H, Strauss MJ. Combining geometry and combinatorics: A unified approach to sparse signal recovery. In: Communication, Control, and Computing, 2008 46th Annual Allerton Conference on. Urbana-Champaign, IL: IEEE (2008). p. 798–805.

Google Scholar

19. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing. New York, NY: Springer (2013).

Google Scholar

20. Hoory S, Linial N, Wigderson A. Expander graphs and their applications. Bull Am Math Soc. (2006) 43:439–562. doi: 10.1090/S0273-0979-06-01126-8

CrossRef Full Text | Google Scholar

21. Guruswami V, Umans C, Vadhan S. Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes. J ACM (2009) 56:20. doi: 10.1145/1538902.1538904

CrossRef Full Text | Google Scholar

22. Bassalygo LA, Pinsker MS. Complexity of an optimum nonblocking switching network without reconnections. Problemy Peredachi Informatsii (1973) 9:84–87.

Google Scholar

23. Indyk P, Razenshteyn I. On model-based RIP-1 matrices. In: International Colloquium on Automata, Languages, and Programming. Berlin; Heidelberg: Springer (2013). p. 564–75.

Google Scholar

24. Bah B, Baldassarre L, Cevher V. Model-based sketching and recovery with expanders. In: SODA. SIAM (2014). p. 1529–43.

Google Scholar

25. Donoho DL, Tanner J. Thresholds for the recovery of sparse solutions via l₁ minimization. In: Information Sciences and Systems, 2006 40th Annual Conference on. IEEE (2006). p. 202–6.

Google Scholar

26. Buhrman H, Miltersen PB, Radhakrishnan J, Venkatesh S. Are bitvectors optimal? SIAM J Comput. (2002) 31:1723–44. doi: 10.1137/S0097539702405292

CrossRef Full Text | Google Scholar

27. Berinde R. Advances in Sparse Signal Recovery Methods. Massachusetts Institute of Technology (2009).

28. Candès EJ, Romberg J, Tao T. Stable signal recovery from incomplete and inaccurate measurements. Commun Pure Appl Math. (2006) 59:1207–23. doi: 10.1002/cpa.20124

CrossRef Full Text | Google Scholar

29. Capalbo M, Reingold O, Vadhan S, Wigderson A. Randomness conductors and constant-degree lossless expanders. In: Proceedings of the Thirty-fourth Annual ACM Symposium on Theory of Computing. New York, NY: ACM (2002). p. 659–68.

Google Scholar

30. Blanchard JD, Cartis C, Tanner J. Compressed sensing: how sharp is the restricted isometry property? SIAM Rev. (2011) 53:105–25. doi: 10.1137/090748160

CrossRef Full Text | Google Scholar

31. Bah B, Tanner J. Improved bounds on restricted isometry constants for Gaussian matrices. SIAM J Matrix Anal Appl. (2010) 31:2882–98. doi: 10.1137/100788884

CrossRef Full Text | Google Scholar

32. Indyk P, Ruzic M. Near-optimal sparse recovery in the ℓ₁-norm. In: Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on. Philadelphia, PA: IEEE (2008). p. 199–207.

Google Scholar

33. Berinde R, Indyk P, Ruzic M. Practical near-optimal sparse recovery in the ℓ₁-norm. In: 46th Annual Allerton Conference on Communication, Control, and Computing. Piscataway, NJ: IEEE (2008). p. 198–205.

Google Scholar

Keywords: compressed sensing (CS), expander graphs, probabilistic construction, sample complexity, sampling theorems, combinatorial compressed sensing, linear sketching, sparse recovery algorithms

Citation: Bah B and Tanner J (2018) On the Construction of Sparse Matrices From Expander Graphs. Front. Appl. Math. Stat. 4:39. doi: 10.3389/fams.2018.00039

Received: 27 June 2018; Accepted: 10 August 2018;
Published: 04 September 2018.

Edited by:

Haizhao Yang, National University of Singapore, Singapore

Reviewed by:

Alex Jung, Aalto University, Finland
Xiuyuan Cheng, Duke University, United States
Zhihui Zhu, Johns Hopkins University, United States

Copyright © 2018 Bah and Tanner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Bubacarr Bah, YnViYWNhcnJAYWltcy5hYy56YQ==

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.