1 Introduction
Since it has been realized that many signals admit a sparse representation in some frame, the question arose whether or not such signals can be recovered from fewer samples than the dimension of the domain by utilizing the low dimensional structure of the signal. The question was answered positively at the beginning of the millennium [1, 2]. By now there are multiple different decoders that recover a sparse signal from noisy measurements with robust recovery guarantees. Most of them, however, rely on some form of tuning, depending on either the signal or the noise.
The basis pursuit denoising requires an upper bound on the norm of the noise ([3], Theorem 4.22), the least absolute shrinkage and selection operator (LASSO) requires an estimate of the $\ell_1$-norm of the signal ([4], Theorem 11.1), and the Lagrangian version of the LASSO allegedly needs to be tuned to the order of the noise level ([4], Theorem 11.1). The expander iterative hard thresholding needs the sparsity of the signal or an estimate of the order of the expansion property ([3], Theorem 13.15). The order of the expansion property can be calculated from the measurement matrix, but no polynomial-time method is known to do this. Variants of these methods have similar drawbacks. The non-negative basis pursuit denoising requires the same tuning parameter as the basis pursuit denoising [5]. Other thresholding based decoders like the sparse matching pursuit and the expander matching pursuit have the same limitations as the expander iterative hard thresholding [6].
If this side information is not known a priori, many decoders either yield no recovery guarantees or, in their imperfectly tuned versions, yield sub-optimal estimation errors ([3], Theorem 11.12). Even though the problem of sparse recovery from under-sampled measurements was answered long ago, finding tuning-free decoders that achieve robust recovery guarantees is still a topic of interest.
The most prominent achievement in this direction is the non-negative least squares (NNLS) [7–11]. It is completely tuning free [12], and in [13, 14] it was proven that it achieves robust recovery guarantees if the measurement matrix consists of independent biased sub-Gaussian random variables.
1.1 Our Contribution
We replace the least squares objective in the NNLS by an arbitrary norm and obtain the non-negative least residual (NNLR). We use the methods of [13] to prove recovery guarantees under conditions similar to those for the NNLS. In particular, we consider the case where we minimize the $\ell_1$-norm of the residual (NNLAD) and give a recovery guarantee if the measurement matrix is the random walk matrix of a uniformly at random drawn D-left regular bipartite graph.
In general, our results state that if the $M^+$ criterion is fulfilled, the basis pursuit denoising can be replaced by the tuning-less NNLR for non-negative signals. Note that the $M^+$ criterion can be fulfilled by adding only one explicitly chosen measurement, if that is possible in the application. Thus, in practice the NNLR does not require more measurements than the BPDN to recover sparse signals. While biased sub-Gaussian measurement matrices rely on a probabilistic argument to verify that such a measurement is present, random walk matrices of left regular graphs naturally have such a measurement. The tuning-less nature gives the NNLR an advantage over other decoders if the noise power cannot be estimated, which is for instance the case when we have multiplicative noise or the measurements are Poisson distributed. Note that Laplacian distributed noise or the existence of outliers also favors an $\ell_1$ regression approach over an $\ell_2$ regression approach and thus motivates the use of the NNLAD over the NNLS.
Further, the sparse structure of left regular graphs can reduce the encoding and decoding time to a fraction. Using [15] we can solve the NNLAD with a first-order method applied to a single optimization problem with a sparse measurement matrix. Other state of the art decoders often use non-convex optimization, computationally complex projections, or need to solve multiple different optimization problems. For instance, to solve the basis pursuit denoising for a given tuning parameter, a common approach is to solve a sequence of $\ell_1$-constrained least residual1 problems to approximate the point where the Pareto curve attains the value of the tuning parameter of the basis pursuit denoising [16]. Cross-validation techniques suffer from similar issues [17].
1.2 Relations to Other Works
We build on the theory of [13] that uses the null space property and the $M^+$ criterion. These methods have also been used in [12, 14]. To the best of the authors' knowledge, the $M^+$ criterion has not been used together with such a null space property before. Other works have used adjacency matrices of graphs as measurement matrices, including [6, 18–21]. The works [18, 19] did not consider noisy observations. The decoder in [20] is the basis pursuit denoising and thus requires tuning depending on the noise power. [21] proposes two decoders for non-negative signals. The first is the non-negative basis pursuit, which could be extended to the non-negative basis pursuit denoising. However, this again needs a tuning parameter depending on the noise power. The second decoder, the Reverse Expansion Recovery algorithm, requires the order of the expansion property, which is not known to be computable in polynomial time. The survey [6] contains multiple decoders, including the basis pursuit, which again needs tuning depending on the noise power for robustness, and the expander matching pursuit and the sparse matching pursuit, which need the order of the expansion property. Further, [5] considered sparse regression of non-negative signals and also used the non-negative basis pursuit denoising as decoder, which again needs tuning dependent on the noise power. To the best of the authors' knowledge, this is the first work that considers tuning-less sparse recovery for random walk matrices of left regular bipartite graphs. The NNLAD has been considered in [22] with a structured sparsity model without the use of the $M^+$ criterion.
2 Preliminaries
For we denote the set of integers from 1 to K by . For a set we denote the number of elements in T by . Vectors are denoted by lower case bold face symbols, while their components are denoted by lower case italic letters. Matrices are denoted by upper case bold face symbols, while their components are denoted by upper case italic letters. For we denote its -norms by . Given we denote its operator norm as an operator from to by . By we denote the non-negative orthant. Given a closed convex set , we denote the projection onto C, i.e., the unique minimizer of , by . For a vector and a set , denotes the vector in whose nth component is if and 0 otherwise. Given we will often need sets with , and we abbreviate this by if no confusion is possible.
Given a measurement matrix , a decoder is any map . We refer to as the signal. If , we say the signal is non-negative and write for short. If additionally for all , we write . An input of a decoder, i.e., any , is referred to as an observation. We allow all possible inputs of the decoder as observations, since in general the transmitted codeword is disturbed by some noise. Thus, given a signal and an observation , we call the noise. A signal is called S-sparse if . We denote the set of S-sparse vectors by
Given some , the compressibility of a signal can be measured by . Given N and S, the general non-negative compressed sensing task is to find a measurement matrix and a decoder with M as small as possible such that the following holds true: There exists a and a continuous function with such that
holds true. This will ensure that if we can control the compressibility and the noise, we can also control the estimation error and in particular decode every noiseless observation of S-sparse signals exactly.
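Written out schematically (the precise norms, constants and conditions are the subject of the results below, so the following display is only meant to fix the shape of such a guarantee), we ask for an estimate of the form
\[
\left\| Q(\mathbf{A}\mathbf{x} + \mathbf{e}) - \mathbf{x} \right\| \;\leq\; C\, \sigma_S(\mathbf{x}) + f\bigl(\|\mathbf{e}\|\bigr) \qquad \text{for all signals } \mathbf{x} \geq 0 \text{ and all noise vectors } \mathbf{e},
\]
where $Q$ denotes the decoder, $\mathbf{A}$ the measurement matrix and $\sigma_S(\mathbf{x})$ the compressibility, i.e., the distance of $\mathbf{x}$ to the set of S-sparse vectors.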
3 Main Results
Given a measurement matrix and a norm on the measurement space, we define the decoder as follows: Given , set (y) to be any minimizer of
We call this problem the non-negative least residual (NNLR). In particular, for the $\ell_1$-norm this problem is called the non-negative least absolute deviation (NNLAD), and for the $\ell_2$-norm it is known as the non-negative least squares (NNLS) studied in [13]. In fact, the proof techniques translate fairly simply. We just need to introduce the dual norm.
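To make the decoder concrete, the following minimal sketch solves the NNLR with a generic off-the-shelf convex solver. It is only an illustration of the optimization problem (the variable names A, y and the use of the cvxpy package are our choices); the experiments in Section 5 use CVX in Matlab and the first-order method of Section 4 instead.

```python
import numpy as np
import cvxpy as cp

def nnlr(A, y, p=1):
    """Non-negative least residual: minimize ||A x - y||_p subject to x >= 0.
    p = 1 gives the NNLAD, p = 2 the NNLS.  No tuning parameter is needed."""
    N = A.shape[1]
    x = cp.Variable(N, nonneg=True)                 # enforces x >= 0
    problem = cp.Problem(cp.Minimize(cp.norm(A @ x - y, p)))
    problem.solve()
    return x.value

# Tiny usage example with a 2-sparse non-negative signal.
rng = np.random.default_rng(0)
A = rng.random((20, 40)) / 20
x0 = np.zeros(40)
x0[[3, 17]] = [1.0, 2.0]
x_hat = nnlr(A, A @ x0, p=1)                        # noiseless recovery
```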
Definition 3.1. Let $\|\cdot\|$ be a norm on $\mathbb{R}^M$. The norm $\|\cdot\|_*$ on $\mathbb{R}^M$ defined by
\[
\|\mathbf{z}\|_* := \sup_{\|\mathbf{w}\| \leq 1} \langle \mathbf{z}, \mathbf{w} \rangle
\]
is called the dual norm to $\|\cdot\|$.
Note that the dual norm is actually a norm. To obtain a recovery guarantee for NNLR we have certain requirements on the measurement matrix . We use a null space property.
Definition 3.2. Let , and be any norm on . Further let . Suppose there exist constants and such that
Then, we say that has the -robust null space property of order S with respect to , or in short the -RNSP of order S with respect to , with constants ρ and τ. Here, ρ is called the stableness constant and τ the robustness constant.
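For orientation, the classical form of such a property in ([3], Chapter 4), stated there with the $\ell_1$-norm on the signal side and an arbitrary norm $\|\cdot\|$ on the measurement side, reads
\[
\|\mathbf{v}_T\|_1 \;\leq\; \rho\, \|\mathbf{v}_{T^c}\|_1 + \tau\, \|\mathbf{A}\mathbf{v}\| \qquad \text{for all } \mathbf{v} \in \mathbb{R}^N \text{ and all } T \text{ with } |T| \leq S;
\]
we restate it here only as a guide, since the variant used above may differ in the choice of norms.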
Note that smaller stableness constants increase the reliability of recovery if many small, non-zero components are present, while smaller robustness constants increase the reliability if the measurements are noisy. In order to make use of the non-negativity of the signal, we need the measurement matrix to be biased in a certain way. In [13] this bias was guaranteed with the $M^+$ criterion.
Definition 3.3. Let . Suppose there exists such that . Then we say that obeys the $M^+$ criterion with vector and constant κ.
Note that κ is actually a condition number of the matrix with diagonal and 0 elsewhere. Condition numbers are frequently used in error bounds of numerical linear algebra. The general recovery guarantee is the following; similar results have been obtained in the matrix case in [23].
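As an illustration of Definition 3.3, the following sketch checks the $M^+$ criterion for a given matrix and candidate vector and returns the associated constant, using the fact that the condition number of a diagonal matrix is the ratio of its largest and smallest diagonal entries in absolute value. The function name and interface are our own choices for this sketch.

```python
import numpy as np

def m_plus_constant(A, t):
    """Check the M+ criterion for A with candidate vector t (sketch).
    Returns kappa = max(w) / min(w), the condition number of diag(w) with
    w = A^T t, provided all entries of w are strictly positive; returns
    None if the criterion is not fulfilled with this t."""
    w = A.T @ t
    if np.min(w) <= 0:
        return None
    return np.max(w) / np.min(w)

# For a matrix whose columns all sum to one (e.g. the random walk matrix of
# a left regular bipartite graph, see below), t = all-ones gives w = 1
# componentwise and hence the optimal kappa = 1.
```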
Theorem 3.4 (NNLR Recovery Guarantee). Let , and be any norm on with dual norm . Further, suppose that obeys
a) the -RNSP of order S with respect to with constants ρ and τ and
b) the $M^+$ criterion with vector and constant κ.
If , the following recovery guarantee holds true: For all and any minimizer of
obeys the bound
If , this bound can be improved to
Proof The proof can be found in Subsection 6.1.
Given a matrix with -RNSP we can add a row of ones (or a row consisting of one minus the column sums of the matrix) to fulfill the $M^+$ criterion with the optimal constant. Certain random measurement matrices guarantee uniform bounds on κ for fixed vectors . In ([13], Theorem 12) it was proven that if are all i.i.d. Bernoulli random variables, has the $M^+$ criterion with and with high probability. This is problematic, since if , it might happen that is not fulfilled anymore. Since the stableness constant as a function of is monotonically increasing, the condition might only hold if . If that is the case, there are vectors that are recovered by the basis pursuit denoising but not by the NNLS! This is for instance the case for the matrix , which has the -robust null space property of order 1 with stableness constant and the $M^+$ criterion with for any possible choice of . In particular, the vector is not necessarily recovered by the NNLAD or the NNLS.
Hence, it is crucial that the vector is chosen to minimize κ and ideally achieves the optimal constant. This motivates us to use random walk matrices of regular graphs, since they achieve exactly this.
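The following sketch implements the two constructions mentioned above: appending a row of ones (the vector picking out the new row then certifies the $M^+$ criterion with the optimal constant), or appending one minus the column sums so that all columns of the extended matrix sum to one and the all-ones vector certifies the criterion. The function names are ours, and the extra row corresponds to one additional, explicitly designed measurement.

```python
import numpy as np

def append_ones_row(A):
    """Append a row of ones: with t the unit vector selecting the new row,
    A_ext^T t is the all-ones vector, so the M+ criterion holds with the
    optimal constant kappa = 1."""
    return np.vstack([A, np.ones((1, A.shape[1]))])

def append_complement_row(A):
    """Append one minus the column sums: every column of the extended
    matrix sums to one, so t = all-ones yields kappa = 1."""
    return np.vstack([A, 1.0 - A.sum(axis=0, keepdims=True)])
```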
Definition 3.5. Let and . For the set
is called the set of right vertices connected to the set of left vertices T. If
then is called a random walk matrix of a D-left regular bipartite graph. We also say, for short, that is a D-LRBG. If additionally there exists a such that
holds true, then is called a random walk matrix of a -lossless expander.
Note that we have made a slight abuse of notation. The term D-LRBG, as a short form for D-left regular bipartite graph, refers in our case to the random walk matrix and not to the graph itself. We omit this minor technical differentiation for the sake of shortening the frequently used term random walk matrix of a D-left regular bipartite graph. Lossless expanders are bipartite graphs that have a low number of edges but are still highly connected, see for instance ([24], Chapter 4). As a consequence, their random walk matrices have good properties for compressed sensing. It is well known that random walk matrices of -lossless expanders obey the -RNSP of order S with respect to , see ([3], Theorem 13.11). The dual norm of the $\ell_1$-norm is the $\ell_\infty$-norm, and the $M^+$ criterion is easily fulfilled, since the columns sum up to one. From Theorem 3.4 we can thus draw the following corollary.
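For concreteness, the random walk matrix of a uniformly at random drawn D-LRBG can be sampled as follows: every left vertex (column) is connected to D distinct right vertices (rows) chosen uniformly at random, and each corresponding entry is set to 1/D, so that every column sums to one. This is a minimal sketch of the sampling model we have in mind; the dense representation is only for readability, and in practice the matrix would be stored sparsely.

```python
import numpy as np

def random_dlrbg(M, N, D, seed=None):
    """Random walk matrix of a uniformly at random drawn D-left regular
    bipartite graph: each of the N columns has exactly D non-zero entries
    equal to 1/D at D distinct, uniformly chosen rows."""
    rng = np.random.default_rng(seed)
    A = np.zeros((M, N))
    for n in range(N):
        rows = rng.choice(M, size=D, replace=False)  # D distinct right vertices
        A[rows, n] = 1.0 / D
    return A

# Every column sums to one, so the all-ones vector certifies the M+
# criterion with the optimal constant kappa = 1.
```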
Corollary 3.6 (Lossless Expander Recovery Guarantee). Let , . If is a random walk matrix of a -lossless expander, then the following recovery guarantee holds true: For all and any minimizer of
obeys the bound
Proof By ([3], Theorem 13.11), has the -RNSP with respect to with constants and . The dual norm of the $\ell_1$-norm is the $\ell_\infty$-norm. If we set , we get
Hence, has the $M^+$ criterion with vector and constant , and the condition is immediately fulfilled. We obtain and . Applying Theorem 3.4 with the improved bound for and these values yields
If we additionally substitute the values for ρ and τ we get
This finishes the proof.
Note that ([3], Theorem 13.11) is an adaptation of ([20], Lemma 11) that accounts for robustness and skips proving the restricted isometry property. If and , a uniformly at random drawn D-LRBG is the random walk matrix of a -lossless expander with high probability ([3], Theorem 13.7). Thus, recovery with the NNLAD is possible in the optimal regime .
3.2 On the Robustness Bound for Lossless Expanders
If is a random walk matrix of a -lossless expander with , then we can also draw a recovery guarantee for the NNLS. By ([3], Theorem 13.11), has the -RNSP with respect to with constants and and hence also the -RNSP with respect to with constants and . Similarly to the proof of Corollary 3.6, we can use Theorem 3.4 to deduce that any minimizer of
obeys the bound
If the measurement error is a constant vector, i.e., , then . In this case the error bound of the NNLS is just as good as the error bound of the NNLAD. However, if is a standard unit vector, then . In this case the error bound of the NNLS is significantly worse than the error bound of the NNLAD. Thus, the NNLAD performs better under peaky noise, while the NNLS and NNLAD are tied under noise with evenly distributed mass. We will verify this numerically in Subsection 5.1. One can draw a complementary result for matrices with biased sub-Gaussian entries, which obey the -RNSP with respect to and the $M^+$ criterion in the optimal regime [13]. Table 1 states which method has an advantage over the other in each scenario.
4 Non-Negative Least Absolute Deviation Using a Proximal Point Method
In this section we assume that with some . If , the NNLR can be recast as a linear program by introducing some slack variables. For an arbitrary p the NNLR is a convex optimization problem and the objective function has a simple and globally bounded subdifferential. Thus, the NNLR can directly be solved with a projected subgradient method using a problem independent step size. Such subgradient methods achieve only a convergence rate of toward the optimal objective value ([25], Section 3.2.3), where k is the number of iterations performed. In the case that the norm is the $\ell_2$-norm, we can transform the problem into a differentiable version, i.e., the NNLS
Since the gradient of such an objective is Lipschitz, this problem can be solved by a projected gradient method with constant step size, which achieves a convergence rate of toward the optimal objective value [26, 27]. However, this does not generalize to the $\ell_1$-norm. The proximal point method proposed in [15] can solve the case of the $\ell_1$-norm with a convergence rate of toward the optimal objective value. Please refer to Algorithm 1.
Algorithm 1 is a primal-dual algorithm. Within the loop, lines 7, 8, 1 and 2 calculate the proximal point operator of the Fenchel conjugate of to update the dual variable, lines 3 and 5 update the primal variable, and lines 4 and 6 perform a momentum step to accelerate convergence. Further, line 8 sets to and avoids a third matrix vector multiplication. Note that and can be replaced by any values that satisfy . The calculation of and might be a bottleneck for the computational complexity of the algorithm. If one wants to solve multiple problems with the same matrix, and should only be calculated once and not in each run of the algorithm. For any the following convergence guarantee can be deduced from ([15], Theorem 1). Let and be the values of and at the end of the kth iteration of the while loop of Algorithm 1. Then, the following statements hold true:
(1) The iterates converge: The sequence converges to a minimizer of .
(2) The iterates are feasible: We have and for all .
(3) There is a stopping criterion for the iterates: and . In particular, if and , then is a minimizer of .
(5) The averages obey the convergence rate toward the optimal objective value: , where is a minimizer of .
The formal version and proof can be found in [28]. Note that this yields a convergence guarantee for both the iterates and averages, but the convergence rate is only guaranteed for the averages. Algorithm 1 is optimized in the sense that it uses the least possible number of matrix vector multiplications per iteration, since these govern the computational complexity.
Remark 4.1. Let be D-LRBG. Each iteration of Algorithm 1 requires at most floating point operations and assignments.
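We do not reproduce Algorithm 1 verbatim here, but the following sketch shows the primal-dual structure we have in mind, following the scheme of [15] for the NNLAD: the dual variable is updated via the proximal operator of the Fenchel conjugate of the data fidelity term (a clipping step), the primal variable via the projection onto the non-negative orthant, followed by an extrapolation (momentum) step. The step size choice, the iteration count and the returned running average are our own choices for this sketch and may differ from Algorithm 1 in detail.

```python
import numpy as np

def nnlad_primal_dual(A, y, num_iter=5000):
    """Primal-dual (Chambolle-Pock type) sketch for
        minimize ||A x - y||_1  subject to  x >= 0.
    Returns the last iterate and the running average of the iterates."""
    M, N = A.shape
    L = np.linalg.norm(A, 2)          # operator norm of A
    tau = sigma = 0.99 / L            # step sizes with tau * sigma * L**2 < 1
    x = np.zeros(N)
    x_bar = x.copy()
    u = np.zeros(M)                   # dual variable, kept in [-1, 1]^M
    x_avg = np.zeros(N)
    for k in range(1, num_iter + 1):
        # dual update: prox of the Fenchel conjugate of z -> ||z - y||_1
        u = np.clip(u + sigma * (A @ x_bar - y), -1.0, 1.0)
        # primal update: projection onto the non-negative orthant
        x_new = np.maximum(x - tau * (A.T @ u), 0.0)
        # extrapolation / momentum step
        x_bar = 2.0 * x_new - x
        x = x_new
        x_avg += (x - x_avg) / k      # running average of the iterates
    return x, x_avg
```

In line with Subsection 4.1 below, the last iterate is the quantity we use to recover a signal, while the convergence rate statement only covers the averages.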
4.1 Iterates or Averages
The question arises whether it is better to estimate with the averages or the iterates. Numerical tests suggest that the iterates reach tolerance thresholds significantly faster than the averages. We can only give a heuristic explanation for this phenomenon. The stopping criterion of the iterates yields . In practice we observe that for all sufficiently large k. However, yields . This monotonicity promotes the convergence of the iterates and gives a clue why the iterates seem to converge better in practice. See Figures 5, 6.
4.2 On the Convergence Rate
As stated, the NNLS achieves the convergence rate [27], while the NNLAD only achieves a convergence rate of toward the optimal objective value. However, this should not be considered as weaker, since the objective function of the NNLS is the square of a norm. If are the iterates of the NNLS implementation of [27], algebraic manipulation yields
Thus, the $\ell_2$-norm of the residual of the NNLS iterates only decays at the same order as the $\ell_1$-norm of the residual of the NNLAD averages.
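One possible way to spell out this manipulation is the following: if the accelerated projected gradient method guarantees $\|\mathbf{A}\mathbf{x}_k - \mathbf{y}\|_2^2 - \|\mathbf{A}\mathbf{x}^\ast - \mathbf{y}\|_2^2 \leq C/k^2$ for a minimizer $\mathbf{x}^\ast$ of the NNLS and some constant $C$, then
\[
\|\mathbf{A}\mathbf{x}_k - \mathbf{y}\|_2 \;\leq\; \sqrt{\|\mathbf{A}\mathbf{x}^\ast - \mathbf{y}\|_2^2 + \frac{C}{k^2}} \;\leq\; \|\mathbf{A}\mathbf{x}^\ast - \mathbf{y}\|_2 + \frac{\sqrt{C}}{k},
\]
using $\sqrt{a + b} \leq \sqrt{a} + \sqrt{b}$, so the residual norm itself only approaches its optimal value at the rate $\mathcal{O}(1/k)$.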
5 Numerical Experiments and Applications
In the first part of this section we will compare NNLAD with several state of the art recovery methods in terms of achieved sparsity levels and decoding time. For , we denote , and .
5.1 Properties of the Non-Negative Least Absolute Deviation Optimizer
We recall that the goal is to recover from the noisy linear measurements . To investigate properties of the minimizers of the NNLAD, we compare it to the minimizers of the well studied problems basis pursuit (BP), optimally tuned basis pursuit denoising (BPDN), optimally tuned $\ell_1$-constrained least residual (CLR) and the NNLS, which are given by
The NNLAD is designed to recover non-negative signals in general and, as we will see, it is able to recover sparse non-negative signals comparably well to the CLR and BPDN with optimal tuning, which are particularly designed for this task. Further, we compare the NNLAD to the expander iterative hard thresholding (EIHT). The EIHT is calculated by stopping the following sequence after a suitable stopping criterion is met:
where is the median of and is a hard thresholding operator, i.e., some minimizer of . There is a whole class of thresholding based decoders for lossless expanders, which all need either the sparsity of the signal or the order of the expansion property as a tuning parameter. We choose the EIHT as a representative of this class, since the cluster points of its sequence have robust recovery guarantees ([3], Theorem 13.5). By convex decoders we refer to BPDN, BP, CLR, NNLAD, and NNLS. We choose the optimal tuning for the BPDN and for the CLR. The optimally tuned BPDN and CLR represent a best case benchmark. In ([29], Figure 1.1) it was noticed that tuning the BPDN with often leads to worse estimation errors than tuning with for . Thus, BP is a version of BPDN with no prior knowledge about the noise and represents a worst case benchmark. At first we investigate the properties of the estimators. In order to mitigate effects from different implementations, we solve all optimization problems with the CVX package of Matlab [30, 31]. For a given we will do the following experiment multiple times:
Experiment 1
1. Generate a measurement matrix uniformly at random among all D-LRBG.
2. Generate a signal uniformly at random from .
3. Generate a noise uniformly at random from .
4. Define the observation .
5. For each decoder calculate an estimator and collect the relative estimation error .
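A schematic implementation of Experiment 1 is sketched below, reusing the random_dlrbg and nnlr sketches from Section 3. The concrete sets from which the signal and the noise are drawn, as well as the norm of the reported relative error, are placeholders and do not necessarily match the experiment described above.

```python
import numpy as np

def experiment_1(M, N, S, D, noise_level, num_trials=100, seed=0):
    """Schematic version of Experiment 1 with placeholder distributions:
    an S-sparse non-negative signal and a noise vector scaled to a
    prescribed l1-norm; reports the mean relative l2-estimation error."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(num_trials):
        A = random_dlrbg(M, N, D, seed=int(rng.integers(1 << 31)))  # step 1
        x0 = np.zeros(N)                                            # step 2
        x0[rng.choice(N, size=S, replace=False)] = rng.random(S)
        e = rng.standard_normal(M)                                  # step 3
        e *= noise_level / np.linalg.norm(e, 1)
        y = A @ x0 + e                                              # step 4
        x_hat = nnlr(A, y, p=1)                                     # step 5 (NNLAD)
        errors.append(np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
    return float(np.mean(errors))
```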
In this experiment we have and since is a D-LRBG and , we further have . Note that for and we obtain two different noise distributions. If is uniformly distributed on , then the absolute value of each component is a random variable with density for . Thus, . By testing one can observe a concentration around this expected value, in particular that with high probability. If is uniformly distributed on , then . Thus, these two noise distributions each represent randomly drawn noise vectors obeying one norm equivalence asymptotically tightly up to a constant. From Eqs. 1, 2 we expect that the NNLS has roughly the same estimation errors as the NNLAD for , i.e., the evenly distributed noise, and significantly worse estimation errors for , i.e., the peaky noise.
5.1.1 Quality of the Estimation Error for Varying Sparsity
We fix the constants , , , , and vary the sparsity level . For each S we repeat Experiment 1 100 times. We plot the mean of the relative -estimation error and the mean of the logarithmic relative -estimation error, i.e.,
over the sparsity. The result can be found in Figures 1A,B.
For the estimation error of the EIHT randomly peaks high. We deduce that the EIHT fails to recover the signal reliably for , while the NNLAD and other convex decoders succeed. This is not surprising, since by ([3], Theorem 13.15) the EIHT obeys a robust recovery guarantee for S-sparse signals whenever is the random walk matrix of a -lossless expander with . This is significantly stronger than the -lossless expander property with required for a null space property. It might also be that the null space property is more likely than the lossless expansion property, similarly to the gap between the -restricted isometry property and the null space property [32]. However, if the EIHT recovers a signal, it recovers it significantly better than any convex method. This might be the case since the originally generated signal is indeed from , which is enforced by the hard thresholding of the EIHT, but not by the convex decoders. This suggests that it might be useful to consider applying thresholding to the output of any convex decoder to increase the accuracy if the original signal is indeed sparse and not only compressible. For the remainder of this subsection we focus on convex decoders.
Contrary to our expectation, the BPDN achieves worse estimation errors than all other convex decoders for , even worse than the BP. The authors have no explanation for this phenomenon. Apart from that, we observe that the CLR and BP indeed perform as the best and worst case benchmarks, respectively. However, the difference between BP and CLR becomes rather small for large S. We deduce that tuning becomes less important near the optimal sampling rate.
The NNLAD, NNLS and CLR achieve roughly the same estimation errors. However, note that the BPDN and CLR are optimally tuned using prior information that is unknown in practice, unlike the NNLAD and NNLS. As expected, the NNLS performs roughly the same as the NNLAD, see Table 1. However, this is a consequence of the noise distribution for . We repeat Experiment 1 with the same constants, but , i.e., is a unit vector scaled by . We plot the mean of the relative -estimation error and the mean of the logarithmic relative -estimation error over the sparsity. The result can be found in Figures 2A,B.
We want to note that, similarly to Figure 1A, the EIHT works only unreliably for . Even though the mean of the logarithmic relative -estimation error of the NNLS is worse than the one of the EIHT for , the NNLS does not fail but only approximates with a weak error bound. As the theory suggests, the NNLS performs significantly worse than the NNLAD, see Table 1. It is worth mentioning that the estimation errors of the NNLS seem to be bounded by the estimation errors of the BP. This suggests that obeys a quotient property, which bounds the estimation error of any instance optimal decoder, see ([3], Lemma 11.15).
5.1.2 Noise-Blindness
Theorem 3.4 states that the NNLAD has an error bound similar to that of the optimally tuned CLR and BPDN. Further, by Eq. 1 the ratio
should be bounded by some constant. To verify this, we fix the constants , , , , and vary the signal to noise ratio . For each we repeat Experiment 1 100 times. We plot the mean of the logarithmic relative -estimation error and the mean of the ratio of relative -estimation error and -noise power, i.e.
over the sparsity. The result can be found in Figures 3A,B .
The logarithmic relative -estimation errors of the different decoders stay in a constant relation to each other over the whole range of . This relation is roughly the relation we find in Figure 1B for . As expected, the ratio of relative -estimation error and -noise power stays constant, independent of the , for all decoders. We deduce that the NNLAD is noise-blind. We repeat the experiment with and obtain Figures 4A,B.
Note that and not seems to be constant. Since is fairly small, we suspect that this is the result of CVX reaching a tolerance parameter2 and terminating, while the actual optimizer might in fact be the original signal. It is remarkable that even with the incredibly small signal-to-noise ratio of 10 the signal can be recovered by the NNLAD with an estimation error of for this noise distribution.
5.2 Decoding Complexity
5.2.1 Non-Negative Least Absolute Deviation Vs Iterative Methods
To investigate the convergence rates of the NNLAD as proposed in Section 4, we compare it to different types of decoders when . There are some sublinear time recovery methods for lossless expander matrices, including ([3], Section 13.4) and [5]. These are, as the name suggests, significantly faster than the NNLAD. These, as several other greedy methods (see [3], Section 13.3 and [5, 18, 19, 21]), rely on a strong lossless expansion property. As a representative of all greedy and sublinear time methods we will consider the EIHT, which has a linear convergence rate toward the signal and robust recovery guarantees ([3], Theorem 13.15). The EIHT also represents a best case benchmark. As a direct competitor we consider the NNLS implemented by the methods of [27]3, which has a convergence rate of toward the optimal objective value. [27] can also be used to calculate the CLR if the residual norm is the $\ell_2$-norm. However, calculating the projection onto the $\ell_1$-ball in is computationally slightly more complex than the projection onto . Thus, the CLR will be solved slightly slower than the NNLS with [27]. Note that cross-validation techniques would need to solve multiple optimization problems of a similar complexity as the NNLS to estimate a signal. As a consequence, such methods have a multiple times higher complexity than the NNLS and are not considered here. As a worst case benchmark we consider a simple projected subgradient implementation of the NNLAD using the Polyak step size, i.e.
which has a convergence rate of toward the optimal objective value ([33], Section 7.2.2 and Section 5.3.2). We initialize all iterative methods with 0. The EIHT will always use the parameter , the NNLAD and the NNLS the parameters and , see [27]. Just like the BPDN and CLR, the EIHT needs an oracle to obtain some unknown prior information, in this case . Parameters that can be computed from will be calculated before the timers start. This includes the adjacency structure of for the EIHT, , for the NNLAD, and s, α for the NNLS, since these are considered to be part of the decoder. We will do the following experiment multiple times:
Experiment 2
1. If , generate a measurement matrix uniformly at random among all D-LRBG. If , draw each component of the measurement matrix independently and uniformly at random from , i.e., as Bernoulli random variables.
2. Generate a signal uniformly at random from .
3. Define the observation .
4. For each iterative method calculate the sequence of estimators for all and collect the relative estimation errors , the relative norms of the residuals and the time to calculate the first k iterations.
For this represents a biased sub-Gaussian random ensemble [13] with optimal recovery guarantees for the NNLS. For this represents a D-LRBG random ensemble with optimal recovery guarantees for the NNLAD. We fix the constants , , , , and repeat Experiment 2 100 times. We plot the mean of the logarithmic relative -estimation error and the mean of the relative -norm of the residual, i.e.
over the sparsity and the time. The result can be found in Figures 5, 6.
The averages of the NNLAD converge significantly slower than the iterates, even though we lack a convergence rate for the iterates. We deduce that one should always use the iterates of the NNLAD to recover a signal. Surprisingly, the averages converge even slower than the subgradient method. However, this is not because the averages converge slowly, but rather because the subgradient method and all other methods converge faster than expected. In particular, the NNLAD iterates, the EIHT and the NNLS all converge linearly toward the signal. Further, their corresponding objective values also converge linearly toward the optimal objective value. Even the subgradient method converges almost linearly. We deduce that the NNLS is the fastest of these methods if is a D-LRBG.
Up to a constant, the NNLAD iterates, the EIHT and the NNLS converge at the same order. However, this behavior does not hold if we consider a different distribution for , as one can verify by setting each component as independent Bernoulli random variables. While the EIHT makes more progress per iteration than the NNLS, it still takes more time to achieve the same estimation errors and residuals. We plot the mean of the time required to calculate the first k iterations in Figure 7.
The EIHT requires roughly 6 times as long as any other method to calculate each iteration. All methods but the EIHT can be implemented with only two matrix vector multiplications, namely once by and once by . Both of these require roughly floating point operations. Hence, each iteration requires floating point operations. The EIHT only calculates one matrix vector multiplication, but also the median. This calculation is significantly slower than a matrix vector multiplication. For every we need to sort a vector with D elements, which can be performed in . Hence, each iteration of the EIHT requires floating point operations, which explains why the EIHT requires significantly more time for each iteration.
As we have seen, the NNLS is able to recover signals faster than any other method; however, it only obeys sub-optimal robustness guarantees for a uniformly at random chosen D-LRBG, as we have seen in Figure 4A. We ask ourselves whether or not the NNLS is also faster with a more natural measurement scheme, i.e., if are independent Bernoulli random variables. We repeat Experiment 2 100 times with for the NNLS and for the other methods. We again plot the mean of the logarithmic relative -estimation error and the mean of the relative -norm of the residual in Figures 8, 9.
The NNLAD and the EIHT converge to the solution in roughly the same time. Even the subgradient implementation of the NNLAD recovers a signal in less time than the NNLS. Further, the convergence of the NNLS does not seem to be linear anymore. We deduce that the sparse structure of has a more significant influence on the decoding time than the smoothness of the data fidelity term. We also deduce that even the subgradient method is a viable choice to recover a signal.
5.2.2 Non-Negative Least Absolute Deviation Vs SPGL1
As a last test we compare the NNLAD to the SPGL1 toolbox for Matlab [16, 34].
Experiment 3
1. Draw the measurement matrix uniformly at random among all D-LRBG.
2. Generate the signal uniformly at random from .
3. Define the observation .
4. Use a benchmark decoder to calculate an estimator and collect the relative estimation errors and the time to calculate .
5. For each iterative method calculate iterations until and . Collect the time to perform these iterations. If this threshold cannot be reached after iterations, the recovery failed and the time is set to .
We again fix the dimension , , and vary . For both the BP implementation of SPGL1 and the CLR implementation of SPGL1 we repeat Experiment 3 100 times for each S. We plot the mean of the time to calculate the estimators and plot these over the sparsity in Figures 10A,B.
The NNLAD implementation is slower than both SPGL1 methods for small S. However, if we have the optimal number of measurements , the NNLAD is faster than both SPGL1 methods.
5.2.3 Summary
The implementation of the NNLAD as presented in Algorithm 1 is a reliable recovery method for sparse non-negative signals. There are methods that might be faster, but these either recover a smaller number of coefficients (EIHT, greedy methods) or obey sub-optimal recovery guarantees (NNLS). The implementation is as fast as the commonly used SPGL1 toolbox, but has the advantage that it requires no tuning depending on the unknown or . Lastly, the NNLAD handles peaky noise remarkably well.
5.3 Application for Viral Detection
With the outbreak and rapid spread of the COVID-19 virus, we need to test a large number of people for an infection. Since we can only test a fixed number of persons in a given time, the number of persons tested for the virus grows at most linearly. On the other hand, models suggest that the number of possibly infected persons grows exponentially. At some point, if that is not already the case, we will have a shortage of test kits and we will not be able to test every person. It is thus desirable to test as many persons as possible with as few test kits as possible.
The field of group testing develops strategies to test groups of individuals instead of single individuals in order to reduce the number of tests required to identify infected individuals. The first advances in group testing were made in [35]. For a general overview of group testing we refer to [36].
The problem of testing a large group for a virus can be modeled as a compressed sensing problem in the following way: Suppose we want to test N persons, labeled by , to check whether or not they are affected by a virus. We denote by the quantity of viruses in the specimen of the nth person. Suppose we have M test kits, labeled by . By we denote the amount of viruses in the sample of the mth test kit. Let . For every n we put a fraction of size of the specimen of the nth person into the sample for the mth test kit. The sample of the mth test kit will then have the quantity of viruses
where is the amount of viruses in the sample originating from a possible contamination of the sample. A quantitative reverse transcription polymerase chain reaction estimates the quantity of viruses by with a small error . After all M tests we detect the quantity
where . Since contamination of samples happens rarely, is assumed to be peaky in terms of Table 1, while is assumed to have even mass but a small norm. In total, the noise is peaky.
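In formulas, writing $x_n^0$ for the quantity of viruses in the specimen of the nth person, $A_{mn}$ for the fraction of that specimen added to the mth sample, $f_m$ for the contamination of the mth sample and $d_m$ for the error of the quantitative reverse transcription polymerase chain reaction (the symbol names are our own), the model described above can be summarized as
\[
y_m \;=\; \sum_{n=1}^{N} A_{mn} x_n^0 + f_m + d_m , \qquad \text{i.e.} \qquad \mathbf{y} \;=\; \mathbf{A}\mathbf{x}^0 + \mathbf{e} \quad \text{with } \mathbf{e} := \mathbf{f} + \mathbf{d},
\]
where $\mathbf{f}$ is peaky (contaminations are rare) and $\mathbf{d}$ has evenly spread but small mass, so that the total noise $\mathbf{e}$ is peaky in the sense of Table 1.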
Often each specimen is tested separately, meaning that is the identity. In particular, we need at least as many test kits as specimens. Further, we estimate the true quantity of viruses by , which results in the estimation error . Since the noise vector is peaky, some, but few, tests will be inaccurate and might result in false positives or false negatives.
In general, only a fraction of persons is indeed affected by the virus. Thus, we assume that for some small S. Since the amount of viruses is a non-negative value, we also have . Hence, we can use the NNLR to estimate , and in particular we should use the NNLAD due to the noise being peaky. Corollary 3.6 suggests to choose as the random walk matrix of a lossless expander or, by ([3], Theorem 13.7), to choose as a uniformly at random chosen D-LRBG. Such a matrix has non-negative entries and the column sums of are not greater than one. This is a necessary requirement, since each column sum is the total amount of specimen used in the test procedure. In particular, a fraction of of each specimen is used in exactly D test kits.
By Corollary 3.6 and ([3], Theorem 13.7) this allows us to reduce the number of test kits required to . As we have seen in Figures 4A,B, we expect the NNLAD estimator to correct the errors from and the estimation error to be on the order of , which is assumed to be small. Hence, the NNLAD estimator with a random walk matrix of a lossless expander might even result in fewer false positives and false negatives than individual testing.
Note that the lack of knowledge about the noise favors the NNLAD recovery method over a (BPDN) approach. Further, since the total sum of viruses in all patients, given by , is unknown, it is undesirable to use (CLR).
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.
Author Contributions
BB and PJ proposed the research problem. HP derived the results of the paper, with feedback from BB and PJ. HP wrote the paper also with feedback from BB and PJ.
Funding
The work was partially supported by DAAD grant 57417688. PJ has been supported by DFG grant JU 2795/3. BB has been supported by BMBF through the German Research Chair at AIMS, administered by the Humboldt Foundation. We acknowledge support by the German Research Foundation and the Open Access Publication Funds of TU Berlin. This article has appeared as a preprint (27), see https://arxiv.org/abs/2003.13092.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Footnotes
1 The $\ell_1$-constrained least residual is given by for some norm .
2 The tolerance parameters of CVX are the second and fourth roots of the machine precision by default [30, 31].
3 This was the fastest method found by the authors. Other possibilities would be ([15], Algorithm 2) and [26].
References
1. Candes, EJ, Romberg, J, and Tao, T. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE Trans Inf Theory (2006). 52:489–509. doi:10.1109/TIT.2005.862083
3. Foucart, S, and Rauhut, H. A Mathematical Introduction to Compressive Sensing. Basel, Switzerland: Birkhäuser (2013).
4. Hastie, T, Tibshirani, R, and Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations. New York, NY: Chapman and Hall/CRC (2015).
5. Donoho, DL, and Tanner, J. Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming. Proc Natl Acad Sci (2005). 102:9446–51. doi:10.1073/pnas.0502269102
6. Gilbert, A, and Indyk, P. Sparse Recovery Using Sparse Matrices. Proc IEEE (2010). 98:937–47. doi:10.1109/JPROC.2010.2045092
7. Bruckstein, AM, Elad, M, and Zibulevsky, M. On the Uniqueness of Non-negative Sparse & Redundant Representations. Proc IEEE Int Conf Acoust Speech Signal Process (2008). 5145–8. doi:10.1109/ICASSP.2008.4518817
8. Donoho, DL, and Tanner, J. Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications. Discrete Comput Geom (2010). 43:522–41. doi:10.1007/s00454-009-9221-z
9. Wang, M, Xu, W, and Tang, A. A Unique “Nonnegative” Solution to an Underdetermined System: From Vectors to Matrices. IEEE Trans Signal Process (2011). 59:1007–1016. doi:10.1109/TSP.2010.2089624
10. Slawski, M, and Hein, M. Sparse Recovery by Thresholded Non-negative Least Squares. Adv Neural Inf Process Syst (2011). 24:1926–1934.
11. Slawski, M, and Hein, M. Non-negative Least Squares for High-Dimensional Linear Models: Consistency and Sparse Recovery without Regularization. Electron J Stat (2013). 7:3004–3056. doi:10.1214/13-EJS868
12. Kabanava, M, Kueng, R, Rauhut, H, and Terstiege, U. Stable Low-Rank Matrix Recovery via Null Space Properties. Inf Inference (2016). 5:405–441. doi:10.1093/imaiai/iaw014
13. Kueng, R, and Jung, P. Robust Nonnegative Sparse Recovery and the Nullspace Property of 0/1 Measurements. IEEE Trans Inf Theory (2018). 64:689–703. doi:10.1109/TIT.2017.2746620
14. Shadmi, Y, Jung, P, and Caire, G. Sparse Non-negative Recovery from Biased Subgaussian Measurements Using NNLS. In: IEEE International Symposium on Information Theory; 2019 July 7–12; Paris, France (2019).
15. Chambolle, A, and Pock, T. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. J Math Imaging Vis (2011). 40:120–45. doi:10.1007/s10851-010-0251-1
16. Van Den Berg, E, and Friedlander, MP. Probing the Pareto Frontier for Basis Pursuit Solutions. SIAM J Sci Comput (2009). 31:890–912. doi:10.1137/080714488
17. Bühlmann, P, and Van De Geer, S. Statistics for High-Dimensional Data. Berlin, Heidelberg: Springer (2011).
18. Jafarpour, S, Xu, W, Hassibi, B, and Calderbank, R. Efficient and Robust Compressed Sensing Using Optimized Expander Graphs. IEEE Trans Inf Theory (2009). 55:4299–4308. doi:10.1109/TIT.2009.2025528
19. Xu, W, and Hassibi, B. Efficient Compressive Sensing with Deterministic Guarantees Using Expander Graphs. In: IEEE Information Theory Workshop; 2007 September 2–6; Tahoe City, CA, United States (2007). 414–9.
20. Berinde, R, Gilbert, AC, Indyk, P, Karloff, H, and Strauss, MJ. Combining Geometry and Combinatorics: A Unified Approach to Sparse Signal Recovery. In: 46th Annual Allerton Conference on Communication; 2008 September 23–26; Monticello, IL, United States (2008). 798–805.
21. Khajehnejad, MA, Dimakis, AG, Xu, W, and Hassibi, B. Sparse Recovery of Nonnegative Signals with Minimal Expansion. IEEE Trans Signal Process (2011). 59:196–208. doi:10.1109/TSP.2010.2082536
22. Morgenshtern, VI, and Candès, EJ. Super-resolution of Positive Sources: The Discrete Setup. SIAM J Imaging Sci (2016). 9:412–44. doi:10.1137/15M1016552
23. Jaensch, F, and Jung, P. Robust Recovery of Sparse Nonnegative Weights from Mixtures of Positive-Semidefinite Matrices (Preprint) (2020). Available from: https://arxiv.org/abs/2003.12005.
25. Nesterov, Y Introductory Lectures on Convex Optimization - A Basic CourseApplied Optimization. New York, NY: Springer (2004).
26. Beck, A, and Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J Imaging Sci (2009). 2:183–202. doi:10.1137/080716542
27. Attouch, H, and Peypouquet, J. The Rate of Convergence of Nesterov's Accelerated Forward-Backward Method Is Actually Faster Than $1/k^2$. SIAM J Optim (2016). 26:1824–1834. doi:10.1137/15M1046095
29. Kümmerle, C. Understanding and Enhancing Data Recovery Algorithms- From Noise-Blind Sparse Recovery to Reweighted Methods for Low-Rank Matrix Optimization. [PhD dissertation]. München (Germany): Technical University of Munich (2019).
30. Grant, M, and Boyd, S. CVX: Matlab Software for Disciplined Convex Programming (2014).
31. Grant, MC, and Boyd, SP. Graph Implementations for Nonsmooth Convex Programs. In: V Blondel, S Boyd, and H Kimura, editors. Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences. New York, NY: Springer-Verlag Limited (2008). p. 95–110.
32. Dirksen, S, Lecué, G, and Rauhut, H. On the Gap between Restricted Isometry Properties and Sparse Recovery Conditions. IEEE Trans Inf TheorY (2018). 64:5478–87. doi:10.1109/TIT.2016.2570244
33. Polyak, BT. Introduction to Optimization (Translations Series in Mathematics and Engineering). New York, NY: Optimization Software, Inc (1987).
34. Van Den Berg, E, and Friedlander, MP. SPGL1: A Solver for Large-Scale Sparse Reconstruction (2019).
35. Dorfman, R. The Detection of Defective Members of Large Populations. Ann Math Statist (1943). 14:436–40. doi:10.1214/aoms/1177731363
36. Aldridge, M, Johnson, O, and Scarlett, J. Group Testing: An Information Theory Perspective. FNT Commun Inf Theory (2019). 15:196–392. doi:10.1561/0100000099
37. Petersen, HB, and Jung, P. Robust Instance-Optimal Recovery of Sparse Signals at Unknown Noise Levels (Preprint) (2020). Available from: https://arxiv.org/abs/2008.08385.
6 Appendix
6.1 Proof of Non-Negative Least Residual Recovery Guarantee
By we denote the all-ones vector in or , respectively. The proof is an adaptation of the steps used in [13]. As for most convex optimization problems in compressed sensing, we use ([3], Theorem 4.25) and ([3], Theorem 4.20), respectively, which require to have the RNSP.
Theorem 6.1 (([3], Theorem 4.25) and ([3], Theorem 4.20)). Let and suppose has the -RNSP of order S with respect to with constants ρ and τ. Then, it holds that
If , this bound can be improved to
Note that by a modification of the proof this result also holds for . The modifications of the proofs of ([3], Theorem 4.25) and ([3], Theorem 4.20) are straightforward; only the modification of ([3], Theorem 2.5) might not be obvious. See also [37]. As a consequence, all our statements also hold for with . If is a diagonal matrix, we can calculate certain operator norms fairly easily:
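Concretely, in our notation, for a diagonal matrix $\mathbf{W}$ with non-zero diagonal entries $W_{nn}$ and any $p \in [1, \infty]$ we have
\[
\|\mathbf{W}\|_{p \to p} \;=\; \max_{n} |W_{nn}| \qquad \text{and} \qquad \|\mathbf{W}^{-1}\|_{p \to p} \;=\; \frac{1}{\min_{n} |W_{nn}|} .
\]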
We use this relation at several places throughout this section. Furthermore, we use ([13], Lemma 5) without adaptation. For the sake of completeness we add a short proof.
Lemma 6.2 ([13], Lemma 5). Let and suppose that has -RNSP of order S with respect to with constants ρ and τ. Let be a diagonal matrix with . If , then has -RNSP of order S with respect to with constants and .
Proof Let and . If we apply the RNSP of for the vector , we get
This finishes the proof.
Next we adapt ([13], Theorem 4) to account for arbitrary norms. Further, we obtain a slight improvement in the form of the dimensional scaling constant . With this, for , our error bound asymptotically becomes the error bound of the basis pursuit denoising, whenever and [3].
Proposition 6.3 (Similar to ([13], Theorem 4)). Let and be a norm on with dual norm . Suppose has the -RNSP of order S with respect to with constants ρ and τ. Suppose has the $M^+$ criterion with vector and constant κ and that . Then, we have
If , this bound can be improved to
Proof Let . In order to apply Lemma 6.2, we set as the matrix with diagonal and zero elsewhere. It follows that and . We can apply Lemma 6.2, which yields that has the -RNSP with constants and . We apply Theorem 6.1 with the matrix , the vectors , and the constants and , and get
We lower bound the left hand side further to get
We want to estimate the term using the $M^+$ criterion. Since , and is a diagonal matrix, we have
Applying this to Eq. 5 we get
If we can repeat the proof with the improved bound of Theorem 6.1.
After these auxiliary statements it remains to prove the main result of Section 3 about the properties of the NNLR minimizer.
Proof of Theorem 3.4. By applying Proposition 6.3 with and we get