ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 07 June 2021
Sec. Mathematics of Computation and Data Science
This article is part of the Research Topic 2021 Editor's Pick: Applied Mathematics and Statistics.

Efficient Tuning-Free l1-Regression of Nonnegative Compressible Signals

  • 1Communications and Information Theory Group, Technische Universität Berlin, Berlin, Germany
  • 2African Institute for Mathematical Sciences, Cape Town, South Africa
  • 3Division of Applied Mathematics, Stellenbosch University, Stellenbosch, South Africa

In compressed sensing the goal is to recover a signal from as few noisy linear measurements as possible, under the general assumption that the signal has only a few non-zero entries. The recovery can be performed by many different decoders, but most of them rely on some tuning. Given an estimate of the noise level, a common convex approach to recover the signal is basis pursuit denoising. If the measurement matrix has the robust null space property with respect to the $\ell_2$-norm, basis pursuit denoising obeys stable and robust recovery guarantees. In the case of unknown noise levels, non-negative least squares recovers non-negative signals if the measurement matrix fulfills an additional property (sometimes called the $M^+$-criterion). However, if the measurement matrix is the biadjacency matrix of a random left regular bipartite graph, it obeys with high probability the null space property with respect to the $\ell_1$-norm with optimal parameters. Therefore, we discuss non-negative least absolute deviation (NNLAD), which is free of tuning parameters. For these measurement matrices we prove a uniform, stable and robust recovery guarantee. Such guarantees are important, since binary expander matrices are sparse and thus allow for fast sketching and recovery. We further present a method to solve the NNLAD numerically and show that it is comparable to state-of-the-art methods. Lastly, we explain how the NNLAD can be used for viral detection in the recent COVID-19 crisis.

1 Introduction

Since it has been realized that many signals admit a sparse representation in some frame, the question arose whether or not such signals can be recovered from fewer samples than the dimension of the domain by utilizing the low dimensional structure of the signal. The question was answered positively at the beginning of the millennium [1, 2]. By now there are multiple different decoders to recover a sparse signal from noisy measurements with robust recovery guarantees. Most of them, however, rely on some form of tuning, depending on either the signal or the noise.

The basis pursuit denoising requires an upper bound on the norm of the noise ([3], Theorem 4.22), the least absolute shrinkage and selection operator requires an estimate of the $\ell_1$-norm of the signal ([4], Theorem 11.1), and the Lagrangian version of the least absolute shrinkage and selection operator allegedly needs to be tuned to the order of the noise level ([4], Theorem 11.1). The expander iterative hard thresholding needs the sparsity of the signal or an estimate of the order of the expansion property ([3], Theorem 13.15). The order of the expansion property can be calculated from the measurement matrix, but no polynomial time method is known to do this. Variants of these methods have similar drawbacks. The non-negative basis pursuit denoising requires the same tuning parameter as the basis pursuit denoising [5]. Other thresholding based decoders like sparse matching pursuit and expander matching pursuit have the same limitations as the expander iterative hard thresholding [6].

If this side information is not known a priori, many decoders yield either no recovery guarantees or, in their imperfectly tuned versions, sub-optimal estimation errors ([3], Theorem 11.12). Even though the problem of sparse recovery from under-sampled measurements was answered long ago, finding tuning-free decoders that achieve robust recovery guarantees is still a topic of interest.

The most prominent achievement in this direction is the non-negative least squares (NNLS) [7–11]. It is completely tuning-free [12], and in [13, 14] it was proven that it achieves robust recovery guarantees if the measurement matrix consists of independent biased sub-Gaussian random variables.

1.1 Our Contribution

We replace the least squares in the NNLS with an arbitrary norm and obtain the non-negative least residual (NNLR). We use the methods of [13] to prove recovery guarantees under conditions similar to those for the NNLS. In particular, we consider the case where we minimize the $\ell_1$-norm of the residual (NNLAD) and give a recovery guarantee if the measurement matrix is the random walk matrix of a uniformly at random drawn $D$-left regular bipartite graph.

In general, our results state that if the $M^+$ criterion is fulfilled, the basis pursuit denoising can be replaced by the tuning-less NNLR for non-negative signals. Note that the $M^+$ criterion can be fulfilled by adding only one explicitly chosen measurement, if that is possible in the application. Thus, in practice the NNLR does not require more measurements than the BPDN to recover sparse signals. While biased sub-Gaussian measurement matrices rely on a probabilistic argument to verify that such a measurement is present, random walk matrices of left regular graphs naturally have such a measurement. The tuning-less nature gives the NNLR an advantage over other decoders if the noise power cannot be estimated, which is for instance the case when we have multiplicative noise or the measurements are Poisson distributed. Note that Laplacian distributed noise or the existence of outliers also favors an $\ell_1$ regression approach over an $\ell_2$ regression approach and thus motivates using the NNLAD over the NNLS.

Further, the sparse structure of left regular graphs can reduce the encoding and decoding time to a fraction. Using [15] we can solve the NNLAD with a first order method applied to a single optimization problem with a sparse measurement matrix. Other state-of-the-art decoders often use non-convex optimization, computationally complex projections, or need to solve multiple different optimization problems. For instance, to solve the basis pursuit denoising given a tuning parameter, a common approach is to solve a sequence of $\ell_1$-constrained least residual problems (see Footnote 1) to approximate where the Pareto curve attains the value of the tuning parameter of basis pursuit denoising [16]. Cross-validation techniques suffer from similar issues [17].

1.2 Relations to Other Works

We build on the theory of [13] that uses the $\ell_2$ null space property and the $M^+$ criterion. These methods have also been used in [12, 14]. To the best of the authors' knowledge, the $M^+$ criterion has not been used with an $\ell_1$ null space property before. Other works have used adjacency matrices of graphs as measurement matrices, including [6, 18–21]. The works [18, 19] did not consider noisy observations. The decoder in [20] is the basis pursuit denoising and thus requires tuning depending on the noise power. [21] proposes two decoders for non-negative signals. The first is the non-negative basis pursuit, which could be extended to the non-negative basis pursuit denoising. However, this again needs a tuning parameter depending on the noise power. The second decoder, the Reverse Expansion Recovery algorithm, requires the order of the expansion property, which is not known to be computable in polynomial time. The survey [6] contains multiple decoders, including the basis pursuit, which again needs tuning depending on the noise power for robustness, and the expander matching pursuit and the sparse matching pursuit, which need the order of the expansion property. Further, [5] considered sparse regression of non-negative signals and also used the non-negative basis pursuit denoising as decoder, which again needs tuning dependent on the noise power. To the best of the authors' knowledge, this is the first work that considers tuning-less sparse recovery for random walk matrices of left regular bipartite graphs. The NNLAD has been considered in [22] with a structured sparsity model without the use of the $M^+$ criterion.

2 Preliminaries

For $K \in \mathbb{N}$ we denote the set of integers from 1 to $K$ by $[K]$. For a set $T \subset [N]$ we denote the number of elements in $T$ by $\#(T)$. Vectors are denoted by lower case bold face symbols, while their corresponding components are denoted by lower case italic letters. Matrices are denoted by upper case bold face symbols, while their corresponding components are denoted by upper case italic letters. For $x \in \mathbb{R}^N$ we denote its $p$-norms by $\|x\|_p$. Given $A \in \mathbb{R}^{M \times N}$ we denote its operator norm as an operator from $\ell_q$ to $\ell_p$ by $\|A\|_{q \to p} := \sup_{v \in \mathbb{R}^N, \|v\|_q \le 1} \|Av\|_p$. By $\mathbb{R}^N_+$ we denote the non-negative orthant. Given a closed convex set $C \subset \mathbb{R}^N$, we denote the projection onto $C$, i.e., the unique minimizer of $\operatorname{argmin}_{z \in C} \tfrac{1}{2}\|z - v\|_2^2$, by $P_C(v)$. For a vector $x \in \mathbb{R}^N$ and a set $T \subset [N]$, $x|_T$ denotes the vector in $\mathbb{R}^N$ whose $n$th component is $x_n$ if $n \in T$ and 0 else. Given $N, S \in \mathbb{N}$ we will often need sets $T \subset [N]$ with $\#(T) \le S$, and we abbreviate this by $\#(T) \le S$ if no confusion is possible.

Given a measurement matrix $A \in \mathbb{R}^{M \times N}$, a decoder is any map $Q_A : \mathbb{R}^M \to \mathbb{R}^N$. We refer to $x \in \mathbb{R}^N$ as the signal. If $x \in \mathbb{R}^N_+ = \{z \in \mathbb{R}^N : z_n \ge 0 \text{ for all } n \in [N]\}$, we say the signal is non-negative and write shortly $x \ge 0$. If additionally $x_n > 0$ for all $n \in [N]$, we write $x > 0$. An input of a decoder, i.e., any $y \in \mathbb{R}^M$, is referred to as an observation. We allow all possible inputs of the decoder as observations, since in general the transmitted codeword $Ax$ is disturbed by some noise. Thus, given a signal $x$ and an observation $y$, we call $e := y - Ax$ the noise. A signal $x$ is called $S$-sparse if $\|x\|_0 := \#(\{n \in [N] : x_n \ne 0\}) \le S$. We denote the set of $S$-sparse vectors by

$$\Sigma_S := \{z \in \mathbb{R}^N : \|z\|_0 \le S\}.$$

Given some $S \in [N]$, the compressibility of a signal $x$ can be measured by $d_1(x, \Sigma_S) := \inf_{z \in \Sigma_S} \|x - z\|_1$. Given $N$ and $S$, the general non-negative compressed sensing task is to find a measurement matrix $A \in \mathbb{R}^{M \times N}$ and a decoder $Q_A : \mathbb{R}^M \to \mathbb{R}^N$ with $M$ as small as possible such that the following holds true: There exists a $q \in [1, \infty]$ and a continuous function $C : \mathbb{R} \times \mathbb{R}^M \to \mathbb{R}_+$ with $C(0, 0) = 0$ such that

$$\|Q_A(y) - x\|_q \le C\left(d_1(x, \Sigma_S),\, y - Ax\right) \quad \text{for all } x \in \mathbb{R}^N_+ \text{ and } y \in \mathbb{R}^M$$

holds true. This will ensure that if we can control the compressibility and the noise, we can also control the estimation error and in particular decode every noiseless observation of S-sparse signals exactly.
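As a small illustration (not part of the original article), the compressibility measure $d_1(x, \Sigma_S)$ is simply the $\ell_1$ mass outside the best $S$-term approximation:

```python
import numpy as np

def d1(x: np.ndarray, S: int) -> float:
    """l1 distance of x to the set of S-sparse vectors.

    The best S-term approximation keeps the S largest-magnitude entries,
    so d1(x, Sigma_S) is the l1 mass of the remaining ones.
    """
    if S >= x.size:
        return 0.0
    mags = np.sort(np.abs(x))            # ascending order of magnitudes
    return float(mags[: x.size - S].sum())

# Example: a compressible (but not exactly sparse) non-negative signal.
x = np.array([5.0, 3.0, 0.01, 0.02, 0.0, 0.003])
print(d1(x, S=2))                        # 0.033 = 0.01 + 0.02 + 0.003
```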

3 Main Results

Given a measurement matrix $A \in \mathbb{R}^{M \times N}$ and a norm $\|\cdot\|$ on $\mathbb{R}^M$, we define the decoder as follows: Given $y \in \mathbb{R}^M$, set $Q_A(y)$ as any minimizer of

$$\operatorname{argmin}_{z \ge 0} \|Az - y\|.$$

We call this problem the non-negative least residual (NNLR). In particular, for $\|\cdot\| = \|\cdot\|_1$ this problem is called non-negative least absolute deviation (NNLAD), and for $\|\cdot\| = \|\cdot\|_2$ this problem is known as the non-negative least squares (NNLS) studied in [13]. In fact, the proof techniques translate fairly easily. We just need to introduce the dual norm.

Definition 3.1. Let $\|\cdot\|$ be a norm on $\mathbb{R}^M$. The norm $\|\cdot\|_*$ on $\mathbb{R}^M$ defined by

$$\|v\|_* := \sup_{\|u\| \le 1} \langle v, u \rangle$$

is called the dual norm of $\|\cdot\|$.

Note that the dual norm is actually a norm. To obtain a recovery guarantee for NNLR we have certain requirements on the measurement matrix A. We use a null space property.

Definition 3.2. Let $S \in [N]$, $q \in [1, \infty)$ and let $\|\cdot\|$ be any norm on $\mathbb{R}^M$. Further, let $A \in \mathbb{R}^{M \times N}$. Suppose there exist constants $\rho \in [0, 1)$ and $\tau \in [0, \infty)$ such that

$$\|v|_T\|_q \le \rho S^{1/q - 1} \|v|_{T^c}\|_1 + \tau \|Av\| \quad \text{for all } v \in \mathbb{R}^N \text{ and } \#(T) \le S.$$

Then we say $A$ has the $\ell_q$-robust null space property of order $S$ with respect to $\|\cdot\|$, or in short, $A$ has the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\rho$ and $\tau$. $\rho$ is called the stableness constant and $\tau$ is called the robustness constant.

Note that smaller stableness constants increase the reliability of recovery if many, small, non-zero components are present, while smaller robustness constants increase the reliability if the measurements are noisy. In order to make use of the non-negativity of the signal, we need A to be biased in a certain way. In [13] this bias was guaranteed with the M+ criterion.

Definition 3.3. Let $A \in \mathbb{R}^{M \times N}$. Suppose there exists a $t \in \mathbb{R}^M$ such that $A^T t > 0$. Then we say $A$ obeys the $M^+$ criterion with vector $t$ and constant $\kappa := \max_{n \in [N]} |(A^T t)_n| \cdot \max_{n \in [N]} |(A^T t)_n^{-1}|$.

Note that $\kappa$ is actually the condition number of the diagonal matrix with diagonal $A^T t$ and 0 else. Condition numbers are frequently used in error bounds of numerical linear algebra. The general recovery guarantee is the following; similar results have been obtained in the matrix case in [23].

Theorem 3.4 (NNLR Recovery Guarantee). Let $S \in [N]$, $q \in [1, \infty)$ and let $\|\cdot\|$ be any norm on $\mathbb{R}^M$ with dual norm $\|\cdot\|_*$. Further, suppose that $A \in \mathbb{R}^{M \times N}$ obeys

a) the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\rho$ and $\tau$, and

b) the $M^+$ criterion with vector $t$ and constant $\kappa$.

If $\kappa\rho < 1$, the following recovery guarantee holds true: For all $x \in \mathbb{R}^N_+$ and $y \in \mathbb{R}^M$, any minimizer $x^\#$ of

$$\operatorname{argmin}_{z \ge 0} \|Az - y\|,$$

obeys the bound

$$\|x - x^\#\|_q \le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q - 1} d_1(x, \Sigma_S) + 2\left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q - 1} \max_{n \in [N]} |(A^T t)_n^{-1}|\, \|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right) \|Ax - y\|.$$

If q=1, this bound can be improved to

$$\|x - x^\#\|_1 \le \frac{2(1 + \kappa\rho)}{1 - \kappa\rho}\kappa\, d_1(x, \Sigma_S) + 2\left(\frac{1 + \kappa\rho}{1 - \kappa\rho} \max_{n \in [N]} |(A^T t)_n^{-1}|\, \|t\|_* + \frac{2}{1 - \kappa\rho}\kappa\tau\right) \|Ax - y\|.$$

Proof. The proof can be found in Subsection 6.1.

Given a matrix with the $\ell_q$-RNSP, we can add a row of ones (or a row consisting of one minus the column sums of the matrix) to fulfill the $M^+$ criterion with the optimal $\kappa = 1$. Certain random measurement matrices guarantee uniform bounds on $\kappa$ for fixed vectors $t$. In ([13], Theorem 12) it was proven that if the $A_{m,n}$ are all i.i.d. 0/1 Bernoulli random variables, $A$ obeys the $M^+$ criterion with $t = (1, \dots, 1)^T \in \mathbb{R}^M$ and $\kappa \le 3$ with high probability. This is problematic, since if $\kappa > 1$, it might happen that $\kappa\rho < 1$ is no longer fulfilled. Since the stableness constant $\rho(S)$ as a function of $S$ is monotonically increasing, the condition $\kappa\rho(S') < 1$ might only hold for some $S' < S$. If that is the case, there are vectors $x \in \Sigma_S$ that are recovered by basis pursuit denoising but not by the NNLS! This is for instance the case for the matrix $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$, which has the $\ell_1$-robust null space property of order 1 with stableness constant $\rho = 1/2$ and the $M^+$ criterion with $\kappa \ge 2$ for any possible choice of $t$. In particular, the vector $x = (0, 0, 1)^T$ is not necessarily recovered by the NNLAD or the NNLS.

Hence, it is crucial that the vector $t$ is chosen to minimize $\kappa$ and ideally achieves the optimal $\kappa = 1$. This motivates us to use random walk matrices of regular graphs, since they obey exactly this.
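To make the $M^+$ criterion and the augmentation trick above concrete, here is a minimal NumPy sketch under our reading of Definition 3.3 (not code from the article): it evaluates $\kappa$ for $t = (1, \dots, 1)^T$ and shows how appending the row $1 - (\text{column sums of } A)$ enforces $\kappa = 1$.

```python
import numpy as np

def mplus_kappa(A: np.ndarray, t: np.ndarray) -> float:
    """kappa of the M+ criterion: max_n (A^T t)_n * max_n (A^T t)_n^{-1}.
    Requires A^T t > 0 componentwise."""
    w = A.T @ t
    if np.any(w <= 0):
        raise ValueError("A^T t must be strictly positive")
    return float(w.max() / w.min())

def augment_for_mplus(A: np.ndarray) -> np.ndarray:
    """Append the extra row 1 - (column sums of A), so every column of the
    extended matrix sums to one and t = (1, ..., 1)^T gives kappa = 1.
    Assumes the column sums of A are at most one (non-negative extra row)."""
    return np.vstack([A, 1.0 - A.sum(axis=0)])

A = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0],
              [1, 1, 0, 0]]) / 3.0
print(mplus_kappa(A, np.ones(3)))                      # 2.0
print(mplus_kappa(augment_for_mplus(A), np.ones(4)))   # 1.0
```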

Definition 3.5. Let $A \in \{0, 1\}^{M \times N}$ and $D \in [M]$. For $T \subset [N]$ the set

$$\mathrm{Row}(T) := \bigcup_{n \in T} \{m \in [M] : A_{m,n} = 1\}$$

is called the set of right vertices connected to the set of left vertices T. If

$$\#(\mathrm{Row}(\{n\})) = D \quad \text{for all } n \in [N],$$

then $D^{-1} A \in \{0, D^{-1}\}^{M \times N}$ is called a random walk matrix of a $D$-left regular bipartite graph. We also say in short that $D^{-1} A$ is a $D$-LRBG. If additionally there exists a $\theta \in [0, 1)$ such that

$$\#(\mathrm{Row}(T)) \ge (1 - \theta) D \#(T) \quad \text{for all } \#(T) \le S,$$

holds true, then $D^{-1} A$ is called a random walk matrix of an $(S, D, \theta)$-lossless expander.

Note that we have made a slight abuse of notation. The term $D$-LRBG as a short form for $D$-left regular bipartite graph refers in our case to the random walk matrix and not to the graph itself. We omit this minor technical differentiation for the sake of shortening the frequently used term "random walk matrix of a $D$-left regular bipartite graph". Lossless expanders are bipartite graphs that have a low number of edges but are still highly connected, see for instance ([24], Chapter 4). As a consequence their random walk matrices have good properties for compressed sensing. It is well known that random walk matrices of $(2S, D, \theta)$-lossless expanders obey the $\ell_1$-RNSP of order $S$ with respect to $\|\cdot\|_1$, see ([3], Theorem 13.11). The dual norm of $\|\cdot\|_1$ is $\|\cdot\|_\infty$, and the $M^+$ criterion is easily fulfilled, since the columns sum up to one. From Theorem 3.4 we can thus draw the following corollary.
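Definition 3.5 translates directly into a sampler. The following NumPy sketch (our construction, not the authors' code) draws a $D$-LRBG uniformly at random by choosing $D$ distinct right vertices for each of the $N$ left vertices; by construction every column sums to one.

```python
import numpy as np

def random_d_lrbg(M: int, N: int, D: int, rng=None) -> np.ndarray:
    """Draw the random walk matrix of a D-left regular bipartite graph.

    Each column gets exactly D entries equal to 1/D, placed in D distinct
    rows chosen uniformly at random, so all column sums equal one.
    """
    rng = np.random.default_rng(rng)
    A = np.zeros((M, N))
    for n in range(N):
        rows = rng.choice(M, size=D, replace=False)
        A[rows, n] = 1.0 / D
    return A

A = random_d_lrbg(M=256, N=1024, D=10, rng=0)
print(np.allclose(A.sum(axis=0), 1.0))   # True: M+ criterion with t = 1 and kappa = 1
```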

Corollary 3.6 (Lossless Expander Recovery Guarantee). Let $S \in [N]$ and $\theta \in [0, 1/6)$. If $A \in \{0, D^{-1}\}^{M \times N}$ is the random walk matrix of a $(2S, D, \theta)$-lossless expander, then the following recovery guarantee holds true: For all $x \in \mathbb{R}^N_+$ and $y \in \mathbb{R}^M$, any minimizer $x^\#$ of

$$\operatorname{argmin}_{z \ge 0} \|Az - y\|_1$$

obeys the bound

$$\|x - x^\#\|_1 \le 2\frac{1 - 2\theta}{1 - 6\theta} d_1(x, \Sigma_S) + 2\frac{3 - 2\theta}{1 - 6\theta} \|Ax - y\|_1. \tag{1}$$

Proof. By ([3], Theorem 13.11), $A$ has the $\ell_1$-RNSP with respect to $\|\cdot\|_1$ with constants $\rho = 2\theta/(1 - 4\theta)$ and $\tau = 1/(1 - 4\theta)$. The dual norm of the norm $\|\cdot\|_1$ is $\|\cdot\|_\infty$. If we set $t := (1, \dots, 1)^T \in \mathbb{R}^M$, we get

$$(A^T t)_n = \sum_{m \in [M]} A_{m,n} = D \cdot D^{-1} = 1 \quad \text{for all } n \in [N].$$

Hence, $A$ obeys the $M^+$ criterion with vector $t$ and constant $\kappa = 1$, and the condition $\kappa\rho < 1$ is immediately fulfilled. We obtain $\|t\|_* = \|t\|_\infty = 1$ and $\max_{n \in [N]} |(A^T t)_n^{-1}| = 1$. Applying Theorem 3.4 with the improved bound for $q = 1$ and these values yields

$$\|x - x^\#\|_1 \le \frac{2(1 + \rho)}{1 - \rho} d_1(x, \Sigma_S) + 2\left(\frac{1 + \rho}{1 - \rho}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{2}{1 - \rho}\tau\right) \|Ax - y\|_1.$$

If we additionally substitute the values for ρ and τ we get

$$\|x - x^\#\|_1 \le 2\frac{1 - 2\theta}{1 - 6\theta} d_1(x, \Sigma_S) + 2\left(\frac{1 - 2\theta}{1 - 6\theta} + \frac{2}{1 - 6\theta}\right) \|Ax - y\|_1 = 2\frac{1 - 2\theta}{1 - 6\theta} d_1(x, \Sigma_S) + 2\frac{3 - 2\theta}{1 - 6\theta} \|Ax - y\|_1.$$

This finishes the proof.

Note that ([3], Theorem 13.11) is an adaptation of ([20], Lemma 11) that accounts for robustness and skips proving the $\ell_1$ restricted isometry property. If $M \ge (2/\theta)\exp(2/\theta)\, S \ln(eN/S)$ and $D = (2/\theta)\ln(eN/S)$, a uniformly at random drawn $D$-LRBG is the random walk matrix of a $(2S, D, \theta)$-lossless expander with high probability ([3], Theorem 13.7). Thus, recovery with the NNLAD is possible in the optimal regime $M \in O(S \log(N/S))$.
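The expansion property cannot be certified in polynomial time, but it can be probed empirically. The following sketch (our heuristic, not a certificate) samples random supports of size $2S$ and checks the inequality from Definition 3.5; the helper `random_d_lrbg` from the previous sketch can be used to generate a test matrix.

```python
import numpy as np

def expansion_check(A, S, theta, trials=2000, rng=None):
    """Empirically probe the (2S, D, theta)-lossless expansion of a D-LRBG:
    for random supports T with #(T) = 2S, check #(Row(T)) >= (1-theta)*D*#(T).
    Sampling random supports is only a heuristic, since the property must
    hold for *all* supports of that size."""
    rng = np.random.default_rng(rng)
    M, N = A.shape
    D = int((A[:, 0] > 0).sum())          # left degree read off from column 0
    ok = 0
    for _ in range(trials):
        T = rng.choice(N, size=2 * S, replace=False)
        row_T = np.flatnonzero(A[:, T].sum(axis=1) > 0)
        ok += row_T.size >= (1 - theta) * D * 2 * S
    return ok / trials                    # fraction of sampled supports that expand

# e.g. expansion_check(random_d_lrbg(256, 1024, 10, rng=0), S=16, theta=1/6)
```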

3.2 On the Robustness Bound for Lossless Expanders

If $A$ is the random walk matrix of a $(2S, D, \theta)$-lossless expander with $\theta \in [0, 1/6)$, then we can also draw a recovery guarantee for the NNLS. By ([3], Theorem 13.11), $A$ has the $\ell_1$-RNSP with respect to $\|\cdot\|_1$ with constants $\rho = 2\theta/(1 - 4\theta)$ and $\tau = 1/(1 - 4\theta)$, and hence also the $\ell_1$-RNSP with respect to $\|\cdot\|_2$ with constants $\rho' = \rho$ and $\tau' = \tau M^{1/2}$. Similar to the proof of Corollary 3.6, we can use Theorem 3.4 to deduce that any minimizer $x^\#$ of

$$\operatorname{argmin}_{z \ge 0} \|Az - y\|_2,$$

obeys the bound

$$\|x - x^\#\|_1 \le 2\frac{1 - 2\theta}{1 - 6\theta} d_1(x, \Sigma_S) + 2\frac{3 - 2\theta}{1 - 6\theta} M^{1/2} \|Ax - y\|_2. \tag{2}$$

If the measurement error $e = y - Ax$ is a constant vector, i.e., $e = \alpha \mathbf{1}$, then $\|e\|_1 = M^{1/2}\|e\|_2$. In this case the error bound of the NNLS is just as good as the error bound of the NNLAD. However, if $e$ is a standard unit vector, then $\|e\|_1 = \|e\|_2$. In this case the error bound of the NNLS is significantly worse than the error bound of the NNLAD. Thus, the NNLAD performs better under peaky noise, while the NNLS and NNLAD are tied under noise with evenly distributed mass. We will verify this numerically in Subsection 5.1. One can draw a complementary result for matrices with biased sub-Gaussian entries, which obey the $\ell_2$-RNSP with respect to $\|\cdot\|_2$ and the $M^+$ criterion in the optimal regime [13]. Table 1 states which method has an advantage over the other in each scenario.

TABLE 1. Table of advantages of NNLAD and NNLS over each other.

4 Non-Negative Least Absolute Deviation Using a Proximal Point Method

In this section we assume that $\|\cdot\| = \|\cdot\|_p$ for some $p \in [1, \infty]$. If $p \in \{1, \infty\}$, the NNLR can be recast as a linear program by introducing some slack variables. For an arbitrary $p$ the NNLR is a convex optimization problem and the objective function has a simple and globally bounded subdifferential. Thus, the NNLR can be solved directly with a projected subgradient method using a problem-independent step size. Such subgradient methods achieve only a convergence rate of $O(\log(k) k^{-1/2})$ toward the optimal objective value ([25], Section 3.2.3), where $k$ is the number of iterations performed. In the case that the norm is the $\ell_2$-norm, we can transfer the problem into a differentiable version, i.e., the NNLS

$$\operatorname{argmin}_{z \ge 0} \tfrac{1}{2}\|Az - y\|_2^2.$$

Since the gradient of such an objective is Lipschitz, this problem can be solved by a projected gradient method with constant step size, which achieves a convergence rate of $O(k^{-2})$ toward the optimal objective value [26, 27]. However, this does not generalize to the $\ell_1$-norm. The proximal point method proposed in [15] can solve the case of the $\ell_1$-norm with a convergence rate of $O(k^{-1})$ toward the optimal objective value. Please refer to Algorithm 1.

ALGORITHM 1. NNLAD as First Order Method.

Algorithm 1 is a primal-dual algorithm. Within the loop, lines 7, 8, 1 and 2 calculate the proximal point operator of the Fenchel conjugate of $\|A\,\cdot - y\|_1$ to update the dual variable, lines 3 and 5 update the primal variable, and lines 4 and 6 perform a momentum step to accelerate convergence. Further, line 8 sets $\tilde{x}$ to $Ax$ and avoids a third matrix vector multiplication. Note that $\sigma_1$ and $\sigma_2$ can be replaced by any values that satisfy $\sigma_1\sigma_2 < \|A\|_{2 \to 2}^{-2}$. The calculation of $\sigma_1$ and $\sigma_2$ might be a bottleneck for the computational complexity of the algorithm. If one wants to solve multiple problems with the same matrix, $\sigma_1$ and $\sigma_2$ should only be calculated once and not in each run of the algorithm. For any $\sigma_1\sigma_2 < \|A\|_{2 \to 2}^{-2}$, the following convergence guarantee can be deduced from ([15], Theorem 1). Let $x_k$ and $w_k$ be the values of $x$ and $w$ at the end of the $k$th iteration of the while loop of Algorithm 1. Then, the following statements hold true:

(1) The iterates converge: The sequence $(x_k)_{k \in \mathbb{N}}$ converges to a minimizer of $\operatorname{argmin}_{z \ge 0}\|Az - y\|_1$.

(2) The iterates are feasible: We have $x_k \ge 0$ and $\|w_k\|_\infty \le 1$ for all $k \ge 1$.

(3) There is a stopping criterion for the iterates: $\lim_{k \to \infty} \|Ax_k - y\|_1 + \langle y, w_k \rangle = 0$ and $\lim_{k \to \infty} A^T w_k \ge 0$. In particular, if $\|Ax_k - y\|_1 + \langle y, w_k \rangle \le 0$ and $A^T w_k \ge 0$, then $x_k$ is a minimizer of $\operatorname{argmin}_{z \ge 0}\|Az - y\|_1$.

(5) The averages obey the convergence rate toward the optimal objective value: $\left\|A\left(\frac{1}{k}\sum_{k'=1}^{k} x_{k'}\right) - y\right\|_1 - \|Ax^\# - y\|_1 \le \frac{1}{k}\left(\frac{1}{2\sigma_2}\|x^\# - x_0\|_2^2 + \frac{1}{2\sigma_1}\left(\|w_0\|_2^2 + 2\|w_0\|_1 + M\right)\right)$, where $x^\#$ is a minimizer of $\operatorname{argmin}_{z \ge 0}\|Az - y\|_1$.

The formal version and proof can be found in [28]. Note that this yields a convergence guarantee for both the iterates and averages, but the convergence rate is only guaranteed for the averages. Algorithm 1 is optimized in the sense that it uses the least possible number of matrix vector multiplications per iteration, since these govern the computational complexity.
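Since Algorithm 1 is only available here as an image, the following NumPy sketch reconstructs the primal-dual iteration described above (a Chambolle-Pock type scheme [15]). The variable names, the update order and the fixed iteration count are our simplifications and not a verbatim transcription of Algorithm 1; the step size choice matches the $\sigma_1 = \sigma_2 = 0.99\|A\|_{2 \to 2}^{-1}$ used later in Section 5.

```python
import numpy as np

def nnlad_primal_dual(A, y, iters=20000):
    """Primal-dual sketch for min_{x >= 0} ||Ax - y||_1.

    Dual update:   w <- clip(w + sigma1 * (A x_bar - y), -1, 1),
                   i.e. the prox of the Fenchel conjugate of ||. - y||_1.
    Primal update: x <- max(x - sigma2 * A^T w, 0), followed by the
                   extrapolation (momentum) step x_bar <- 2 x_new - x_old.
    Step sizes satisfy sigma1 * sigma2 < ||A||_{2->2}^{-2}.
    """
    op_norm = np.linalg.norm(A, 2)
    sigma1 = sigma2 = 0.99 / op_norm
    x = np.zeros(A.shape[1])
    w = np.zeros(A.shape[0])
    x_bar = x.copy()
    for _ in range(iters):
        w = np.clip(w + sigma1 * (A @ x_bar - y), -1.0, 1.0)   # dual prox
        x_new = np.maximum(x - sigma2 * (A.T @ w), 0.0)        # primal projection
        x_bar = 2.0 * x_new - x                                # momentum step
        x = x_new
    return x
```

As in statements (1) and (2) above, the dual iterate stays in the unit $\ell_\infty$-ball and the primal iterate stays non-negative, and only two matrix vector multiplications per iteration are needed.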

Remark 4.1. Let $A$ be a $D$-LRBG. Each iteration of Algorithm 1 requires at most $4DN + 8N + 16M$ floating point operations and $5N + 4M$ assignments.

4.1 Iterates or Averages

The question arises whether it is better to estimate with the averages or with the iterates. Numerical tests suggest that the iterates reach tolerance thresholds significantly faster than the averages. We can only give a heuristic explanation for this phenomenon. The stopping criterion of the iterates yields $\lim_{k \to \infty} A^T w_k \ge 0$. In practice we observe that $A^T w_k \ge 0$ for all sufficiently large $k$. However, $A^T w_{k+1} \ge 0$ yields $x_{k+1} \le x_k$. This monotonicity promotes the convergence of the iterates and gives a clue why the iterates seem to converge better in practice. See Figures 5, 6.

4.2 On the Convergence Rate

As stated, the NNLS achieves the convergence rate $O(k^{-2})$ [27], while the NNLAD only achieves the convergence rate $O(k^{-1})$ toward the optimal objective value. However, this should not be considered weaker, since the objective function of the NNLS is the square of a norm. If $x_k$ are the iterates of the NNLS implementation of [27], an algebraic manipulation yields

$$\|Ax_k - y\|_2 - \|Ax^\# - y\|_2 \le 2^{1/2}\left(\tfrac{1}{2}\|Ax_k - y\|_2^2 - \tfrac{1}{2}\|Ax^\# - y\|_2^2\right)^{1/2} \le 2^{1/2}\left(C k^{-2}\right)^{1/2} = (2C)^{1/2} k^{-1}.$$

Thus, the $\ell_2$-norm of the residual of the NNLS iterates only decays in the same order as the $\ell_1$-norm of the residual of the NNLAD averages.
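To make the baseline in this comparison concrete, here is a plain projected gradient sketch for the NNLS objective; this is the non-accelerated variant, whereas the method of [26, 27] used in the experiments adds a momentum step on top of the same gradient-plus-projection update.

```python
import numpy as np

def nnls_projected_gradient(A, y, iters=2000):
    """Plain projected gradient for min_{x >= 0} 0.5 * ||Ax - y||_2^2.

    The step size 1 / ||A||_{2->2}^2 guarantees descent; accelerated
    variants add momentum to reach the O(k^-2) rate cited in the text.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = np.maximum(x - step * (A.T @ (A @ x - y)), 0.0)  # gradient step + projection
    return x
```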

5 Numerical Experiments and Applications

In the first part of this section we compare the NNLAD with several state-of-the-art recovery methods in terms of achieved sparsity levels and decoding time. For $p \in [1, \infty]$, we denote $S_p^{N-1} := \{z \in \mathbb{R}^N : \|z\|_p = 1\}$, and $S_0^{N-1} := \{z \in \mathbb{R}^N : \|z\|_0 = 1 = \|z\|_2\} = \Sigma_1 \cap S_2^{N-1}$.

5.1 Properties of the Non-Negative Least Absolute Deviation Optimizer

We recall that the goal is to recover $x$ from the noisy linear measurements $y = Ax + e$. To investigate properties of the minimizers of the NNLAD we compare it to the minimizers of the well-studied problems basis pursuit (BP), optimally tuned basis pursuit denoising (BPDN), optimally tuned $\ell_1$-constrained least residual (CLR) and the NNLS, which are given by

$$\operatorname{argmin}_{z :\, \|Az - y\|_1 \le \epsilon} \|z\|_1 \quad \text{with } \epsilon = \|e\|_1 \quad \text{(BPDN)},$$
$$\operatorname{argmin}_{z :\, \|z\|_1 \le \tau} \|Az - y\|_1 \quad \text{with } \tau = \|x\|_1 \quad \text{(CLR)},$$
$$\operatorname{argmin}_{z :\, Az = y} \|z\|_1 \quad \text{(BP)},$$
$$\operatorname{argmin}_{z \ge 0} \|Az - y\|_2 \quad \text{(NNLS)}.$$

The NNLAD is designed to recover non-negative signals in general and, as we will see, it recovers sparse non-negative signals comparably well to the optimally tuned CLR and BPDN, which are particularly designed for this task. Further, we compare the NNLAD to the expander iterative hard thresholding (EIHT). The EIHT is calculated by stopping the following sequence after a suitable stopping criterion is met:

$$x_0 := 0 \quad \text{and} \quad x_{k+1} := P_{\Sigma_S}\left(x_k + \mathrm{median}(y - Ax_k)\right) \quad \text{for all } k \ge 0 \text{ and with } S = \|x\|_0 \quad \text{(EIHT)},$$

where $\mathrm{median}(z)_n$ is the median of $(z_m)_{m \in \mathrm{Row}(\{n\})}$ and $P_{\Sigma_S}(v)$ is a hard thresholding operator, i.e., some minimizer of $\operatorname{argmin}_{z \in \Sigma_S} \tfrac{1}{2}\|z - v\|_2^2$. There is a whole class of thresholding-based decoders for lossless expanders, which all need either the sparsity of the signal or the order of the expansion property as a tuning parameter. We choose the EIHT as a representative of this class, since the cluster points of its sequence have robust recovery guarantees ([3], Theorem 13.5). By convex decoders we refer to BPDN, BP, CLR, NNLAD, and NNLS. We choose the optimal tuning $\epsilon = \|e\|_1$ for the BPDN and $\tau = \|x\|_1$ for the CLR. The optimally tuned BPDN and CLR represent a best case benchmark. In ([29], Figure 1.1) it was noticed that tuning the BPDN with $\epsilon > \|e\|_p$ often leads to worse estimation errors than tuning with $\epsilon < \|e\|_p$ for $p = 2$. Thus, BP is a version of BPDN with no prior knowledge about the noise and represents a worst case benchmark. At first we investigate the properties of the estimators. In order to mitigate effects from different implementations we solve all optimization problems with the CVX package of Matlab [30, 31]. For given $\mathrm{SNR}_1, r, N, M, D, S$ we will do the following experiment multiple times:

Experiment 1

1. Generate a measurement matrix $A \in \{0, D^{-1}\}^{M \times N}$ uniformly at random among all $D$-LRBG.

2. Generate a signal $x$ uniformly at random from $\Sigma_S \cap \mathbb{R}^N_+ \cap S_1^{N-1}$.

3. Generate a noise $e$ uniformly at random from $\frac{\|Ax\|_1}{\mathrm{SNR}_1} S_r^{M-1}$.

4. Define the observation y:=Ax+e.

5. For each decoder $Q_A$ calculate an estimator $x^\# := Q_A(y)$ and collect the relative estimation error $\|x - x^\#\|_1 / \|x\|_1$.

In this experiment we have $\mathrm{SNR}_1 = \|Ax\|_1 / \|e\|_1$ (the $\ell_1$ signal to noise ratio), and since $A$ is a $D$-LRBG and $x \ge 0$, we further have $\|Ax\|_1 = \|x\|_1 = 1$. Note that for $r = 0$ and $r = 1$ we obtain two different noise distributions. If $e$ is uniformly distributed on $S_1^{M-1}$, then the absolute value of each component $|e_m|$ is a random variable with density $h \mapsto (M-1)(1-h)^{M-2}$ for $h \in [0, 1]$. Thus, $\mathbb{E}[\|e\|_2^2] = M \cdot 2/(M(M+1)) = 2/(M+1)$. By testing one can observe a concentration around this expected value, in particular that $M^{1/2}\|e\|_2 \approx 2^{1/2}\|e\|_1$ with high probability. If $e$ is uniformly distributed on $S_0^{M-1}$, then $\|e\|_2 = \|e\|_1$. Thus, these two noise distributions each represent randomly drawn noise vectors obeying one norm equivalence asymptotically tightly up to a constant. From Eqs. 1, 2 we expect that the NNLS has roughly the same estimation errors as the NNLAD for $r = 1$, i.e., the evenly distributed noise, and significantly worse estimation errors for $r = 0$, i.e., the peaky noise.
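The signal and noise draws of Experiment 1 can be sketched as follows; the Dirichlet construction for the uniform distribution on the simplex and the sign pattern for $S_r^{M-1}$ are our reading of the sets involved, not the authors' code.

```python
import numpy as np

def experiment1_signal_noise(N, M, S, snr1, r, rng=None):
    """Draw the signal and noise of Experiment 1 (our reading of the sets).

    x: uniform on Sigma_S ∩ R^N_+ ∩ S_1^{N-1}, i.e. S-sparse with a
       Dirichlet(1,...,1) pattern on a random support, so ||x||_1 = 1.
    e: uniform on S_r^{M-1} scaled by ||Ax||_1 / SNR_1; for a D-LRBG
       ||Ax||_1 = 1, so the scale is simply 1 / snr1.
    r = 1 gives evenly spread (l1-sphere) noise, r = 0 one peaky entry.
    """
    rng = np.random.default_rng(rng)
    x = np.zeros(N)
    support = rng.choice(N, size=S, replace=False)
    x[support] = rng.dirichlet(np.ones(S))
    if r == 1:
        e = rng.dirichlet(np.ones(M)) * rng.choice([-1.0, 1.0], size=M)
    else:
        e = np.zeros(M)
        e[rng.integers(M)] = rng.choice([-1.0, 1.0])
    return x, e / snr1

x, e = experiment1_signal_noise(N=1024, M=256, S=32, snr1=1000, r=1, rng=0)
print(x.sum(), np.abs(e).sum())    # 1.0 and 0.001
```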

5.1.1 Quality of the Estimation Error for Varying Sparsity

We fix the constants $r = 1$, $N = 1024$, $M = 256$, $D = 10$, $\mathrm{SNR}_1 = 1000$ and vary the sparsity level $S \in [128]$. For each $S$ we repeat Experiment 1 100 times. We plot the mean of the relative $\ell_1$-estimation error and the mean of the logarithmic relative $\ell_1$-estimation error, i.e.,

$$\mathrm{Mean(N1E)} = \mathrm{Mean}\left(\frac{\|x - x^\#\|_1}{\|x\|_1}\right), \qquad \mathrm{Mean(LN1E)} = \mathrm{Mean}\left(10 \log_{10}\left(\frac{\|x - x^\#\|_1}{\|x\|_1}\right)\right),$$

over the sparsity. The result can be found in Figures 1A,B.

FIGURE 1. Performance of NNLAD for noise with even mass and varying sparsity of the signal. (A) NNLAD has almost the same performance as CLR/BPDN. EIHT fails for moderate S. (B) NNLAD and NNLS perform roughly the same.

For $S \gtrsim 30$ the estimation error of the EIHT randomly peaks high. We deduce that the EIHT fails to recover the signal reliably for $S \gtrsim 30$, while the NNLAD and the other convex decoders succeed. This is not surprising, since by ([3], Theorem 13.15) the EIHT obeys a robust recovery guarantee for $S$-sparse signals whenever $A$ is the random walk matrix of a $(3S, D, \theta)$-lossless expander with $\theta < 1/12$. This is significantly stronger than the $(2S, D, \theta)$-lossless expander property with $\theta < 1/6$ required for a null space property. It might also be that the null space property is more likely than the lossless expansion property, similar to the gap between the $\ell_2$-restricted isometry property and the null space property [32]. However, if the EIHT recovers a signal, it recovers it significantly better than any convex method. This might be the case since the originally generated signal is indeed from $\Sigma_S$, which is enforced by the hard thresholding of the EIHT, but not by the convex decoders. This suggests that it might be useful to apply thresholding to the output of any convex decoder to increase the accuracy if the original signal is indeed sparse and not only compressible. For the remainder of this subsection we focus on convex decoders.
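The post-thresholding idea mentioned above can be made concrete with a small helper (a hypothetical utility, not part of the article's pipeline): project the output of a convex decoder onto $\Sigma_S$ by keeping its $S$ largest-magnitude entries.

```python
import numpy as np

def project_sigma_s(v: np.ndarray, S: int) -> np.ndarray:
    """Hard thresholding operator P_{Sigma_S}: keep the S largest-magnitude
    entries of v and set the remaining ones to zero."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-S:]
    out[keep] = v[keep]
    return out

# e.g. x_sparse = project_sigma_s(x_nnlad, S) if the signal is known to be S-sparse
```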

Contrary to our expectation, the BPDN achieves worse estimation errors than all other convex decoders for $S \gtrsim 60$, even worse than the BP. The authors have no explanation for this phenomenon. Apart from that, we observe that the CLR and BP indeed perform as best and worst case benchmarks, respectively. However, the difference between BP and CLR becomes rather small for large $S$. We deduce that tuning becomes less important near the optimal sampling rate.

The NNLAD, NNLS and CLR achieve roughly the same estimation errors. However, note that the BPDN and CLR are optimally tuned using unknown prior information, unlike the NNLAD and NNLS. As expected, the NNLS performs roughly the same as the NNLAD, see Table 1. However, this is the result of the noise distribution for $r = 1$. We repeat Experiment 1 with the same constants, but $r = 0$, i.e., $e$ is a unit vector scaled by $\pm\|Ax\|_1/\mathrm{SNR}_1$. We plot the mean of the relative $\ell_1$-estimation error and the mean of the logarithmic relative $\ell_1$-estimation error over the sparsity. The result can be found in Figures 2A,B.

FIGURE 2. Performance of NNLAD for noise with peaky mass and varying sparsity of the signal. (A) The NNLS does not fail, but performs badly. (B) The NNLS and NNLAD differ strongly.

We want to note that, similarly to Figure 1A, the EIHT works only unreliably for $S \gtrsim 30$. Even though the mean of the logarithmic relative $\ell_1$-estimation error of the NNLS is worse than the one of the EIHT for $30 \lesssim S \lesssim 60$, the NNLS does not fail but only approximates with a weak error bound. As the theory suggests, the NNLS performs significantly worse than the NNLAD, see Table 1. It is worth mentioning that the estimation errors of the NNLS seem to be bounded by the estimation errors of BP. This suggests that $A$ obeys an $\ell_1$ quotient property, which bounds the estimation error of any instance optimal decoder, see ([3], Lemma 11.15).

5.1.2 Noise-Blindness

Theorem 3.4 states that the NNLAD has an error bound similar to the optimally tuned CLR and BPDN. Further, by Eq. 1 the ratio

$$\frac{\|x - x^\#\|_1}{\|e\|_1 \|x\|_1} = \frac{\|x - x^\#\|_1}{\|e\|_1}$$

should be bounded by some constant. To verify this, we fix the constants $r = 1$, $N = 1024$, $M = 256$, $D = 10$, $S = 32$ and vary the signal to noise ratio $\mathrm{SNR}_1 \in 10 \cdot [100]$. For each $\mathrm{SNR}_1$ we repeat Experiment 1 100 times. We plot the mean of the logarithmic relative $\ell_1$-estimation error and the mean of the ratio of relative $\ell_1$-estimation error and $\ell_1$-noise power, i.e.,

$$\mathrm{Mean(LN1E)} = \mathrm{Mean}\left(10 \log_{10}\left(\frac{\|x - x^\#\|_1}{\|x\|_1}\right)\right), \qquad \mathrm{Mean(N1E1NP)} = \mathrm{Mean}\left(\frac{\|x - x^\#\|_1}{\|e\|_1 \|x\|_1}\right),$$

over the signal to noise ratio. The result can be found in Figures 3A,B.

FIGURE 3. Performance of NNLAD for noise with even mass and varying noise power. (A) The NNLAD and NNLS recover reliably for all signal to noise ratios. (B) The estimation error scales linearly with the noise power.

The logarithmic relative $\ell_1$-estimation errors of the different decoders stay in a constant relation to each other over the whole range of $\mathrm{SNR}_1$. This relation is roughly the relation we can find in Figure 1B for $S = 32$. As expected, the ratio of relative $\ell_1$-estimation error and $\ell_1$-noise power stays constant independently of the $\mathrm{SNR}_1$ for all decoders. We deduce that the NNLAD is noise-blind. We repeat the experiment with $r = 0$ and obtain Figures 4A,B.

FIGURE 4. Performance of NNLAD for noise with peaky mass and varying noise power. (A) The NNLAD outperforms the NNLS. (B) The estimation error does not scale linearly with the noise power.

Note that $\|x - x^\#\|_1 / \|x\|_1$, and not $\|x - x^\#\|_1 / (\|x\|_1 \|e\|_1)$, seems to be constant. Since $\|x - x^\#\|_1 / \|x\|_1 \approx 1.0 \cdot 10^{-7}$ is fairly small, we suspect that this is the result of CVX reaching a tolerance parameter (see Footnote 2) of roughly $1.5 \cdot 10^{-8}$ and terminating, while the actual optimizer might in fact be the original signal. It is remarkable that even with the incredibly small signal to noise ratio of 10 the signal can be recovered by the NNLAD with an estimation error of $1.0 \cdot 10^{-7}$ for this noise distribution.

5.2 Decoding Complexity

5.2.1 Non-Negative Least Absolute Deviation Vs Iterative Methods

To investigate the convergence rates of the NNLAD implementation proposed in Section 4, we compare it to different types of decoders when $e = 0$. There are some sublinear time recovery methods for lossless expander matrices, including ([3], Section 13.4) and [5]. These are, as the name suggests, significantly faster than the NNLAD. These, as several other greedy methods (([3], Section 13.3), [5, 18, 19, 21]), rely on a strong lossless expansion property. As a representative of all greedy and sublinear time methods we will consider the EIHT, which has a linear convergence rate $O(c^k)$ toward the signal and robust recovery guarantees ([3], Theorem 13.15). The EIHT also represents a best case benchmark. As a direct competitor we consider the NNLS implemented by the methods of [27] (see Footnote 3), which has a convergence rate of $O(k^{-2})$ toward the optimal objective value. [27] can also be used to calculate the CLR if the residual norm is the $\ell_2$-norm. However, calculating the projection onto the $\ell_1$-ball in $\mathbb{R}^N$ is computationally slightly more complex than the projection onto $\mathbb{R}^N_+$. Thus, the CLR will be solved slightly more slowly than the NNLS with [27]. Note that cross-validation techniques would need to solve multiple optimization problems of a similar complexity as the NNLS to estimate a signal. As a consequence, such methods have a multiple times higher complexity than the NNLS and are not considered here. As a worst case benchmark we consider a simple projected subgradient implementation of the NNLAD using the Polyak step size, i.e.

$$x_{k+1} := P_{\mathbb{R}^N_+}\left(x_k - \frac{\|Ax_k - y\|_1}{\|A^T \mathrm{sgn}(Ax_k - y)\|_2^2} A^T \mathrm{sgn}(Ax_k - y)\right), \quad \text{(NNLAD-Subgrad)}$$

which has a convergence rate of $O(k^{-1/2})$ toward the optimal objective value ([33], Section 7.2.2 and Section 5.3.2). We initialize all iterative methods with 0. The EIHT will always use the parameter $S = \|x\|_0$, the NNLAD $\sigma_1 = \sigma_2 = 0.99\|A\|_{2 \to 2}^{-1}$ and the NNLS the parameters $s = 0.99\|A\|_{2 \to 2}^{-2}$ and $\alpha = 3.01$, see [27]. Just like the BPDN and CLR, the EIHT needs an oracle to get some unknown prior information, in this case $\|x\|_0$. Parameters that can be computed from $A$ will be calculated before the timers start. This includes the adjacency structure of $A$ for the EIHT, $\sigma_1, \sigma_2$ for the NNLAD and $s, \alpha$ for the NNLS, since these are considered to be a part of the decoder. We will do the following experiment multiple times:

Experiment 2

1. If $r = 1$, generate a measurement matrix $A \in \{0, D^{-1}\}^{M \times N}$ uniformly at random among all $D$-LRBG. If $r = 2$, draw each component $A_{m,n}$ of the measurement matrix independently and uniformly at random from $\{0, 1\}$, i.e., as 0/1 Bernoulli random variables.

2. Generate a signal $x$ uniformly at random from $\Sigma_S \cap \mathbb{R}^N_+ \cap S_r^{N-1}$.

3. Define the observation y:=Ax.

4. For each iterative method calculate the sequence of estimators $x_k$ for all $k \le 20000$ and collect the relative estimation errors $\|x_k - x\|_1 / \|x\|_1$, the relative norms of the residuals $\|Ax_k - y\|_1 / \|y\|_1$ and the time to calculate the first $k$ iterations.

For $r = 2$ this represents a biased sub-Gaussian random ensemble [13] with optimal recovery guarantees for the NNLS. For $r = 1$ this represents a $D$-LRBG random ensemble with optimal recovery guarantees for the NNLAD. We fix the constants $r = 1$, $N = 1024$, $M = 256$, $S = 16$, $D = 10$ and repeat Experiment 2 100 times. We plot the mean of the logarithmic relative $\ell_1$-estimation error and the mean of the logarithmic relative $\ell_1$-norm of the residual, i.e.,

$$\mathrm{Mean(LN1E)} = \mathrm{Mean}\left(10 \log_{10}\left(\frac{\|x_k - x\|_1}{\|x\|_1}\right)\right), \qquad \mathrm{Mean(LN1R)} = \mathrm{Mean}\left(10 \log_{10}\left(\frac{\|Ax_k - y\|_1}{\|y\|_1}\right)\right), \tag{7}$$

over the number of iterations and the time. The result can be found in Figures 5, 6.

FIGURE 5. Convergence rates of certain iterated methods with respect to the number of iterations.

FIGURE 6. Convergence rates of certain iterated methods with respect to the time.

The averages of the NNLAD converge significantly more slowly than the iterates, even though we lack a convergence rate for the iterates. We deduce that one should always use the iterates of the NNLAD to recover a signal. Surprisingly, the averages converge even more slowly than the subgradient method. However, this is not because the averages converge slowly, but rather because the subgradient method and all other methods converge faster than expected. In particular, the NNLAD iterates, the EIHT and the NNLS all converge linearly toward the signal. Further, their corresponding objective values also converge linearly toward the optimal objective value. Even the subgradient method converges almost linearly. We deduce that the NNLS is the fastest of these methods if $A$ is a $D$-LRBG.

Apart from a constant, the NNLAD iterates, the EIHT and the NNLS converge in the same order. However, this behavior does not hold if we consider a different distribution for $A$, as one can verify by setting each component $A_{m,n}$ as independent 0/1 Bernoulli random variables. While the EIHT needs fewer iterations than the NNLS, it still takes more time to achieve the same estimation errors and residuals. We plot the mean of the time required to calculate the first $k$ iterations in Figure 7.

FIGURE 7. Time required to perform iterations of certain iterated methods.

The EIHT requires roughly 6 times as long as any other method to calculate each iteration. All methods but the EIHT can be implemented with only two matrix vector multiplications, namely one by $A$ and one by $A^T$. Both of these require roughly $2DN$ floating point operations. Hence, each iteration requires $O(4DN)$ floating point operations. The EIHT only calculates one matrix vector multiplication, but also the median. This calculation is significantly slower than a matrix vector multiplication. For every $n \in [N]$ we need to order a vector with $D$ elements, which can be performed in $O(D \log D)$. Hence, each iteration of the EIHT requires $O(DN \log D)$ floating point operations, which explains why the EIHT requires significantly more time for each iteration.

As we have seen, the NNLS is able to recover signals faster than any other method; however, it also only obeys sub-optimal robustness guarantees for uniformly at random chosen $D$-LRBG, as we have seen in Figure 4A. We ask ourselves whether or not the NNLS is also faster with a more natural measurement scheme, i.e., if the $A_{m,n}$ are independent 0/1 Bernoulli random variables. We repeat Experiment 2 100 times with $r = 2$ for the NNLS and $r = 1$ for the other methods. We again plot the mean of the logarithmic relative $\ell_1$-estimation error and the mean of the logarithmic relative $\ell_1$-norm of the residual in Figures 8, 9.

FIGURE 8. Convergence rates of certain iterated methods with respect to the number of iterations. A is Bernoulli for NNLS and D-LRBG for the others.

FIGURE 9. Convergence rates of certain iterated methods with respect to the time. A is Bernoulli for NNLS and D-LRBG for the others.

The NNLAD and the EIHT converge to the solution in roughly the same time. Even the subgradient implementation of the NNLAD recovers a signal in less time than the NNLS. Further, the convergence of the NNLS does not seem to be linear anymore. We deduce that the sparse structure of $A$ has a more significant influence on the decoding time than the smoothness of the data fidelity term. We also deduce that even the subgradient method is a viable choice to recover a signal.
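Since even the subgradient method is viable, here is a compact sketch of the (NNLAD-Subgrad) iteration with the Polyak step size; it assumes noiseless observations (optimal objective value zero), which is the setting of this comparison.

```python
import numpy as np

def nnlad_subgrad(A, y, iters=20000):
    """Projected subgradient method for min_{x >= 0} ||Ax - y||_1 with the
    Polyak-type step size of (NNLAD-Subgrad); assumes the optimal objective
    value is zero (noiseless observations)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        r = A @ x - y
        g = A.T @ np.sign(r)                 # subgradient of z -> ||Az - y||_1
        denom = np.dot(g, g)
        if denom == 0.0:                     # residual already optimal
            break
        x = np.maximum(x - (np.abs(r).sum() / denom) * g, 0.0)
    return x
```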

5.2.2 Non-Negative Least Absolute Deviation Vs SPGL1

As a last test we compare the NNLAD to the SPGL1 toolbox for Matlab [16, 34].

Experiment 3

1. Draw the measurement matrix $A \in \{0, D^{-1}\}^{M \times N}$ uniformly at random among all $D$-LRBG.

2. Generate the signal $x$ uniformly at random from $\Sigma_S \cap \mathbb{R}^N_+ \cap S_r^{N-1}$.

3. Define the observation y:=Ax.

4. Use a benchmark decoder to calculate an estimator $x^\#$ and collect the relative estimation errors $\|x^\# - x\|_1 / \|x\|_1$, $\|x^\# - x\|_2 / \|x\|_2$ and the time to calculate $x^\#$.

5. For each iterative method calculate iterations until $\|x_k - x\|_1 / \|x\|_1 \le \|x^\# - x\|_1 / \|x\|_1$ and $\|x_k - x\|_2 / \|x\|_2 \le \|x^\# - x\|_2 / \|x\|_2$. Collect the time to perform these iterations. If these thresholds cannot be reached after $10^5$ iterations, the recovery has failed and the time is set to $\infty$.

We again fix the dimensions $N = 1024$, $M = 256$, $D = 10$ and vary $S \in [128]$. For both the BP implementation of SPGL1 and the CLR implementation of SPGL1 we repeat Experiment 3 100 times for each $S$. We plot the mean of the time to calculate the estimators over the sparsity in Figures 10A,B.

FIGURE 10. Time of the NNLAD and NNLS to approximate better than the SPGL1 methods. (A) The NNLAD is faster than the BP of SPGL1 for high S. (B) The NNLAD is faster than the CLR of SPGL1 for moderate S.

The NNLAD implementation is slower than both SPGL1 methods for small $S$. However, if we have the optimal number of measurements $M \in O(S \log(N/S))$, the NNLAD is faster than both SPGL1 methods.

5.2.3 Summary

The implementation of the NNLAD as presented in Algorithm 1 is a reliable recovery method for sparse non-negative signals. There are methods that might be faster, but these either recover a smaller number of coefficients (EIHT, greedy methods) or they obey sub-optimal recovery guarantees (NNLS). The implementation is as fast as the commonly used SPGL1 toolbox, but has the advantage that it requires no tuning depending on the unknown $x$ or $e$. Lastly, the NNLAD handles peaky noise overwhelmingly well.

5.3 Application for Viral Detection

With the outbreak and rapid spread of the COVID-19 virus we need to test a large number of people for an infection. Since we can only test a fixed number of persons in a given time, the number of persons tested for the virus grows at most linearly. On the other hand, models suggest that the number of possibly infected persons grows exponentially. At some point, if that is not already the case, we will have a shortage of test kits and we will not be able to test every person. It is thus desirable to test as many persons as possible with as few test kits as possible.

The field of group testing develops strategies to test groups of individuals instead of single individuals in order to reduce the number of tests required to identify infected individuals. The first advances in group testing were made in [35]. For a general overview of group testing we refer to [36].

The problem of testing a large group for a virus can be modeled as a compressed sensing problem in the following way: Suppose we want to test $N$ persons, labeled by $[N] = \{1, \dots, N\}$, to check whether or not they are affected by a virus. We denote by $x_n$ the quantity of viruses in the specimen of the $n$th person. Suppose we have $M$ test kits, labeled by $[M] = \{1, \dots, M\}$. By $y_m$ we denote the amount of viruses in the sample of the $m$th test kit. Let $A \in [0, 1]^{M \times N}$. For every $n$ we put a fraction of size $A_{m,n}$ of the specimen of the $n$th person into the sample for the $m$th test kit. The sample of the $m$th test kit will then have the quantity of viruses

$$\sum_{n \in [N]} A_{m,n} x_n + e_m^{\mathrm{con}},$$

where $e_m^{\mathrm{con}}$ is the amount of viruses in the sample originating from a possible contamination of the sample. A quantitative reverse transcription polymerase chain reaction estimates the quantity of viruses by $y_m$ with a small error $e_m^{\mathrm{pcr}} = y_m - \sum_{n \in [N]} A_{m,n} x_n - e_m^{\mathrm{con}}$. After all $M$ tests we detect the quantity

$$y = Ax + e, \tag{8}$$

where $e = e^{\mathrm{con}} + e^{\mathrm{pcr}}$. Since contamination of samples happens rarely, $e^{\mathrm{con}}$ is assumed to be peaky in the sense of Table 1, while $e^{\mathrm{pcr}}$ is assumed to have even mass but a small norm. In total, $e$ is peaky.

Often each specimen is tested separately, meaning that $A$ is the identity. In particular, we need at least as many test kits as specimens. Further, we estimate the true quantity of viruses $x_n$ by $x_n^\# := y_n$, which results in the estimation error $x_n^\# - x_n = e_n = e_n^{\mathrm{con}} + e_n^{\mathrm{pcr}}$. Since the noise vector $e$ is peaky, some but few tests will be inaccurate and might result in false positives or false negatives.

In general, only a fraction of persons is indeed affected by the virus. Thus, we assume that $\|x\|_0 \le S$ for some small $S$. Since the amount of viruses is a non-negative value, we also have $x \ge 0$. Hence, we can use the NNLR to estimate $x$, and in particular we should use the NNLAD due to the noise being peaky. Corollary 3.6 suggests choosing $A$ as the random walk matrix of a lossless expander or, by ([3], Theorem 13.7), choosing $A$ as a uniformly at random drawn $D$-LRBG. Such a matrix $A$ has non-negative entries and the column sums of $A$ are not greater than one. This is a necessary requirement, since each column sum is the total fraction of the specimen used in the test procedure. Specifically, a fraction of $D^{-1}$ of each specimen is used in each of exactly $D$ test kits.

By Corollary 3.6 and ([3], Theorem 13.7), this allows us to reduce the number of test kits required to $M \ge C S \log(eN/S)$. As we have seen in Figures 4A,B, we expect the NNLAD estimator to correct the errors from $e^{\mathrm{con}}$ and the estimation error to be in the order of $\|e^{\mathrm{pcr}}\|_1$, which is assumed to be small. Hence, the NNLAD estimator with a random walk matrix of a lossless expander might even result in fewer false positives and false negatives than individual testing.
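To illustrate the proposed pipeline end to end, here is a self-contained toy sketch of pooled testing with a $D$-LRBG pooling matrix and NNLAD decoding via the primal-dual iteration of Section 4. All parameters and the noise model are illustrative assumptions, not taken from the article, and recovery is not guaranteed for every random draw.

```python
import numpy as np

# Pooling design: N specimens, M pooled tests, each specimen split over D kits.
rng = np.random.default_rng(1)
N, M, D, S = 200, 60, 5, 4

A = np.zeros((M, N))
for n in range(N):                                   # D pools per specimen
    A[rng.choice(M, size=D, replace=False), n] = 1.0 / D

x = np.zeros(N)                                      # viral loads, S infected persons
x[rng.choice(N, size=S, replace=False)] = rng.uniform(1.0, 5.0, size=S)

e = np.zeros(M)
e[rng.integers(M)] = 2.0                             # one contaminated test (peaky noise)
y = A @ x + e + 0.001 * rng.standard_normal(M)       # plus a small PCR-type error

# NNLAD via the primal-dual iteration of Section 4 (compact restatement).
s = 0.99 / np.linalg.norm(A, 2)
xk = np.zeros(N); w = np.zeros(M); xb = xk.copy()
for _ in range(5000):
    w = np.clip(w + s * (A @ xb - y), -1.0, 1.0)
    xn = np.maximum(xk - s * (A.T @ w), 0.0)
    xb, xk = 2.0 * xn - xk, xn

print(np.flatnonzero(x > 0))                         # true infected specimens
print(np.flatnonzero(xk > 0.5))                      # specimens flagged by NNLAD
```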

Note that the lack of knowledge about the noise $e$ favors the NNLAD recovery method over a (BPDN) approach. Further, since the total quantity of viruses over all patients, given by $\sum_{n \in [N]} x_n = \|x\|_1$, is unknown, it is undesirable to use the (CLR).

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author Contributions

BB and PJ proposed the research problem. HP derived the results of the paper, with feedback from BB and PJ. HP wrote the paper also with feedback from BB and PJ.

Funding

The work was partially supported by DAAD grant 57417688. PJ has been supported by DFG grant JU 2795/3. BB has been supported by BMBF through the German Research Chair at AIMS, administered by the Humboldt Foundation. We acknowledge support by the German Research Foundation and the Open Access Publication Funds of TU Berlin. This article has appeared as a preprint [28], see https://arxiv.org/abs/2003.13092.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

1The $\ell_1$-constrained least residual is given by $\operatorname{argmin}_{\|z\|_1 \le \tau} \|Az - y\|$ for some norm $\|\cdot\|$.

2The tolerance parameters of CVX are the second and fourth root of the machine precision by default [30, 31].

3This was the fastest method found by the authors. Other possibilities would be [15, Algorithm 2], [26].

References

1. Candes, EJ, Romberg, J, and Tao, T. Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE Trans Inf Theory (2006). 52:489–509. doi:10.1109/TIT.2005.862083

2. Donoho, DL. Compressed Sensing. IEEE Trans Inf Theory (2006). 52:1289–306. doi:10.1109/TIT.2006.871582

3. Foucart, S, and Rauhut, H. A Mathematical Introduction to Compressive Sensing. Basel, Switzerland: Birkhäuser (2013).

4. Hastie, T, Tibshirani, R, and Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations. New York, NY: Chapman and Hall/CRC (2015).

5. Donoho, DL, and Tanner, J. Sparse Nonnegative Solution of Underdetermined Linear Equations by Linear Programming. Proc Natl Acad Sci (2005). 102:9446–51. doi:10.1073/pnas.0502269102

6. Gilbert, A, and Indyk, P. Sparse Recovery Using Sparse Matrices. Proc IEEE (2010). 98:937–47. doi:10.1109/JPROC.2010.2045092

7. Bruckstein, AM, Elad, M, and Zibulevsky, M. On the Uniqueness of Non-negative Sparse & Redundant Representations. Proc IEEE Int Conf Acoust Speech Signal Process (2008). 5145–8. doi:10.1109/ICASSP.2008.4518817

8. Donoho, DL, and Tanner, J. Counting the Faces of Randomly-Projected Hypercubes and Orthants, with Applications. Discrete Comput Geom (2010). 43:522–41. doi:10.1007/s00454-009-9221-z

9. Wang, M, Xu, W, and Tang, A. A Unique "Nonnegative" Solution to an Underdetermined System: From Vectors to Matrices. IEEE Trans Signal Process (2011). 59:1007–1016. doi:10.1109/TSP.2010.2089624

10. Slawski, M, and Hein, M. Sparse Recovery by Thresholded Non-negative Least Squares. Adv Neural Inf Process Syst (2011). 24:1926–1934.

11. Slawski, M, and Hein, M. Non-negative Least Squares for High-Dimensional Linear Models: Consistency and Sparse Recovery without Regularization. Electron J Stat (2013). 7:3004–3056. doi:10.1214/13-EJS868

12. Kabanava, M, Kueng, R, Rauhut, H, and Terstiege, U. Stable Low-Rank Matrix Recovery via Null Space Properties. Inf Inference (2016). 5:405–441. doi:10.1093/imaiai/iaw014

13. Kueng, R, and Jung, P. Robust Nonnegative Sparse Recovery and the Nullspace Property of 0/1 Measurements. IEEE Trans Inf Theory (2018). 64:689–703. doi:10.1109/TIT.2017.2746620

14. Shadmi, Y, Jung, P, and Caire, G. Sparse Non-negative Recovery from Biased Subgaussian Measurements Using NNLS. In: IEEE International Symposium on Information Theory; 2019 July 7–12; Paris, France (2019).

15. Chambolle, A, and Pock, T. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. J Math Imaging Vis (2011). 40:120–45. doi:10.1007/s10851-010-0251-1

16. Van Den Berg, E, and Friedlander, MP. Probing the Pareto Frontier for Basis Pursuit Solutions. SIAM J Sci Comput (2009). 31:890–912. doi:10.1137/080714488

17. Bühlmann, P, and Van De Geer, S. Statistics for High-Dimensional Data. Berlin, Heidelberg: Springer (2011).

18. Jafarpour, S, Xu, W, Hassibi, B, and Calderbank, R. Efficient and Robust Compressed Sensing Using Optimized Expander Graphs. IEEE Trans Inf Theory (2009). 55:4299–4308. doi:10.1109/TIT.2009.2025528

19. Xu, W, and Hassibi, B. Efficient Compressive Sensing with Deterministic Guarantees Using Expander Graphs. In: IEEE Information Theory Workshop; 2007 September 2–6; Tahoe City, CA, United States (2007). 414–9.

20. Berinde, R, Gilbert, AC, Indyk, P, Karloff, H, and Strauss, MJ. Combining Geometry and Combinatorics: A Unified Approach to Sparse Signal Recovery. In: 46th Annual Allerton Conference on Communication; 2008 September 23–26; Monticello, IL, United States (2008). 798–805.

21. Khajehnejad, MA, Dimakis, AG, Xu, W, and Hassibi, B. Sparse Recovery of Nonnegative Signals with Minimal Expansion. IEEE Trans Signal Process (2011). 59:196–208. doi:10.1109/TSP.2010.2082536

22. Morgenshtern, VI, and Candès, EJ. Super-resolution of Positive Sources: The Discrete Setup. SIAM J Imaging Sci (2016). 9:412–44. doi:10.1137/15M1016552

23. Jaensch, F, and Jung, P. Robust Recovery of Sparse Nonnegative Weights from Mixtures of Positive-Semidefinite Matrices (Preprint) (2020). Available from: https://arxiv.org/abs/2003.12005.

24. Vadhan, SP. Pseudorandomness. FNT Theor Comp Sci (2012). 7:1–336. doi:10.1561/0400000010

25. Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization. New York, NY: Springer (2004).

26. Beck, A, and Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J Imaging Sci (2009). 2:183–202. doi:10.1137/080716542

27. Attouch, H, and Peypouquet, J. The Rate of Convergence of Nesterov's Accelerated Forward-Backward Method Is Actually Faster Than 1/k². SIAM J Optim (2016). 26:1824–1834. doi:10.1137/15M1046095

28. Petersen, HB, Bah, B, and Jung, P. Efficient Tuning-free L1-Regression of Nonnegative Compressible Signals (2020). Available from: https://arxiv.org/abs/2003.13092 (Accessed May 10, 2021).

29. Kümmerle, C. Understanding and Enhancing Data Recovery Algorithms: From Noise-Blind Sparse Recovery to Reweighted Methods for Low-Rank Matrix Optimization [PhD dissertation]. München (Germany): Technical University of Munich (2019).

30. Grant, M, and Boyd, S. CVX: Matlab Software for Disciplined Convex Programming (2014).

31. Grant, MC, and Boyd, SP. Graph Implementations for Nonsmooth Convex Programs. In: V Blondel, S Boyd, and H Kimura, editors. Recent Advances in Learning and Control. Lecture Notes in Control and Information Sciences. New York, NY: Springer-Verlag Limited (2008). p. 95–110.

32. Dirksen, S, Lecué, G, and Rauhut, H. On the Gap between Restricted Isometry Properties and Sparse Recovery Conditions. IEEE Trans Inf Theory (2018). 64:5478–87. doi:10.1109/TIT.2016.2570244

33. Polyak, BT. Introduction to Optimization (Translations Series in Mathematics and Engineering). New York, NY: Optimization Software, Inc (1987).

34. Van Den Berg, E, and Friedlander, MP. SPGL1: A Solver for Large-Scale Sparse Reconstruction (2019).

35. Dorfman, R. The Detection of Defective Members of Large Populations. Ann Math Statist (1943). 14:436–40. doi:10.1214/aoms/1177731363

36. Aldridge, M, Johnson, O, and Scarlett, J. Group Testing: An Information Theory Perspective. FNT Commun Inf Theory (2019). 15:196–392. doi:10.1561/0100000099

37. Petersen, HB, and Jung, P. Robust Instance-Optimal Recovery of Sparse Signals at Unknown Noise Levels (Preprint) (2020). Available from: https://arxiv.org/abs/2008.08385.

6 Appendix

6.1 Proof of Non-Negative Least Residual Recovery Guarantee

By $\mathbf{1}$ we denote the all ones vector in $\mathbb{R}^N$ or $\mathbb{R}^M$, respectively. The proof is an adaptation of the steps used in [13]. As for most convex optimization problems in compressed sensing, we use ([3], Theorem 4.25) and ([3], Theorem 4.20), respectively, which require $A$ to have the RNSP.

Theorem 6.1 (([3], Theorem 4.25) and ([3], Theorem 4.20)). Let $q \in [1, \infty)$ and suppose $A$ has the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\rho$ and $\tau$. Then, it holds that

$$\|x - z\|_q \le \frac{(1 + \rho)^2}{1 - \rho} S^{1/q - 1}\left(\|z\|_1 - \|x\|_1 + 2 d_1(x, \Sigma_S)\right) + \frac{3 + \rho}{1 - \rho}\tau\,\|A(x - z)\| \quad \text{for all } x, z \in \mathbb{R}^N.$$

If q=1, this bound can be improved to

$$\|x - z\|_1 \le \frac{1 + \rho}{1 - \rho}\left(\|z\|_1 - \|x\|_1 + 2 d_1(x, \Sigma_S)\right) + \frac{2}{1 - \rho}\tau\,\|A(x - z)\| \quad \text{for all } x, z \in \mathbb{R}^N.$$

Note that by a modification of the proof this result also holds for $q = \infty$. The modifications of the proofs of ([3], Theorem 4.25) and ([3], Theorem 4.20) are straightforward; only the modification of ([3], Theorem 2.5) might not be obvious. See also [37]. As a consequence, all our statements also hold for $q = \infty$ with $1/q := 0$. If $W \in \mathbb{R}^{N \times N}$ is a diagonal matrix, we can calculate some operator norms fairly easily:

$$\|W\|_{q \to q} := \sup_{\|w\|_q \le 1} \|Ww\|_q = \max_{n \in [N]} |W_{n,n}| \quad \text{for all } q \in [1, \infty].$$

We use this relation at several places throughout this section. Furthermore, we use ([13], Lemma 5) without adaptation. For the sake of completeness we add a short proof.

Lemma 6.2 ([13], Lemma 5). Let $q \in [1, \infty)$ and suppose that $A \in \mathbb{R}^{M \times N}$ has the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\rho$ and $\tau$. Let $W \in \mathbb{R}^{N \times N}$ be a diagonal matrix with $W_{n,n} > 0$. If $\tilde{\rho} := \|W\|_{q \to q}\|W^{-1}\|_{1 \to 1}\rho < 1$, then $AW^{-1}$ has the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\tilde{\rho}$ and $\tilde{\tau} := \|W\|_{q \to q}\tau$.

Proof. Let $v \in \mathbb{R}^N$ and $\#(T) \le S$. If we apply the RNSP of $A$ to the vector $W^{-1}v$ and the set $T$, we get

$$\begin{aligned}
\|v|_T\|_q &= \|W W^{-1}(v|_T)\|_q \le \|W\|_{q \to q}\,\|W^{-1}(v|_T)\|_q = \|W\|_{q \to q}\,\|(W^{-1}v)|_T\|_q \\
&\le \|W\|_{q \to q}\left(\rho S^{1/q - 1}\|W^{-1}(v|_{T^c})\|_1 + \tau\|AW^{-1}v\|\right) = \|W\|_{q \to q}\,\rho S^{1/q - 1}\|W^{-1}(v|_{T^c})\|_1 + \|W\|_{q \to q}\,\tau\|AW^{-1}v\| \\
&\le \|W\|_{q \to q}\|W^{-1}\|_{1 \to 1}\,\rho S^{1/q - 1}\|v|_{T^c}\|_1 + \|W\|_{q \to q}\,\tau\|AW^{-1}v\|.
\end{aligned}$$

This finishes the proof.

Next we adapt ([13], Theorem 4) to account for arbitrary norms. Further, we obtain a slight improvement in the form of the dimensional scaling constant $S^{1/q - 1}$. With this, our error bound becomes, for $S \to \infty$, asymptotically the error bound of the basis pursuit denoising, whenever $\kappa = 1$ and $q > 1$ [3].

Proposition 6.3 (Similar to ([13], Theorem 4)). Let $q \in [1, \infty)$ and let $\|\cdot\|$ be a norm on $\mathbb{R}^M$ with dual norm $\|\cdot\|_*$. Suppose $A$ has the $\ell_q$-RNSP of order $S$ with respect to $\|\cdot\|$ with constants $\rho$ and $\tau$. Suppose $A$ obeys the $M^+$ criterion with vector $t$ and constant $\kappa$ and that $\kappa\rho < 1$. Then, we have

$$\|x - z\|_q \le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q - 1} d_1(x, \Sigma_S) + \left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q - 1}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right)\|Az - Ax\| \quad \text{for all } x, z \in \mathbb{R}^N_+.$$

If q=1, this bound can be improved to

$$\|x - z\|_1 \le \frac{2(1 + \kappa\rho)}{1 - \kappa\rho}\kappa\, d_1(x, \Sigma_S) + \left(\frac{1 + \kappa\rho}{1 - \kappa\rho}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{2}{1 - \kappa\rho}\kappa\tau\right)\|Az - Ax\| \quad \text{for all } x, z \in \mathbb{R}^N_+.$$

Proof. Let $x, z \ge 0$. In order to apply Lemma 6.2 we set $W$ as the matrix with diagonal $A^T t$ and zero else. It follows that $W_{n,n} > 0$ and $\|W\|_{q \to q}\|W^{-1}\|_{1 \to 1}\rho = \kappa\rho < 1$. We can apply Lemma 6.2, which yields that $AW^{-1}$ has the $\ell_q$-RNSP with constants $\tilde{\rho} = \|W\|_{q \to q}\|W^{-1}\|_{1 \to 1}\rho = \kappa\rho$ and $\tilde{\tau} = \|W\|_{q \to q}\tau = \max_{n \in [N]}|(A^T t)_n|\tau$. We apply Theorem 6.1 with the matrix $AW^{-1}$, the vectors $Wx$, $Wz$ and the constants $\tilde{\rho}$ and $\tilde{\tau}$ and get

$$\begin{aligned}
\|Wx - Wz\|_q &\le \frac{(1 + \tilde{\rho})^2}{1 - \tilde{\rho}} S^{1/q-1}\left(\|Wz\|_1 - \|Wx\|_1 + 2 d_1(Wx, \Sigma_S)\right) + \frac{3 + \tilde{\rho}}{1 - \tilde{\rho}}\tilde{\tau}\,\|AW^{-1}(Wx - Wz)\| \\
&\le \frac{(1 + \tilde{\rho})^2}{1 - \tilde{\rho}} S^{1/q-1}\left(\|Wz\|_1 - \|Wx\|_1 + 2\|W\|_{1 \to 1} d_1(x, \Sigma_S)\right) + \frac{3 + \tilde{\rho}}{1 - \tilde{\rho}}\tilde{\tau}\,\|Ax - Az\| \\
&= \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\max_{n \in [N]}|(A^T t)_n|\, S^{1/q-1} d_1(x, \Sigma_S) + \frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\left(\|Wz\|_1 - \|Wx\|_1\right) + \frac{3 + \kappa\rho}{1 - \kappa\rho}\max_{n \in [N]}|(A^T t)_n|\,\tau\,\|Ax - Az\|.
\end{aligned}$$

We lower bound the left hand side further to get

$$\|x - z\|_q \le \|W^{-1}\|_{q \to q}\|Wx - Wz\|_q = \max_{n \in [N]}|(A^T t)_n^{-1}|\,\|Wx - Wz\|_q \le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q-1} d_1(x, \Sigma_S) + \frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\max_{n \in [N]}|(A^T t)_n^{-1}|\left(\|Wz\|_1 - \|Wx\|_1\right) + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\,\|Ax - Az\|. \tag{5}$$

We want to estimate the term $\|Wz\|_1 - \|Wx\|_1$ using the $M^+$ criterion. Since $z, x \ge 0$, $W_{n,n} = (A^T t)_n > 0$ and $W$ is a diagonal matrix, we have

$$\|Wz\|_1 - \|Wx\|_1 = \langle \mathbf{1}, Wz \rangle - \langle \mathbf{1}, Wx \rangle = \langle W^T \mathbf{1}, z - x \rangle = \langle A^T t, z - x \rangle = \langle t, A(z - x) \rangle \le \|t\|_*\,\|Az - Ax\|.$$

Applying this to Eq. 5 we get

$$\|x - z\|_q \le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q-1} d_1(x, \Sigma_S) + \left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right)\|Az - Ax\|.$$

If q=1 we can repeat the proof with the improved bound of Theorem 6.1.

After these auxiliary statements it remains to prove the main result of Section 3 about the properties of the NNLR minimizer.

Proof of Theorem 3.4. By applying Proposition 6.3 with $x$ and $z := x^\# \ge 0$ we get

$$\begin{aligned}
\|x - x^\#\|_q &\le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q-1} d_1(x, \Sigma_S) + \left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right)\|Ax^\# - Ax\| \\
&\le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q-1} d_1(x, \Sigma_S) + \left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right)\left(\|Ax^\# - y\| + \|Ax - y\|\right) \\
&\le \frac{2(1 + \kappa\rho)^2}{1 - \kappa\rho}\kappa S^{1/q-1} d_1(x, \Sigma_S) + 2\left(\frac{(1 + \kappa\rho)^2}{1 - \kappa\rho} S^{1/q-1}\max_{n \in [N]}|(A^T t)_n^{-1}|\,\|t\|_* + \frac{3 + \kappa\rho}{1 - \kappa\rho}\kappa\tau\right)\|Ax - y\|,
\end{aligned}$$

where in the last step we used that x# is a minimizer and x is feasible. If q=1, we can repeat the proof with the improved bound of Proposition 6.3.

Keywords: compressed sensing, compressible, sparse, non-negative, regression, tuning-free, expander, uniform

Citation: Petersen HB, Bah B and Jung P (2021) Efficient Tuning-Free l1-Regression of Nonnegative Compressible Signals. Front. Appl. Math. Stat. 7:615573. doi: 10.3389/fams.2021.615573

Received: 09 October 2020; Accepted: 16 April 2021;
Published: 07 June 2021.

Edited by:

Qiyu Sun, University of Central Florida, United States

Reviewed by:

Guohui Song, Old Dominion University, United States
Yunlong Feng, University at Albany, United States

Copyright © 2021 Petersen, Bah and Jung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Hendrik Bernd Petersen, petersen@tu-berlin.de

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.