
ORIGINAL RESEARCH article

Front. Phys., 13 July 2022
Sec. Statistical and Computational Physics
This article is part of the Research Topic: Tensor Network Approaches for Quantum Many-body Physics and Machine Learning

A Hybrid Norm for Guaranteed Tensor Recovery

Yihao Luo1, Andong Wang1,2*, Guoxu Zhou1,3 and Qibin Zhao2
  • 1School of Automation, Guangdong University of Technology, Guangzhou, China
  • 2RIKEN AIP, Tokyo, Japan
  • 3Key Laboratory of Intelligent Detection and the Internet of Things in Manufacturing, Ministry of Education, Guangzhou, China

Benefiting from the superiority of tensor Singular Value Decomposition (t-SVD) in excavating low-rankness in the spectral domain over other tensor decompositions (like Tucker decomposition), t-SVD-based tensor learning has recently shown promising performance and become an emerging research topic in computer vision and machine learning. However, by focusing on modeling spectral low-rankness, t-SVD-based models may be insufficient to exploit low-rankness in the original domain, leading to limited performance when learning from tensor data (like videos) that are low-rank in both the original and spectral domains. To this end, we define a hybrid tensor norm dubbed the "Tubal + Tucker" Nuclear Norm (T2NN) as the sum of two tensor norms, respectively induced by t-SVD and Tucker decomposition, to simultaneously impose low-rankness in both the spectral and original domains. We further utilize the new norm for tensor recovery from linear observations by formulating a penalized least squares estimator. The statistical performance of the proposed estimator is then analyzed by establishing upper bounds on the estimation error in both deterministic and non-asymptotic manners. We also develop an efficient algorithm within the framework of the Alternating Direction Method of Multipliers (ADMM). Experimental results on both synthetic and real datasets show the effectiveness of the proposed model.

1 Introduction

Thanks to the rapid progress of computer technology, data in tensor format (i.e., multi-dimensional arrays) are emerging in computer vision, machine learning, remote sensing, quantum physics, and many other fields, triggering an increasing need for tensor-based learning theory and algorithms [1–6]. In this paper, we carry out both theoretical and algorithmic studies of tensor recovery from linear observations, which is a typical problem in tensor learning aiming to learn an unknown tensor when only a limited number of its noisy linear observations are available [7]. Tensor recovery finds applications in many industrial circumstances where the sensed or collected tensor data are polluted by unpredictable factors such as sensor failures, communication losses, occlusion by objects, shortage of instruments, and electromagnetic interference [7–9], and is thus of both theoretical and empirical significance.

In general, reconstructing an unknown tensor from only a small number of its linear observations is hopeless unless some assumptions on the underlying tensor are made [9]. The most commonly used assumption is that the underlying tensor possesses some kind of low-rankness which can significantly limit its degrees of freedom, such that the signal can be estimated from a small but sufficient number of observations [7]. However, as a higher-order extension of matrix low-rankness, tensor low-rankness has many different characterizations due to the multiple definitions of tensor rank, e.g., the CANDECOMP/PARAFAC (CP) rank [10], Tucker rank [11], Tensor Train (TT) rank [12], and Tensor Ring (TR) rank [13]. As discussed in [7] from a signal processing standpoint, the rank functions exemplified above are defined in the original domain of the tensor signal and may thus be insufficient to model low-rankness in the spectral domain. The recently proposed tensor low-tubal-rankness [14], defined within the algebraic framework of tensor Singular Value Decomposition (t-SVD) [15], complements them by exploiting low-rankness in the spectral domain defined via the Discrete Fourier Transform (DFT), and has witnessed significant performance improvements over original-domain low-rankness for tensor recovery [6, 16, 17].

Despite the popularity of low-tubal-rankness, the fact that it is defined solely in the spectral domain also naturally poses a potential limitation on its usability to some tensor data that are low-rank in both spectral and original domains. To address this issue, we propose a hybrid tensor norm to encourage low-rankness in both spectral and original domains at the same time for tensor recovery in this paper. Specifically, the contributions of this work are four-fold:

• To simultaneously exploit low-rankness in both spectral and original domains, we define a new norm named T2NN as the sum of two tensor nuclear norms induced, respectively, by the t-SVD for spectral low-rankness and Tucker decomposition for original domain low-rankness.

• Then, we apply the proposed norm to tensor recovery by formulating a new tensor least squares estimator penalized by T2NN.

• Statistically, we analyze the performance of the proposed estimator by establishing upper bounds on the estimation error in both deterministic and non-asymptotic manners.

• Algorithmically, we propose an algorithm based on ADMM to compute the estimator and evaluate its effectiveness on three different types of real data.

The rest of this paper proceeds as follows. First, the notations and preliminaries of low-tubal-rankness and low-Tucker-rankness are introduced in Section 2. Then, we define the new norm and apply it to tensor recovery in Section 3. To understand the statistical behavior of the estimator, we establish an upper bound on the estimation error in Section 4. To compute the proposed estimator, we design an ADMM-based algorithm in Section 5 with empirical performance reported in Section 6.

2 Notations and Preliminaries

Notations. We use lowercase boldface, uppercase boldface, and calligraphic letters to denote vectors (e.g., $\mathbf{v}$), matrices (e.g., $\mathbf{M}$), and tensors (e.g., $\mathcal{T}$), respectively. For any real numbers $a, b$, let $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. If the size of a tensor is not given explicitly, then it is in $\mathbb{R}^{d_1 \times d_2 \times d_3}$. We use $c, c', c_1$, etc., to denote constants whose values can vary from line to line. For notational simplicity, let $\tilde{d} = (d_1 + d_2)d_3$ and $d_{\backslash k} = d_1 d_2 d_3 / d_k$ for $k = 1, 2, 3$.

Given a matrix $\mathbf{M} \in \mathbb{C}^{d_1 \times d_2}$, its nuclear norm and spectral norm are defined as $\|\mathbf{M}\|_* \triangleq \sum_i \sigma_i$ and $\|\mathbf{M}\| \triangleq \max_i \sigma_i$, respectively, where $\{\sigma_i \mid i = 1, 2, \ldots, d_1 \wedge d_2\}$ are its singular values. Given a tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, define its $l_1$-norm and F-norm as $\|\mathcal{T}\|_1 \triangleq \|\mathrm{vec}(\mathcal{T})\|_1$ and $\|\mathcal{T}\|_F \triangleq \|\mathrm{vec}(\mathcal{T})\|_2$, respectively, where $\mathrm{vec}(\cdot)$ denotes the vectorization operation of a tensor [18]. Given $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, let $\mathcal{T}^{(i)} \triangleq \mathcal{T}(:,:,i)$ denote its $i$th frontal slice. For any two (real or complex) tensors $\mathcal{A}, \mathcal{B}$ of the same size, define their inner product as the inner product of their vectorizations, $\langle \mathcal{A}, \mathcal{B} \rangle \triangleq \langle \mathrm{vec}(\mathcal{A}), \mathrm{vec}(\mathcal{B}) \rangle$. Other notations are introduced at their first appearance.

2.1 Spectral Rankness Modeled by t-SVD

The low-tubal-rankness defined within the algebraic framework of t-SVD is a typical example to characterize low-rankness in the spectral domain. We give some basic notions about t-SVD in this section.

Definition 1 (t-product [15]). Given $\mathcal{T}_1 \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and $\mathcal{T}_2 \in \mathbb{R}^{d_2 \times d_4 \times d_3}$, their t-product $\mathcal{T} = \mathcal{T}_1 * \mathcal{T}_2 \in \mathbb{R}^{d_1 \times d_4 \times d_3}$ is a tensor whose $(i,j)$-th tube is $\mathcal{T}(i,j,:) = \sum_{k=1}^{d_2} \mathcal{T}_1(i,k,:) \bullet \mathcal{T}_2(k,j,:)$, where $\bullet$ is the circular convolution [15].

Definition 2 (tensor transpose [15]). Let $\mathcal{T}$ be a tensor of size $d_1 \times d_2 \times d_3$; then $\mathcal{T}^{\top}$ is the $d_2 \times d_1 \times d_3$ tensor obtained by transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through $d_3$.

Definition 3 (identity tensor [15]). The identity tensor IRd×d×d3 is a tensor whose first frontal slice is the d × d identity matrix and all other frontal slices are zero.

Definition 4 (f-diagonal tensor [15]). A tensor is called f-diagonal if each frontal slice of the tensor is a diagonal matrix.

Definition 5 (Orthogonal tensor [15]). A tensor $\mathcal{Q} \in \mathbb{R}^{d \times d \times d_3}$ is orthogonal if $\mathcal{Q}^{\top} * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^{\top} = \mathcal{I}$.

Then, t-SVD can be defined as follows.

Definition 6 (t-SVD, tubal rank [15]). Any tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ has a tensor singular value decomposition

$\mathcal{T} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}, \quad (1)$

where $\mathcal{U} \in \mathbb{R}^{d_1 \times d_1 \times d_3}$ and $\mathcal{V} \in \mathbb{R}^{d_2 \times d_2 \times d_3}$ are orthogonal tensors and $\mathcal{S} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ is an f-diagonal tensor. The tubal rank of $\mathcal{T}$ is defined as the number of non-zero tubes of $\mathcal{S}$,

$\mathrm{rank}_{\mathrm{tb}}(\mathcal{T}) \triangleq \#\{\, i : \mathcal{S}(i,i,:) \neq \mathbf{0} \,\}, \quad (2)$

where $\#$ counts the number of elements in a set.

For convenience of analysis, the block-diagonal matrix of 3-way tensors is also defined.

Definition 7 (block-diagonal matrix [15]). Let $\bar{T}$ denote the block-diagonal matrix of the tensor $\tilde{\mathcal{T}}$ in the Fourier domain1, i.e.,

$\bar{T} \triangleq \mathrm{diag}\big(\tilde{\mathcal{T}}^{(1)}, \ldots, \tilde{\mathcal{T}}^{(d_3)}\big) \in \mathbb{C}^{d_1 d_3 \times d_2 d_3}. \quad (3)$

Definition 8 (tubal nuclear norm, tensor spectral norm [17]). Given TRd1×d2×d3, let T̃ be its Fourier version in Cd1×d2×d3. The Tubal Nuclear Norm (TNN) ‖⋅‖tnn of T is defined as the averaged nuclear norm of frontal slices of T̃,

$\|\mathcal{T}\|_{\mathrm{tnn}} \triangleq \frac{1}{d_3} \sum_{i=1}^{d_3} \big\|\tilde{\mathcal{T}}^{(i)}\big\|_*,$

whereas the tensor spectral norm ‖⋅‖ is the largest spectral norm of the frontal slices,

$\|\mathcal{T}\| \triangleq \max_{i \in [d_3]} \big\|\tilde{\mathcal{T}}^{(i)}\big\|.$

We can see from Definition 8 that TNN captures low-rankness in the spectral domain and is thus more suitable for tensors with spectral low-rankness. As visual data (like images and videos) often possess strong spectral low-rankness, TNN has achieved superior performance over many original domain-based nuclear norms in visual data restoration [6, 17].
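For concreteness, the quantities in Definitions 6–8 can be read off from the singular values of the Fourier-domain frontal slices. The following is a minimal NumPy sketch (the function name and tolerance are our own illustrative choices, not the implementation used in this paper):

```python
import numpy as np

def tubal_quantities(T, tol=1e-10):
    """Tubal nuclear norm, tensor spectral norm, and tubal rank of a 3-way tensor,
    computed from the frontal slices of its Fourier version (Definitions 6 and 8)."""
    d1, d2, d3 = T.shape
    Tf = np.fft.fft(T, axis=2)                        # T tilde: 1D DFT along the tubes
    # sv[i, j] = i-th largest singular value of the j-th Fourier frontal slice
    sv = np.stack([np.linalg.svd(Tf[:, :, j], compute_uv=False) for j in range(d3)], axis=1)
    tnn = sv.sum() / d3                               # averaged slice-wise nuclear norms
    spectral = sv.max()                               # largest slice-wise spectral norm
    tubal_rank = int(np.sum(sv.max(axis=1) > tol))    # number of non-zero singular tubes
    return tnn, spectral, tubal_rank
```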

2.2 Original Domain Low-Rankness Modeled by Tucker Decomposition

The low-Tucker-rankness is a classical higher-order extension of matrix low-rankness in the original domain and has been widely applied in computer vision and machine learning [19–21]. Given any $K$-way tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_K}$, its Tucker rank is defined as the following vector:

$\mathbf{r}_{\mathrm{Tucker}}(\mathcal{T}) \triangleq \big(\mathrm{rank}(T_{(1)}), \ldots, \mathrm{rank}(T_{(K)})\big) \in \mathbb{R}^{K}, \quad (4)$

where $T_{(k)} \in \mathbb{R}^{d_k \times \prod_{i \neq k} d_i}$ denotes the mode-$k$ unfolding (matrix) of $\mathcal{T}$ [18], obtained by concatenating all the mode-$k$ fibers of $\mathcal{T}$ as column vectors. We can see that the Tucker rank measures the low-rankness of all the mode-$k$ unfoldings $T_{(k)}$ in the original domain.

Through relaxing the matrix rank in Eq. 4 to its convex envelope, i.e., the matrix nuclear norm, we get a convex relaxation of the Tucker rank, called Sum of Nuclear Norms (SNN) [20], which is defined as follows:

$\|\mathcal{T}\|_{\mathrm{snn}} \triangleq \sum_{k=1}^{K} \alpha_k \big\|T_{(k)}\big\|_*, \quad (5)$

where the $\alpha_k$'s are positive constants satisfying $\sum_k \alpha_k = 1$. As a typical tensor low-rankness penalty in the original domain, SNN has found many applications in tensor recovery [19, 20, 22].
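As a small illustration (a sketch with our own helper names, not code from the paper), the mode-$k$ unfoldings and the SNN of Eq. 5 can be evaluated as follows; the ordering of unfolding columns varies across conventions, which does not affect the nuclear norm.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding T_(k): the mode-k fibers of T arranged as columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def snn(T, alphas=(1/3, 1/3, 1/3)):
    """Sum of Nuclear Norms (Eq. 5): weighted nuclear norms of the three unfoldings."""
    return sum(a * np.linalg.svd(unfold(T, k), compute_uv=False).sum()
               for k, a in enumerate(alphas))
```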

3 A Hybrid Norm for Tensor Recovery

In this section, we first define a new norm to exploit low-rankness in both spectral and original domains and then use it to formulate a penalized tensor least squares estimator.

3.1 The Proposed Norm

Although TNN has shown superior performance in many tensor learning tasks, it may still be insufficient for tensors which are low-rank in both spectral and original domains due to its definition solely in the spectral domain. Moreover, it is also unsuitable for tensors which have less significant spectral low-rankness than the original domain low-rankness. Thus, it is necessary to extend the vanilla TNN such that the original domain low-rankness can also be exploited for sounder low-rank modeling.

Under the inspiration of SNN’s impressive low-rank modeling capability in the original domain, our idea is quite simple: to combine the advantages of both TNN and SNN through their weighted sum. In this line of thinking, we come up with the following hybrid tensor norm.

Definition 9 (T2NN). The hybrid norm called “Tubal + Tucker” Nuclear Norm (T2NN) of any 3-way tensor TRd1×d2×d3 is defined as the weighted sum of its TNN and SNN as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}} \triangleq \gamma\|\mathcal{T}\|_{\mathrm{tnn}} + (1-\gamma)\|\mathcal{T}\|_{\mathrm{snn}}, \quad (6)$

where $\gamma \in (0, 1)$ is a constant balancing the low-rank modeling in the spectral and original domains.

As can be seen from its definition, T2NN approaches TNN as $\gamma \to 1$ and degenerates to SNN as $\gamma \to 0$. Thus, it can be viewed as an interpolation between TNN and SNN, which provides more flexibility in low-rank tensor modeling. We also define the dual norm of T2NN (named the dual T2NN norm), which is frequently used in analyzing the statistical performance of the T2NN-based tensor estimator.

Lemma 1. The dual norm of the proposed T2NN defined as

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \triangleq \sup_{\mathcal{X}} \langle \mathcal{X}, \mathcal{T} \rangle, \quad \text{s.t. } \|\mathcal{X}\|_{\mathrm{t2nn}} \leq 1, \quad (7)$

can be equivalently formulated as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} = \inf_{\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D}} \max\Big\{ \tfrac{1}{\gamma}\|\mathcal{A}\|,\; \tfrac{1}{\alpha_1(1-\gamma)}\|B_{(1)}\|,\; \tfrac{1}{\alpha_2(1-\gamma)}\|C_{(2)}\|,\; \tfrac{1}{\alpha_3(1-\gamma)}\|D_{(3)}\| \Big\}, \quad \text{s.t. } \mathcal{A}+\mathcal{B}+\mathcal{C}+\mathcal{D} = \mathcal{T}. \quad (8)$

Proof of Lemma 1. Using the definition of T2NN, the supremum in Problem (7) can be equivalently converted to the opposite number of infimum as follows:

Tt2nn=infTX,T,s.t. γXtnn+α1(1γ)X(1)+α2(1γ)×X(2)+α3(1γ)X(3)1.(9)

By introducing a multiplier λ ≥ 0, we obtain the Lagrangian function of Problem (9),

L(X,λ)X,T+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1.

Since Slater’s condition [23] is satisfied in Problem (9), strong duality holds, which means

Tt2nn=infXsupλL(X,λ)=supλinfXL(X,λ).

Thus, we proceed by computing supλinfXL(X,λ) as follows:

supλinfXX,T+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1=(i)supλinfXX,A+B+C+D+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1λ,whereA+B+C+D=T,=supλinfXλ+λγXtnnX,A+λα1(1γ)×X(1)X,B+λα2(1γ)X(2)X,C+λα3(1γ)X(3)X,D,whereA+B+C+D=T,=(ii)supλλ+0ifλ1γAotherwise+0ifλ1α1(1γ)B(1)otherwise+0ifλ1α2(1γ)C(2)otherwise+0ifλ1α3(1γ)D(3)otherwisewhereA+B+C+D=T,=infA+B+C+D=Tmax1γA,1α1(1γ)×B(1),1α2(1γ)C(2),1α3(1γ)D(3),

where (i) is obtained by the trick of splitting T into four auxiliary tensors A,B,C,D for simpler analysis and (ii) holds because for any positive constant α, any norm f (⋅) with dual norm f*(⋅), we have the following relationship:

infXλαf(X)X,AinfXλαf(X)γf(X)1αf(A),=infXαf(X)λ1αf(A),=0,ifλ1γf(A),,otherwise.

This completes the proof.

Although an expression of the dual T2NN norm is given in Lemma 1, it is still an optimization problem whose optimal value cannot be straightforwardly computed from the variable tensor $\mathcal{T}$. Following the tricks in [22], we instead give an upper bound on the dual T2NN norm directly in terms of $\mathcal{T}$ in the following lemma:

Lemma 2. The dual T2NN norm can be upper bounded as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \leq \frac{1}{16}\left( \frac{1}{\gamma}\|\mathcal{T}\| + \frac{1}{\alpha_1(1-\gamma)}\|T_{(1)}\| + \frac{1}{\alpha_2(1-\gamma)}\|T_{(2)}\| + \frac{1}{\alpha_3(1-\gamma)}\|T_{(3)}\| \right). \quad (10)$

Proof of Lemma 2. The proof is a direct application of the basic inequality "harmonic mean ≤ arithmetic mean" with a careful construction of the auxiliary tensors $\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D}$ in Eq. 8 as follows:

$\mathcal{A}_0 = \frac{\gamma\|\mathcal{T}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{B}_0 = \frac{\alpha_1(1-\gamma)\|T_{(1)}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{C}_0 = \frac{\alpha_2(1-\gamma)\|T_{(2)}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{D}_0 = \frac{\alpha_3(1-\gamma)\|T_{(3)}\|^{-1}}{M}\,\mathcal{T},$

where the denominator M is given by

$M = \gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}.$

It is obvious that A0+B0+C0+D0=T. By substituting the particular setting (A0,B0,C0,D0) of (A,B,C,D) into Eq. 8, we obtain

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \leq \frac{1}{\gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}}. \quad (11)$

Then, by using “harmonic mean ≤ arithmetic mean” on the right-hand side of Eq. 11, we obtain

$\frac{4}{\gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}} \leq \frac{1}{4}\left( \frac{1}{\gamma}\|\mathcal{T}\| + \frac{1}{\alpha_1(1-\gamma)}\|T_{(1)}\| + \frac{1}{\alpha_2(1-\gamma)}\|T_{(2)}\| + \frac{1}{\alpha_3(1-\gamma)}\|T_{(3)}\| \right), \quad (12)$

which directly leads to Eq. 10.

3.2 T2NN-Based Tensor Recovery

3.2.1 The Observation Model

We use $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ to denote the underlying tensor, which is unknown. Suppose one observes $N \leq d_1 d_2 d_3$ scalars,

$y_i = \langle \mathcal{L}, \mathcal{X}_i \rangle + \sigma\xi_i, \quad i \in [N], \quad (13)$

where Xi’s are known (deterministic or random) design tensors, ξi’s are i. i.d. standard Gaussian noises, and σ is a known standard deviation constant measuring the noise level.

Let $y = (y_1, \ldots, y_N)^{\top}$ and $\xi = (\xi_1, \ldots, \xi_N)^{\top}$ denote the collections of observations and noises, respectively. Define the design operator $\mathfrak{X}(\cdot)$ with adjoint operator $\mathfrak{X}^{*}(\cdot)$ as follows:

$\forall\, \mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3},\; \mathfrak{X}(\mathcal{T}) \triangleq \big(\langle \mathcal{T}, \mathcal{X}_1\rangle, \ldots, \langle \mathcal{T}, \mathcal{X}_N\rangle\big)^{\top} \in \mathbb{R}^{N}; \qquad \forall\, z \in \mathbb{R}^{N},\; \mathfrak{X}^{*}(z) \triangleq \sum_{i=1}^{N} z_i \mathcal{X}_i \in \mathbb{R}^{d_1 \times d_2 \times d_3}. \quad (14)$

Then, the observation model (13) can be rewritten in the following compact form:

$y = \mathfrak{X}(\mathcal{L}) + \sigma\xi.$
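To make Eq. 14 concrete, the following sketch builds the design operator and its adjoint from an explicit list of design tensors (the helper names are our own; this is an illustration, not the authors' code):

```python
import numpy as np

def make_design_operator(design_tensors):
    """Returns the operator X(.) and its adjoint X*(.) of Eq. 14 for given design tensors."""
    def X(T):
        # X(T) = (<T, X_1>, ..., <T, X_N>)
        return np.array([np.vdot(Xi, T) for Xi in design_tensors])
    def X_adj(z):
        # X*(z) = sum_i z_i * X_i
        out = np.zeros_like(design_tensors[0], dtype=float)
        for zi, Xi in zip(z, design_tensors):
            out += zi * Xi
        return out
    return X, X_adj
```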

3.2.2 Two Typical Settings

With different settings of the design tensors {Xi}, we consider two classical examples in this paper:

Tensor completion. In tensor completion, the design tensors $\{\mathcal{X}_i\}$ are i. i.d. random tensor bases drawn from the uniform distribution on the canonical basis of the space of $d_1 \times d_2 \times d_3$ tensors, $\{\, e_i \circ e_j \circ e_k : (i,j,k) \in [d_1] \times [d_2] \times [d_3] \,\}$, where $e_i$ denotes the vector whose $i$th entry is 1 with all other entries 0 and $\circ$ denotes the tensor outer product [18].

Tensor compressive sensing. When X is a random Gaussian design, Model (13) is the tensor compressive sensing model with Gaussian measurements [24]. X is named a random Gaussian design when {Xi} are random tensors with i. i.d. standard Gaussian entries [22].
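The two settings differ only in how the design tensors are drawn; a minimal sketch (illustrative sizes and names of our own choosing) is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3, N = 10, 12, 8, 300          # illustrative dimensions and sample size

def completion_design():
    # One random canonical basis tensor e_i o e_j o e_k (uniform over all entries).
    Xi = np.zeros((d1, d2, d3))
    Xi[rng.integers(d1), rng.integers(d2), rng.integers(d3)] = 1.0
    return Xi

def gaussian_design():
    # A design tensor with i.i.d. standard Gaussian entries.
    return rng.standard_normal((d1, d2, d3))

design_tensors = [gaussian_design() for _ in range(N)]   # or [completion_design() for _ in range(N)]
```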

3.2.3 The Proposed Estimator

The goal of this paper is to recover the unknown low-rank tensor L from noisy linear observations y satisfying the observation model (13).

Inspired by the capability of the newly defined T2NN in simultaneously modeling low-rankness in both spectral and original domains, we define the T2NN penalized least squares estimator L̂ to estimate the unknown truth L,

$\widehat{\mathcal{L}} \in \arg\min_{\mathcal{L}} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\|\mathcal{L}\|_{\mathrm{t2nn}}, \quad (15)$

where the squared $l_2$-norm is adopted as the fidelity term for Gaussian noises, the proposed T2NN is used to impose both spectral and original-domain low-rankness on the solution, and $\lambda$ is a penalization parameter that balances the residual fitting accuracy and the parameter complexity (characterized by low-rankness) of the model.

Given the estimator L̂ in Eq. 15, one may naturally ask how well it can estimate the truth L and how to compute it. In the following two sections, we first study the estimation performance of L̂ by upper bounding its estimation error and then develop an ADMM-based algorithm to efficiently compute it.
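For illustration, the T2NN penalty and the objective of Eq. 15 can be evaluated as below (a self-contained sketch with our own helper names; `X` is any design operator such as the one sketched in Section 3.2.1):

```python
import numpy as np

def t2nn(T, gamma=0.5, alphas=(1/3, 1/3, 1/3)):
    """T2NN of Eq. 6: gamma * TNN + (1 - gamma) * SNN."""
    d3 = T.shape[2]
    Tf = np.fft.fft(T, axis=2)
    tnn = sum(np.linalg.svd(Tf[:, :, i], compute_uv=False).sum() for i in range(d3)) / d3
    unfold = lambda A, k: np.moveaxis(A, k, 0).reshape(A.shape[k], -1)
    snn = sum(a * np.linalg.svd(unfold(T, k), compute_uv=False).sum()
              for k, a in enumerate(alphas))
    return gamma * tnn + (1 - gamma) * snn

def objective(L, y, X, lam, **t2nn_kwargs):
    """Penalized least-squares objective of Eq. 15 for a candidate tensor L."""
    return 0.5 * np.sum((y - X(L)) ** 2) + lam * t2nn(L, **t2nn_kwargs)
```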

4 Statistical Guarantee

In this section, we first come up with a deterministic upper bound of the estimation error and then establish non-asymptotic error bounds for the special cases of tensor compressive sensing with random Gaussian design and noisy tensor completion.

First, to describe the low-rankness of L, we consider both its low-tubal-rank and low-Tucker-rank structures as follows:

• Low-tubal-rank structure: Let $r_{\mathrm{tb}}$ denote the tubal rank of $\mathcal{L}$. Suppose it has the reduced t-SVD $\mathcal{L} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}$, where $\mathcal{U} \in \mathbb{R}^{d_1 \times r_{\mathrm{tb}} \times d_3}$ and $\mathcal{V} \in \mathbb{R}^{d_2 \times r_{\mathrm{tb}} \times d_3}$ are orthogonal tensors and $\mathcal{S} \in \mathbb{R}^{r_{\mathrm{tb}} \times r_{\mathrm{tb}} \times d_3}$ is f-diagonal. Then, following [25], we define the following projections of any tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$:

$\mathcal{P}_{\perp}(\mathcal{T}) \triangleq (\mathcal{I} - \mathcal{U}*\mathcal{U}^{\top}) * \mathcal{T} * (\mathcal{I} - \mathcal{V}*\mathcal{V}^{\top}) \quad\text{and}\quad \mathcal{P}(\mathcal{T}) = \mathcal{T} - \mathcal{P}_{\perp}(\mathcal{T}), \quad (16)$

where I denotes the identity tensor of appropriate dimensionality.

• Low-Tucker-rank structure: Let $\mathbf{r}_{\mathrm{tk}} = (r_1, r_2, r_3)$ denote the Tucker rank of $\mathcal{L}$, i.e., $r_k = \mathrm{rank}(L_{(k)})$. Then, we have the reduced SVD factorization $L_{(k)} = \mathbf{U}_k \mathbf{S}_k \mathbf{V}_k^{\top}$, where $\mathbf{U}_k \in \mathbb{R}^{d_k \times r_k}$ and $\mathbf{V}_k \in \mathbb{R}^{d_{\backslash k} \times r_k}$ are orthogonal and $\mathbf{S}_k \in \mathbb{R}^{r_k \times r_k}$ is diagonal. Let $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ be an arbitrary tensor. Similar to [22], we define the following two projections for any mode $k = 1, 2, 3$:

$\mathcal{P}_{k\perp}(\mathcal{T}) = (\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^{\top})\, T_{(k)}\, (\mathbf{I} - \mathbf{V}_k\mathbf{V}_k^{\top}) \quad\text{and}\quad \mathcal{P}_k(\mathcal{T}) = T_{(k)} - \mathcal{P}_{k\perp}(\mathcal{T}), \quad (17)$

where I denotes the identity matrix of appropriate dimensionality.

4.1 A Deterministic Bound on the Estimation Error

Before bounding the Frobenius-norm error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F$, the particularity of the error tensor $\Delta \triangleq \widehat{\mathcal{L}} - \mathcal{L}$ is first characterized, under a certain choice of the regularization parameter $\lambda$ involving the dual T2NN norm, in the following proposition.

Proposition 1. By setting the regularization parameter $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$, we have

(I) rank inequality:

$\mathrm{rank}_{\mathrm{tb}}(\mathcal{P}(\Delta)) \leq 2r_{\mathrm{tb}} \quad\text{and}\quad \mathrm{rank}(\mathcal{P}_k(\Delta)) \leq 2r_k,\; k = 1, 2, 3, \quad (18)$

(II) sum of norms inequality:

$\gamma\|\mathcal{P}_{\perp}(\Delta)\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k\perp}(\Delta)\|_{*} \;\leq\; 3\Big(\gamma\|\mathcal{P}(\Delta)\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k}(\Delta)\|_{*}\Big), \quad (19)$

(III) an upper bound on the “observed” error:

$\|\mathfrak{X}(\Delta)\|_2^2 \;\leq\; 3\lambda\Big(\gamma\sqrt{2r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{2r_k}\Big)\|\Delta\|_F. \quad (20)$

Proof of Proposition 1. The proof is given as follows.

Proof of Part (I): According to the definition of $\mathcal{P}(\mathcal{T})$ in Eq. 16, we have

P(T)=TP(T)=UUT+TVVUUTVV,=UUT+(IUU)TVV.

Due to the facts that ranktb(AB)max{ranktb(A),ranktb(B)}, ranktb(A+B)ranktb(A)+ranktb(B) [26], and ranktb(U)=ranktb(V)=rtb, we have

ranktb(P(T))ranktb(UUT)+ranktb((IUU)TVV)2rtb.

Also, according to the definition of P(T) in Eq. 17, we have

Pk(T)=T(k)Pk(T)=UkUkT(k)+T(k)VkVkUkUkT(k)VkVk=UkUkT(k)+(IUkUk)×T(k)VkVk.

Due to the facts that rank(AB)max{rank(A),rank(B)}, rank(A+B)rank(A)+rank(B) [26], and rank(Uk)=rank(Vk)=rk, we have

rank(Pk(T))rank(UkUkT(k))+rank((IUkUk)T(k)VkVk)2rk.

Proof of Part (II) and Part (III): The optimality of L̂ to Problem Eq. 15 indicates

12yX(L̂)22+λL̂t2nn12yX(L)22+λLt2nn.

By the definition of the error tensor ΔL̂L, we can get X(L̂)=X(L)+X(Δ), which leads to

12yX(L)X(Δ)2212yX(L)22λLt2nnλL̂t2nn.

The definition that σξ=yX(L) yields

12X(Δ)22X(Δ),σξ+λ(Lt2nnL̂t2nn)X(ξ),Δ+λ(Lt2nnL̂t2nn),

where the last inequality holds due to the definition of the adjoint operator X().According to the definition and upper bound of the dual T2NN norm in Lemma 1 and Lemma 2, we obtain

12X(Δ)22σX(ξ)t2nnΔt2nn+λ(Lt2nnL̂t2nn).(21)

According to the decomposability of TNN (see the supplementary material of [25]) and the decomposability of the matrix nuclear norm [27], one has

LtnnL̂tnn=LtnnL+Δtnn=LtnnL+P(Δ)+P(Δ)LtnnL+P(Δ)tnnP(Δ)tnn=LtnnLtnn+P(Δ)tnnP(Δ)tnn=P(Δ)tnnP(Δ)tnn

and

L(k)L̂(k)=L(k)L(k)+Δ̂(k)=L(k)L(k)+Pk(Δ)+Pk(Δ)L(k)L(k)+Pk(Δ)Pk(Δ)=L(k)L(k)+Pk(Δ)Pk(Δ)=Pk(Δ)Pk(Δ).

Then, we obtain

Lt2nnL̂t2nnγP(Δ)tnn+(1γ)k=13αkPk(Δ)γP(Δ)tnn+(1γ)k=13αkPk(Δ).(22)

Using the definition of T2NN and triangular inequality yields

Δt2nnγP(Δ)tnn+(1γ)k=13αkPk(Δ)+γP(Δ)tnn+(1γ)k=13αkPk(Δ).(23)

Further using the setting $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$ yields Part (III),

12X(Δ)22(i)3λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)3λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)(ii)3λ2γ2rtbP(Δ)F+(1γ)k=13αk2rkPk(Δ)F(iii)3λ2γ2rtbΔF+(1γ)k=13αk2rkΔF=3λ2γ2rtb+(1γ)k=13αk2rkΔF,

where by combining (i) and $\|\mathfrak{X}(\Delta)\|_2^2 \geq 0$, Part (II) can be directly proved; inequality (ii) holds due to the compatibility inequalities of TNN and the matrix nuclear norm, i.e., $\|\mathcal{T}\|_{\mathrm{tnn}} \leq \sqrt{\mathrm{rank}_{\mathrm{tb}}(\mathcal{T})}\,\|\mathcal{T}\|_F$ [25] and $\|T\|_* \leq \sqrt{\mathrm{rank}(T)}\,\|T\|_F$ [27]; and inequality (iii) holds because one can easily verify that $\|\mathcal{P}(\Delta)\|_F^2 = \|\Delta\|_F^2 - \|\mathcal{P}_{\perp}(\Delta)\|_F^2 \leq \|\Delta\|_F^2$ [25] and $\|\mathcal{P}_k(\Delta)\|_F^2 = \|\Delta_{(k)}\|_F^2 - \|\mathcal{P}_{k\perp}(\Delta)\|_F^2 \leq \|\Delta_{(k)}\|_F^2 = \|\Delta\|_F^2$ [27].

Note that inequality (20) gives an upper bound on $\|\mathfrak{X}(\Delta)\|_2$, which can be seen as the "observed" error. However, we are more concerned with upper bounds on the error itself, $\|\Delta\|_F$, rather than its observed version. The following assumption builds a bridge between $\|\mathfrak{X}(\Delta)\|_2$ and $\|\Delta\|_F$.

Assumption 1 (RSC condition). The observation operator X() is said to satisfy the Restricted Strong Convexity (RSC) condition with parameter κ if the following inequality holds:

$\|\mathfrak{X}(\mathcal{T})\|_2^2 \geq \kappa\|\mathcal{T}\|_F^2, \quad (24)$

for any $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ belonging to the restricted direction set,

$\mathbb{C} \triangleq \Big\{\, \mathcal{T} \;\Big|\; \gamma\|\mathcal{P}_{\perp}(\mathcal{T})\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k\perp}(\mathcal{T})\|_{*} \leq 3\Big(\gamma\|\mathcal{P}(\mathcal{T})\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k}(\mathcal{T})\|_{*}\Big) \Big\}. \quad (25)$

Then, a straightforward combination of Proposition 1 and Assumption 1 leads to a deterministic bound on the estimation error.

Theorem 1. By setting the regularization parameter $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$, we have the following error bound for any solution $\widehat{\mathcal{L}}$ to Problem (15):

$\|\mathcal{L} - \widehat{\mathcal{L}}\|_F \leq \frac{3\sqrt{2}}{\kappa}\,\lambda\Big(\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big). \quad (26)$

Note that we do not require any information about the distribution of the noise $\xi$ in Theorem 1, which indicates that Theorem 1 provides a deterministic bound for general noise types. The bound on the right-hand side of Eq. 26 is in terms of the quantity

$\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k},$

which serves as a measure of structural complexity, reflecting the natural intuition that a more complex structure causes a larger error. The result is consistent with the results for sum-of-norms-based estimators in [5, 22, 24, 28]. A more general analysis in [24, 28] indicates that the performance of sum-of-norms-based estimators is determined by all the structural complexities of a simultaneously structured signal, just as shown by the proposed bound (26).

4.2 Tensor Compressive Sensing

In this section, we consider tensor compressive sensing from random Gaussian design where {Xi}’s are random tensors with i. i.d. standard Gaussian entries [22]. First, the RSC condition holds in random Gaussian design as shown in the following lemma.

Lemma 3 (RSC of random Gaussian design). If X():Rd1×d2×d3RN is a random Gaussian design, then a version of the RSC condition is satisfied with probability at least 1–2 exp(−N/32) as follows:

X(Δ)N4ΔF116d1d3+d2d3γ+k=13dk+d\kαk(1γ)Δt2nn,(27)

for any tensor ΔRd1×d2×d3 in the restricted direction set C whose definition is given in Eq. 25.Proof of Lemma 3. The proof is analogous to that of Proposition 1 in [27]. The difference lies in how we lower bound the right hand side of (H.7) in [27], i.e.,

EinfΘR(t)supuSN1Yu,Θ,(27a)

where SN1vRNv2=1, R(t)={ΘRd1×d2×d3ΘF=1,Θt2nnt}, and

Yu,Θg,u+G,Θ,

where random vector gRN and random tensor GRd1×d2×d3 are independent with i. i.d. N(0,1) entries.We bound the quantity in Eq. 27 as follows:

EinfΘR(t)supuSN1Yu,Θ=EsupuSN1g,u+EinfΘR(t)G,Θ=Eg2EsupΘR(t)G,Θ=12NtEGt2nn,(28)

where Gt2nn can be bounded according to Lemma 4. The rest of the proof follows that of Proposition 1 in [27].The remaining bound on X(ξ)t2nn is shown in the following Lemma.

Lemma 4 (bound on X(ξ)t2nn). Let X:Rd1×d2×d3RN be a random Gaussian design. With high probability, the quantity X(ξ)t2nn is concentrated around its mean, which can be bounded as follows:

E[X(ξ)t2nn]cNd1d3+d2d3γ+k=13dk+dkαk(1γ).(29)

Proof. Since ξm’s are i. i.d. N(0,1) variables, we have

ξ22N(29a)

with high probability according to Proposition 8.1 in [29].For k = 1, 2, 3, let X(ξ) (k) be the mode-k unfolding of random tensor X(ξ). A direct use of Lemma C.1 in [27] leads to

EX(ξ)(k)c0Ndk+d\k(30)

with high probability. A similar argument of Lemma C.1 in [27] also yields

EX(ξ)c1N(d1d3+d2d3)(31)

with high probability. Combining Eqs 30, 31, we can complete the proof.Then, the non-asymptotic error bound is obtained finally as follows.

Theorem 2 (non-asymptotic error bound). Under the random Gaussian design setup, there are universal constants c3, c4, and c5 such that for a sample size N greater than

c3d1d3+d2d3γ+k=13dk+d\kαk(1γ)2×γrtb+(1γ)k=13αkrk2

and any solution to Problem (15) with regularization parameter

λ=c4σNd1d3+d2d3γ+k=13dk+d\kαk(1γ),

then we have

$\|\Delta\|_F^2 \leq c_5\,\sigma^2\,\frac{\Big(\frac{\sqrt{d_1d_3+d_2d_3}}{\gamma} + \sum_{k=1}^{3}\frac{\sqrt{d_k+d_{\backslash k}}}{\alpha_k(1-\gamma)}\Big)^2 \Big(\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big)^2}{N}, \quad (32)$

which holds with high probability.

To understand the proposed bound, we consider a three-way cubical tensor $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with regularization weights $\gamma = (1-\gamma)\alpha_1 = (1-\gamma)\alpha_2 = (1-\gamma)\alpha_3 = 1/4$. Then, the bound in Eq. 32 simplifies to the following element-wise error:

$\frac{\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2}{d^3} \leq O\!\left(\sigma^2\,\frac{1}{N}\Big(\sqrt{r_{\mathrm{tb}}/d} + \sum_{k=1}^{3}\sqrt{r_k/d}\Big)^2\right), \quad (33)$

which means the estimation error is controlled by the tubal rank and the Tucker rank of $\mathcal{L}$ simultaneously. From the right-hand side of Eq. 33, it can be seen that the more observations (i.e., the larger $N$), the smaller the error; it also reflects that larger tensors with more complex structures lead to larger errors. This interpretation is consistent with our intuition. Equation 33 also indicates that the sample size $N$ should satisfy

$N \geq \Omega\!\left(\Big(\sqrt{r_{\mathrm{tb}}} + \sum_{k=1}^{3}\sqrt{r_k}\Big)^2 d^2\right) \quad (34)$

for approximate tensor sensing.

Another interesting result is that by setting the noise level $\sigma = 0$ in Eq. 33, the upper bound becomes 0, which means the proposed estimator can exactly recover the unknown truth $\mathcal{L}$ in the noiseless setting.

4.3 Noisy Tensor Completion

For noisy tensor completion, we consider a slightly modified estimator,

$\widehat{\mathcal{L}} \in \arg\min_{\|\mathcal{L}\|_{\infty} \leq a} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\|\mathcal{L}\|_{\mathrm{t2nn}}, \quad (35)$

where $a > 0$ is a known constant constraining the magnitude of the entries of $\mathcal{L}$. The constraint $\|\mathcal{L}\|_{\infty} \leq a$ is very mild because real signals all have limited magnitude; e.g., the intensity of pixels in visible-light images cannot be greater than 255. The constraint also provides theoretical convenience in excluding "spiky" tensors while controlling the identifiability of $\mathcal{L}$. Similar "non-spiky" constraints are also considered in related work [6, 16, 30].

We consider noisy tensor completion under uniform sampling in this section.

Assumption 2 (uniform sampling scheme). The design tensors {Xi} are i.i.d. random tensor bases drawn from uniform distribution Π on the set,

$\big\{\, e_i \circ e_j \circ e_k : (i,j,k) \in [d_1] \times [d_2] \times [d_3] \,\big\}.$

Recall that Proposition 1 in Section 4.1 gives an upper bound on the “observed part” of the estimation error X(Δ)F. As our goal is to establish a bound on ‖Δ‖F, we then connect X(Δ)F with ‖Δ‖F by quantifying the probability of the following RSC property of the sampling operator X:

1NX(Δ)F212d1d2d3ΔF2aninterceptterm,

when the error tensor Δ belongs to some set mC(β,mr) defined as

$\mathbb{C}(\beta,\mathbf{r}) \triangleq \Big\{\, \Delta \in \mathbb{R}^{d_1 \times d_2 \times d_3} \;\Big|\; \|\Delta\|_{\infty} \leq 1,\; \frac{\|\Delta\|_F^2}{d_1 d_2 d_3} \geq \beta,\; \|\Delta\|_{\mathrm{t2nn}} \leq \Big(\gamma\sqrt{r} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big)\|\Delta\|_F \,\Big\}, \quad (36)$

where $\beta = \frac{64\log\tilde{d}}{N\log(6/5)}$ is an F-norm tolerance parameter and $\mathbf{r} = (r, r_1, r_2, r_3)$ is a rank parameter whose values will be specified in the sequel.

Lemma 5 (RSC condition under uniform sampling). For any ΔC(β,r), it holds with probability at least 1(d1d3+d2d3)1 that

1NX(Δ)F2ΔF22d1d2d344d1d2d3N2EX(ϵ)t2nn2×γr+(1γ)k=13αkrk2,(37)

where $e$ is the base of the natural logarithm and the entries $\epsilon_i$ of the vector $\epsilon \in \mathbb{R}^N$ are i. i.d. Rademacher random variables.

Before proving Lemma 5, we first define a subset of $\mathbb{C}(\beta,\mathbf{r})$ by upper bounding the F-norm of any element $\Delta$ in it,

B(r,T)Δ:ΔC(β,r),ΔF2d1d2d3T,

and a quantity

ZTsupΔB(r,T)X(Δ)22NΔF2d1d2d3,

which is the maximal absolute deviation of N1X(Δ)22 from its expectation (d1d2d3)1ΔF2 in B(r,T). Lemma 6 shows the concentration behavior of ZT.

Lemma 6 (concentration of ZT). There exists a constant c0 such that

PZT512T44d1d2d3N2EX(ϵ)2γr+(1γ)×kαkrk2exp(c0NT2).

Proof of Lemma 6. The proof is similar to that of Lemma 10 in [31]. The difference lies in the step of symmetrization arguments. Note that for any ΔB(r,T), it holds that

Δtnnd1d2d3Tγr+(1γ)kαkrk,

which indicates

E[ZT]8EsupΔB(r,T)X(ϵ),Δ8EsupΔB(r,T)1d3X(ϵ)Δtnn8d1d2d3TE[X(ϵ)]γr+(1γ)kαkrk.

Then, Lemma 5 can be proved by using the peeling argument [30].

Proof of Lemma 5. For any positive integer $l$, we define disjoint subsets of $\mathbb{C}(\beta,\mathbf{r})$ as

D(r,l){Δ:ΔC(β,r),βρl1ΔF2d1d2d3βρl}

with constants ρ=65 and β=64logd̃Nlogρ. Let D = d1d2d3 for simplicity, and define the event

EΔC(β,r),s.t.X(Δ)22NΔF2DΔF22D+44DN2E2[X(ϵ)]γr+(1γ)kαkrk2

and its sub-events for any lN+,

ElΔD(r,l),s.t.X(Δ)22NΔF2D512βρl+44DN2E2[X(ϵ)]γr+(1γ)kαkrk2.

Note that Lemma 6 implies that

P[El]=PΔC(r,l)ZT512βρl+44DN2E2[X(ϵ)]×γr+(1γ)kαkrk2exp(c0Nβ2ρ2l).

Thus, we have

P[E]l=1P[El]l=1exp(c0Nβ2ρ2l)l=1exp(2c0Nβ2llogρ)exp(c0pβ2logρ)1exp(c0pβ2logρ).

Recall that $\beta = \frac{64\log\tilde{d}}{N\log(6/5)}$; then $\mathbb{P}[\mathcal{E}] \leq 2/\tilde{d}$, which leads to the result of Lemma 5.

Based on the RSC condition in Lemma 5, we are able to give an upper bound on the estimation error $\|\Delta\|_F$ in the following proposition.

Proposition 2. With parameter λ2σX(ξ)t2nn, the estimation error satisfies

ΔF2d1d2d3maxc1d1d3d3N2λ2+a2E[X(ϵ)]2×γrtb+(1γ)kαkrk2,c2a2logd̃N(38)

with probability at least $1 - 2(d_1d_3+d_2d_3)^{-1}$.

Proof of Proposition 2. A direct consequence of property (II) in Proposition 1 and the triangular inequality is that the error tensor $\Delta$ satisfies

$\|\Delta\|_{\mathrm{t2nn}} \leq \Big(\gamma\sqrt{32\,r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{32\,r_k}\Big)\|\Delta\|_F. \quad (39)$

Since $\|\widehat{\mathcal{L}}\|_{\infty} \leq a$ and $\|\mathcal{L}\|_{\infty} \leq a$, we also have $\|\Delta\|_{\infty} \leq \|\widehat{\mathcal{L}}\|_{\infty} + \|\mathcal{L}\|_{\infty} \leq 2a$. Let $\mathbf{r} = (r_{\mathrm{tb}}, r_1, r_2, r_3)$ denote the rank complexity of the underlying tensor $\mathcal{L}$. By discussing whether the tensor $\Delta/(2a)$ is in the set $\mathbb{C}(\beta, 32\mathbf{r})$, we consider the following cases.

Case 1: If $\Delta/(2a) \notin \mathbb{C}(\beta, 32\mathbf{r})$, then from the definition of the set $\mathbb{C}(\beta,\mathbf{r})$, we have

ΔF2d1d2d34a264logd̃log(6/5)N.(40)

Case 2: If $\Delta/(2a) \in \mathbb{C}(\beta, 32\mathbf{r})$, then by Proposition 1 and Lemma 5, we have

Δ2aF22D4432DN2E2X(ϵ)×γrtb+(1γ)kαkrk232λNγrtb+(1γ)kαkrkΔ2aF(41)

with probability at least $1 - 2(d_1d_3+d_2d_3)^{-1}$. By performing some algebra (as in the proof of Theorem 3 in [30]), we have

ΔF2d1d2d3Cd1d2d3N2λ2+a2E2[X(ϵ)]×γrtb+(1γ)kαkrk2.(42)

Combining Case 1 and Case 2, we obtain the result of Proposition 2.

According to Proposition 2 and Lemma 5, it remains to bound $\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$ and $\mathbb{E}[\|\mathfrak{X}^{*}(\epsilon)\|_{\mathrm{t2nn}}^{*}]$. The following lemmas give their bounds, respectively. As the noise variables $\{\xi_i\}$ are i. i.d. standard Gaussian, they belong to the sub-exponential family [32], and thus there exists a constant $\varrho$ as the smallest number satisfying [30]

maxiNEe|ξi|ϱe.(43)

Suppose the sample complexity N in noisy tensor completion satisfies

Nmax{d1d3d2d3,maxk(dkd\k)}N2(d1d2)ϱ2log2(ϱd1d2)log(d1d3+d2d3)Nmaxk32(dkd\k)ϱ2log2(ϱdkd\k)log(dk+d\k)N2(d1d2)log2(d1d2)log(d1d3+d2d3)Nmaxk32(dkd\k)log2(dkd\k)log(dk+d\k).(44)

Then, we have the following Lemma 7 and Lemma 8 to bound X(ξ)t2nn and E[X(ϵ)t2nn].

Lemma 7. Under the sample complexity of noisy tensor completion in Eq. 44, it holds with probability at least 1(d1d3+d2d3)1k(dk+d\k)1 that

X(ξ)t2nnCϱN1γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k,(45)

where Cϱ is a constant dependent on the ϱ that is defined in Eq. 43.Proof of Lemma 7. The proof can be straightforwardly obtained by adopting the upper bound of the dual T2NN norm in Lemma 2 and Lemma 5 in the supplementary material of [25], and Lemma 5 in [30] as follows:

• First, Lemma 5 in the supplementary material of [25] shows that letting Nd1d3d2d3 and N2(d1d2)ϱ2log2(ϱd1d2)log(d1d3+d2d3), then it holds with probability at least 1(d1d3+d2d3)1 that

X(ξ)CϱN(d1d2)1log(d1d3+d2d3).(46)

• For k = 1, 2, 3, let X(ξ) (k) be the mode-k unfolding of random tensor X(ξ). Then, Lemma 5 in [30] indicates that letting Ndk ∨ (d\k) and N2(dk(d\k))ϱ2log2(ϱdkd\k)log(dk+d\k), then it holds with probability at least 1(dk+d\k)1 that

X(ξ)(k)CϱN(dkd\k)1log(dk+d\k).(47)

Then, combining Eq. 46 and 47 and using union bound, Eq. 45 can be obtained.

Lemma 8. Under the sample complexity of noisy tensor completion in Eq. 44, it holds that

E[X(ϵ)t2nn]CN1γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k.(48)

Proof of Lemma 8. Similar to the proof of Lemma 7, the proof can be straightforwardly obtained by adopting the upper bound of the dual T2NN norm in Lemma 2 and Lemma 6 in the supplementary material of [25], and Lemma 6 in [30].

• First, Lemma 6 in the supplementary material of [25] shows that letting Nd1d3d2d3 and N ≥ 2 (d1d2) log2 (d1d2) log (d1d3 + d2d3), then, the following inequality holds:

E[X(ϵ)]C2N(d1d2)1log(d1d3+d2d3).(49)

• For k = 1, 2, 3, let X∗(ϵ) (k) be the mode-k unfolding of random tensor X(ϵ). Then, Lemma 6 in [30] indicates that letting Ndkd\k and N ≥ 2 (dkd\k) log2 (dkd\,k) log (dk + d\k), then, the following inequality holds:

E[X(ϵ)(k)]C2N(dkd\k)1log(dk+d\k).(50)

Then, Eq. 48 can be obtained by combining Eqs. 49 and 50.

Further combining Lemma 7, Lemma 8, and Proposition 2, we arrive at an upper bound on the estimation error in the following theorem.

Theorem 3. Suppose Assumption 2 is satisfied and $\|\mathcal{L}\|_{\infty} \leq a$. Let the sample size $N$ satisfy Eq. 44. By setting

λ=CϱσN1γlog(d1d3+d2d3)d1d2+1(1γ)k=131αklog(dk+d\k)dkd\k,(51)

the estimation error of any estimator L̂ defined in Problem (35) can be upper bounded as follows:

L̂LF2d1d2d3c2maxa2logd̃N,d1d2d3(σ2a2)N×γr+(1γ)kαkrk21γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k2.(52)

with probability at least $1 - 3(d_1d_3+d_2d_3)^{-1} - \sum_k (d_k+d_{\backslash k})^{-1}$.

To understand the proposed bound in Theorem 3, we consider a three-way cubical tensor $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with regularization weights $\gamma = (1-\gamma)\alpha_1 = (1-\gamma)\alpha_2 = (1-\gamma)\alpha_3 = 1/4$. Then, the bound in Eq. 52 simplifies to the following element-wise error:

L̂LF2d3Od3N(σa)2rd+k=13rkd2logd,(53)

which means the estimation error is controlled by the tubal rank and Tucker rank of L simultaneously. Equation 53 also indicates that the sample size N should satisfy

NΩr+k=13rk2d2logd(54)

for approximate tensor completion.

5 Optimization Algorithm

The ADMM framework [33] is applied to solve the proposed model. Adding auxiliary variables K and T1,T2,T3 to Problem (15) yields an equivalent formulation,

$\min_{\mathcal{L},\mathcal{K},\{\mathcal{T}^k\}_k} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \lambda(1-\gamma)\sum_{k=1}^{3}\alpha_k\big\|T^{k}_{(k)}\big\|_{*} \quad \text{s.t. } \mathcal{K} = \mathcal{L};\; \mathcal{T}^{k} = \mathcal{L},\; k = 1, 2, 3. \quad (55)$

To solve Problem (55), an ADMM-based algorithm is proposed. First, the augmented Lagrangian is

$L_{\rho}(\mathcal{L},\mathcal{K},\{\mathcal{T}^k\}_k,\mathcal{A},\{\mathcal{B}^k\}_k) = \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \langle\mathcal{A},\,\mathcal{K}-\mathcal{L}\rangle + \frac{\rho}{2}\|\mathcal{K}-\mathcal{L}\|_F^2 + \sum_{k=1}^{3}\Big(\lambda(1-\gamma)\alpha_k\big\|T^{k}_{(k)}\big\|_{*} + \langle\mathcal{B}^k,\,\mathcal{T}^k-\mathcal{L}\rangle + \frac{\rho}{2}\|\mathcal{T}^k-\mathcal{L}\|_F^2\Big), \quad (56)$

where tensors A and {Bk}k are the dual variables.

The primal variables L,K, and Tk can be divided into two blocks: The first block has one tensor variable L, whereas the second block consists of four variables K and Tk’s. We use the minimization scheme of ADMM to update the two blocks alternatively after the tth iteration (t = 0, 1, ⋯):

Update the first block $\mathcal{L}$: We update $\mathcal{L}$ by solving the following $\mathcal{L}$-subproblem with all the other variables fixed:

$\mathcal{L}^{t+1} = \arg\min_{\mathcal{L}} L_{\rho}\big(\mathcal{L},\mathcal{K}^t,\{(\mathcal{T}^k)^t\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{L}} \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \frac{\rho}{2}\big\|\mathcal{K}^t + \rho^{-1}\mathcal{A}^t - \mathcal{L}\big\|_F^2 + \sum_{k=1}^{3}\frac{\rho}{2}\big\|(\mathcal{T}^k)^t + \rho^{-1}(\mathcal{B}^k)^t - \mathcal{L}\big\|_F^2.$

By taking derivative with respect to L and setting the derivative to zero, we obtain the following equation:

$\mathfrak{X}^{*}\big(\mathfrak{X}(\mathcal{L}) - y\big) + \rho\big(\mathcal{L} - \mathcal{K}^t - \rho^{-1}\mathcal{A}^t\big) + \sum_{k=1}^{3}\rho\big(\mathcal{L} - (\mathcal{T}^k)^t - \rho^{-1}(\mathcal{B}^k)^t\big) = 0.$

Solving the above equation yields

$\mathcal{L}^{t+1} = \big(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I}\big)^{-1}\Big(\mathfrak{X}^{*}(y) + \rho\mathcal{K}^t + \mathcal{A}^t + \sum_{k=1}^{3}\big(\rho(\mathcal{T}^k)^t + (\mathcal{B}^k)^t\big)\Big), \quad (57)$

where I() is the identity operator.
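In the tensor-completion setting, $\mathfrak{X}^{*}\mathfrak{X}$ is diagonal (it simply counts how often each entry is observed), so Eq. 57 reduces to an element-wise update. A minimal sketch under that assumption follows; the variable names (`idx` for the observed indices, `Ts`/`Bs` for the three auxiliary and dual tensors) are our own.

```python
import numpy as np

def update_L_completion(y, idx, K, A, Ts, Bs, rho):
    """L-update of Eq. 57 specialized to tensor completion (element-wise inversion)."""
    shape = K.shape
    count = np.zeros(shape)
    np.add.at(count, tuple(idx.T), 1.0)            # diagonal of X*X: observation multiplicities
    Xadj_y = np.zeros(shape)
    np.add.at(Xadj_y, tuple(idx.T), y)             # X*(y)
    rhs = rho * K + A + sum(rho * Tk + Bk for Tk, Bk in zip(Ts, Bs))
    return (Xadj_y + rhs) / (count + 4.0 * rho)    # (X*X + 4*rho*I)^{-1} applied entry-wise
```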

Update the second block (K,{Tk}): We update K and {Tk} in parallel by keeping all the other variables fixed. First, K is updated by solving the K-subproblem,

$\mathcal{K}^{t+1} = \arg\min_{\mathcal{K}} L_{\rho}\big(\mathcal{L}^{t+1},\mathcal{K},\{(\mathcal{T}^k)^t\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{K}} \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \langle\mathcal{A}^t,\,\mathcal{K}-\mathcal{L}^{t+1}\rangle + \frac{\rho}{2}\|\mathcal{K}-\mathcal{L}^{t+1}\|_F^2 = \mathrm{Prox}_{\rho^{-1}\lambda\gamma\|\cdot\|_{\mathrm{tnn}}}\big(\mathcal{L}^{t+1} - \rho^{-1}\mathcal{A}^t\big), \quad (58)$

where Proxτtnn() is the proximal operator of TNN given in Lemma 9.

Then, Tk is updated by solving the Tk-subproblem (k = 1, 2, 3),

$(\mathcal{T}^k)^{t+1} = \arg\min_{\mathcal{T}^k} L_{\rho}\big(\mathcal{L}^{t+1},\mathcal{K}^t,\{\mathcal{T}^k\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{T}^k} \lambda\alpha_k(1-\gamma)\big\|T^{k}_{(k)}\big\|_{*} + \langle(\mathcal{B}^k)^t,\,\mathcal{T}^k-\mathcal{L}^{t+1}\rangle + \frac{\rho}{2}\|\mathcal{T}^k-\mathcal{L}^{t+1}\|_F^2 = \mathcal{F}_k\Big(\mathrm{Prox}_{\rho^{-1}\lambda\alpha_k(1-\gamma)\|\cdot\|_{*}}\big(L^{t+1}_{(k)} - \rho^{-1}(B^{k}_{(k)})^t\big)\Big), \quad (59)$

where $\mathcal{F}_k(\cdot): \mathbb{R}^{d_k \times d_{\backslash k}} \to \mathbb{R}^{d_1 \times d_2 \times d_3}$ is the folding operator that reshapes a mode-$k$ matricization back to its original tensor format and $\mathrm{Prox}_{\tau\|\cdot\|_{*}}(\cdot)$ is the proximal operator of the matrix nuclear norm given in Lemma 10.

Lemma 9 (proximal operator of TNN [34]). Let tensor T0Rd1×d2×d3 with t-SVD T0=USV, where URd1×r×d3 and VRd2×r×d3 are orthogonal tensors and SRr×r×d3 is the f-diagonal tensor of singular tubes. Then, the proximal operator of function ‖⋅‖tnn at point T0 with parameter τ can be computed as follows:

$\mathrm{Prox}_{\tau\|\cdot\|_{\mathrm{tnn}}}(\mathcal{T}_0) \triangleq \arg\min_{\mathcal{T}} \frac{1}{2}\|\mathcal{T}_0 - \mathcal{T}\|_F^2 + \tau\|\mathcal{T}\|_{\mathrm{tnn}} = \mathcal{U} * \mathrm{ifft3}\big(\max(\mathrm{fft3}(\mathcal{S}) - \tau,\, 0)\big) * \mathcal{V}^{\top},$

where fft3() and ifft3() denote the operations of fast DFT and fast inverse DFT on all the tubes of a given tensor, respectively.
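Lemma 9 amounts to soft-thresholding the singular values of the Fourier-domain frontal slices; a minimal NumPy sketch (our own naming) is

```python
import numpy as np

def prox_tnn(T0, tau):
    """Proximal operator of the tubal nuclear norm (Lemma 9)."""
    d1, d2, d3 = T0.shape
    Tf = np.fft.fft(T0, axis=2)                    # go to the Fourier domain
    Xf = np.empty_like(Tf)
    for i in range(d3):
        U, s, Vh = np.linalg.svd(Tf[:, :, i], full_matrices=False)
        Xf[:, :, i] = (U * np.maximum(s - tau, 0.0)) @ Vh   # soft-threshold singular values
    return np.real(np.fft.ifft(Xf, axis=2))        # back to the original domain
```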

Lemma 10 (proximal operator of the matrix nuclear norm [35]). Let tensor T0Rd1×d2 with SVD T0 = USV, where URd1×r and VRd2×r are orthogonal matrices and SRr×r is a diagonal matrix of singular values. Then, the proximal operator of function ‖⋅‖ at point T0 with parameter τ can be computed as follows:

$\mathrm{Prox}_{\tau\|\cdot\|_{*}}(T_0) \triangleq \arg\min_{T} \frac{1}{2}\|T_0 - T\|_F^2 + \tau\|T\|_{*} = U\max(S - \tau,\, 0)V^{\top}.$
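Lemma 10 is the standard singular value thresholding operator; for completeness, a one-function sketch (our own naming):

```python
import numpy as np

def prox_nuclear(M0, tau):
    """Proximal operator of the matrix nuclear norm (Lemma 10): singular value thresholding."""
    U, s, Vh = np.linalg.svd(M0, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh
```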

Update the dual variables (A,{Bk}). We use dual ascending [33] to update (A,{Bk}) as follows:

$\mathcal{A}^{t+1} = \mathcal{A}^t + \rho\big(\mathcal{K}^{t+1} - \mathcal{L}^{t+1}\big), \qquad (\mathcal{B}^k)^{t+1} = (\mathcal{B}^k)^t + \rho\big((\mathcal{T}^k)^{t+1} - \mathcal{L}^{t+1}\big),\; k = 1, 2, 3. \quad (60)$

Termination Condition. Given a tolerance ϵ > 0, check the termination condition of primal variables

XtXtϵ,XL,K,{Tk},(61)

and convergence of constraints

KtLtϵ,and(Tk)tLtϵ,k=1,2,3.(62)

The ADMM-based algorithm is described in Algorithm 1.

Algorithm 1. ADMM for Problem(55)


Computational complexity analysis: We analyze the computational complexity as follows.

• By precomputing $(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I})^{-1}$ and $\mathfrak{X}^{*}(y)$, which costs $O(d_1^3d_2^3d_3^3 + Nd_1^2d_2^2d_3^2)$, the cost of updating $\mathcal{L}$ is $O(d_1^2d_2^2d_3^2)$.

• Updating $\mathcal{K}$ and the $\mathcal{T}^k$'s involves computing the proximal operators of TNN and the matrix nuclear norm, which costs $O\big(d_1d_2d_3(d_1 \wedge d_2 + \log d_3 + \sum_{k=1}^{3} d_k \wedge d_{\backslash k})\big)$.

• Updating A and {Bk} (k = 1, 2, 3) costs O (d1d2d3).

Overall, supposing the iteration number is T, the total computational complexity will be

O(d13d23d33+Td12d22d32+Td1d2d3(d1d2+logd3+k=13dkd\k)),(63)

which is very expensive for large tensors. In some special cases (like tensor completion), where each $\langle\mathcal{X}_i, \mathcal{L}\rangle$ acts on a single entry of $\mathcal{L}$, $(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I})^{-1}$ and $\mathfrak{X}^{*}(y)$ can be computed in $O(d_1d_2d_3)$. Hence, the total complexity of Algorithm 1 drops to

OTd1d2d3(min{d1,d2}+logd3+k=13dkd\k).(64)

Convergence analysis: We then discuss the convergence of Algorithm 1 as follows.

Theorem 4 (convergence of Algorithm 1). For any positive constant $\rho$, if the unaugmented Lagrangian function $L_0(\mathcal{L},\mathcal{K},\{\mathcal{T}^k\},\mathcal{A},\{\mathcal{B}^k\})$ has a saddle point, then the iterates of Algorithm 1 satisfy the residual convergence, objective convergence, and dual variable convergence (defined in [33]) for Problem (55) as $t \to \infty$.

Proof of Theorem 4. The key idea is to rewrite Problem (55) as a standard two-block ADMM problem. For notational simplicity, let

f(u)=12yX(L)22,g(v)=λγKtnn+λ(1γ)k=13αkT(k)k,

with u, v, w, and A defined as follows:

uvec(L)Rd1d2d3,vvec(K)vec(T1)vec(T2)vec(T3)R4d1d2d3,wvec(A)vec(B1)vec(B2)vec(B3)R4d1d2d3,AIDIDIDIDR4d1d2d3×d1d2d3,

where vec (⋅) denotes the operation of tensor vectorization (see [18]).It can be verified that f (⋅) and g (⋅) are closed, proper convex functions. Then, Problem(55) can be re-written as follows:

minu,vf(u)+g(v)s.t.  Auv=0.

According to the convergence analysis in [33], we have

objective convergence:limtf(ut)+g(vt)=f+g,dual variable convergence:limtwt=w,constraint convergence:limtAutvt=0,

where f, g are the optimal values of f(u), g(v), respectively. Variable w is a dual optimal point defined as

w=vec(A)vec(B1)vec(B2)vec(B3),

where (A,{Bk}k) are the dual variables in a saddle point (L,K,{(Tk)},A,{(Bk)}) of the unaugmented LagrangianL0(L,K,{Tk},A,{Bk}). Since there are only equality constraints in the convex problem(55), strong duality holds naturally as a corollary of Slater’s condition [23], which further indicates that the unaugmented Lagrangian L0(L,K,{Tk},A,{Bk}) has a saddle point. Moreover, according to the analysis in [36], the convergence rate of general ADMM-based algorithms is O (1/T), where T denotes the iteration number. In this way, the convergence behavior of Algorithm 1 is analyzed.

6 Experimental Results

In this section, we first conduct experiments on synthetic datasets to validate the theory for tensor compressed sensing and then evaluate the effectiveness of the proposed T2NN on three types of real data for noisy tensor completion. MATLAB implementations of the algorithms are deployed on a PC running UOS system with an AMD 3 GHz CPU and a RAM of 40 GB.

6.1 Tensor Compressed Sensing

Our theoretical results on tensor compressed sensing are validated on synthetic data in this subsection. Motivated by [7], we consider a constrained T2NN minimization model that is equivalent to Model (15) for ease of parameter selection. For performance evaluation, the proposed T2NN is also compared with TNN-based tensor compressed sensing [37]. First, the underlying tensor $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and its compressed observations $\{y_i\}$ are synthesized by the following two steps:

Step 1: Generate $\mathcal{L}$ that is low-rank in both spectral and original domains. Given positive integers $d_1, d_2, d_3$, and $r \leq \min\{d_1, d_2, d_3\}$, we first generate $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times r}$ by $\mathcal{T} = \mathcal{G}_1 * \mathcal{G}_2$, where $\mathcal{G}_1 \in \mathbb{R}^{d_1 \times 1 \times r}$ and $\mathcal{G}_2 \in \mathbb{R}^{1 \times d_2 \times r}$ are tensors with i. i.d. standard Gaussian entries. Then, let $\mathcal{L} = \mathcal{T} \times_3 \mathbf{G}$, where $\times_3$ is the tensor mode-3 product [18] and $\mathbf{G} \in \mathbb{R}^{r \times d_3}$ is a matrix with i. i.d. standard Gaussian entries. Our extensive numerical results show that, with high probability, the tubal rank and Tucker rank of $\mathcal{L}$ are all equal to $r$, that is, $\mathrm{rank}_{\mathrm{tb}}(\mathcal{L}) = r$ and $\mathrm{rank}(L_{(k)}) = r$, $k = 1, 2, 3$.

Step 2: Generate $N$ compressed observations $\{y_i\}$. Given a positive integer $N$, we first generate $N$ design tensors $\{\mathcal{X}_i\}$ with i. i.d. standard Gaussian entries. Then, $N$ noise variables $\{\xi_i\}$ are generated as i. i.d. standard Gaussian variables. The standard deviation is set as $\sigma = c\sigma_0$, where $\sigma_0 = \|\mathcal{L}\|_F/\sqrt{d_1d_2d_3}$ and $c$ denotes the noise level. Finally, $\{y_i\}$ are formed according to the observation model (13). The goal of tensor compressed sensing is to reconstruct the unknown $\mathcal{L}$ from its noisy compressed observations $\{y_i\}$.
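A minimal sketch of Steps 1 and 2 (illustrative sizes, with our own helper `t_product`; not the MATLAB code used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, c = 16, 2, 0.05                      # illustrative size, rank proxy, and noise level
N = 2 * r * d ** 2                         # illustrative number of observations

def t_product(A, B):
    """t-product computed in the Fourier domain (slice-wise matrix products of the FFTs)."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ipk,pjk->ijk', Af, Bf), axis=2))

# Step 1: a tensor that is low-rank in both domains.
G1 = rng.standard_normal((d, 1, r))
G2 = rng.standard_normal((1, d, r))
T = t_product(G1, G2)                      # d x d x r tensor
G = rng.standard_normal((r, d))
L = np.einsum('ijr,rm->ijm', T, G)         # mode-3 product with G, giving a d x d x d tensor

# Step 2: N noisy Gaussian measurements y_i = <L, X_i> + sigma * xi_i.
sigma0 = np.linalg.norm(L) / np.sqrt(d ** 3)
Xs = rng.standard_normal((N, d, d, d))
y = np.einsum('nijk,ijk->n', Xs, L) + c * sigma0 * rng.standard_normal(N)
```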

For simplicity, we consider cubic tensors, i.e., d1 = d2 = d3 = d, and choose the parameter of T2NN by γ = 1/4, α1 = α2 = α3 = 1/3. Recall that the underlying tensor LRd×d×d generated by the above Step 1 has the tubal rank and Tucker rank all equal to r with high probability. We consider tensors with dimensionality d ∈ {16, 20, 24} and rank proxy r ∈ {2, 3}. Then, if the proposed main theorem for tensor compressed sensing (i.e., Theorem 2) is correct, the following two phenomena should be observed:

(1) Phenomenon 1: In the noiseless setting, i.e., σ = 0, if the observation number N is larger than C0rd2 for a sufficiently large constant C0, then the estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$ can be zero, which means exact recovery. Let N0 = rd2 be a unit measure of the sample complexity. Then, by increasing the observation number N gradually from 0, we should observe a phase transition of the estimation error in the noiseless setting: if N/N0 < C0, the estimation error is relatively "large"; once N/N0 ≥ C0, the error drops dramatically to 0.

(2) Phenomenon 2: In the noisy case, the estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$ scales linearly with the variance σ2 of the random noise once the observation number N ≥ C0N0.

To check whether Phenomenon 1 occurs, we conduct tensor compressed sensing by setting the noise variance σ2 = 0. We gradually increase the normalized observation number N/N0 from 0.25 to 5. For each setting of d, r, and N/N0, we repeat the experiment 10 times and report the averaged estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$. For both TNN [37] and the proposed T2NN, we plot the curves of the estimation error (in logarithm) versus the normalized observation number N/N0 for $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with rank proxy r = 2 in Figure 1. It can be seen that Phenomenon 1 occurs for the proposed T2NN: when N/N0 < 1.75, the estimation error is relatively "large"; once N/N0 ≥ 1.75, the error drops dramatically to 0. The same phenomenon also occurs for TNN, with a phase transition point near 3.5. Thus, the sample complexity for exact tensor compressed sensing with T2NN is lower than that with TNN, indicating the superiority of the proposed T2NN. Since similar phenomena have also been observed for tensors of other sizes and rank proxies, we simply omit them.


FIGURE 1. Estimation error in logarithm vs. the normalized observation number N/N0 for tensor compressed sensing of underlying tensors of size 16×16×16 and rank proxy r =2. The proposed T2NN is compared with TNN [37].

For the validation of Phenomenon 2, we consider the noisy settings with normalized sample complexity N/N0 = 3.5, which is nearly the phase transition point of TNN and much greater than that of T2NN. We gradually increase the noise level c = σ/σ0 from 0.025 to 0.25. For each different setting of d, r, and c, we repeat the experiments 10 times and report the averaged estimation error L̂LF2. For both TNN [37] and the proposed T2NN, we plot the curves of estimation error in logarithm versus the (squared) noise level σ2/σ02 for LRd×d×d with rank proxy r = 2 in Figure 2. It can be seen that Phenomenon 2 also occurs for the proposed T2NN: The estimation error scales approximately linearly with the (squared) noise level. The same phenomenon can also be observed for TNN with a higher estimation error than T2NN, indicating T2NN is more accurate than TNN. We omit the results for tensors of other sizes and rank proxies because the error curves are so similar to Figure 2.


FIGURE 2. Estimation error vs. the (squared) noise level σ2/σ02 for tensor compressed sensing of underlying tensors of size 16×16×16 and rank proxy r =2. The proposed T2NN is compared with TNN [37].

6.2 Noisy Tensor Completion

This subsection evaluates the effectiveness of the proposed T2NN through performance comparison with the matrix nuclear norm (NN) [30], SNN [22], and TNN [25] by carrying out noisy tensor completion on three different types of visual data, including video data, hyperspectral images, and seismic data.

6.2.1 Experimental Settings

Given the tensor data $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, the goal is to recover it from its partial noisy observations. We consider uniform sampling with ratio p ∈ {0.05, 0.1, 0.15}, that is, {95%, 90%, 85%} of the entries of a tensor are missing. The noise is i. i.d. Gaussian $\mathcal{N}(0, \sigma^2)$ with σ = 0.05σ0, where $\sigma_0 = \|\mathcal{L}\|_F/\sqrt{d_1d_2d_3}$ is the rescaled magnitude of the tensor $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$.

6.2.2 Performance Evaluation

The effectiveness of algorithms is measured by the Peak Signal Noise Ratio (PSNR) and structural similarity (SSIM) [38]. Specifically, the PSNR of an estimator L̂ is defined as

$\mathrm{PSNR} \triangleq 10\log_{10}\frac{d_1d_2d_3\|\mathcal{L}\|_{\infty}^2}{\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2},$

for the underlying tensor LRd1×d2×d3. The SSIM is computed via

$\mathrm{SSIM} \triangleq \frac{\big(2\mu_{\mathcal{L}}\mu_{\widehat{\mathcal{L}}} + (0.01\bar\omega)^2\big)\big(2\sigma_{\mathcal{L},\widehat{\mathcal{L}}} + (0.03\bar\omega)^2\big)}{\big(\mu_{\mathcal{L}}^2 + \mu_{\widehat{\mathcal{L}}}^2 + (0.01\bar\omega)^2\big)\big(\sigma_{\mathcal{L}}^2 + \sigma_{\widehat{\mathcal{L}}}^2 + (0.03\bar\omega)^2\big)},$

where $\mu_{\mathcal{L}}, \mu_{\widehat{\mathcal{L}}}, \sigma_{\mathcal{L}}, \sigma_{\widehat{\mathcal{L}}}, \sigma_{\mathcal{L},\widehat{\mathcal{L}}}$, and $\bar\omega$ denote the local means, standard deviations, cross-covariance, and dynamic range of the magnitudes of the tensors $\mathcal{L}$ and $\widehat{\mathcal{L}}$, respectively. Larger PSNR and SSIM values indicate higher quality of the estimator $\widehat{\mathcal{L}}$. In each setting, we test each tensor for 10 trials and report the averaged PSNR (in dB) and SSIM values.
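As a small sketch of the quantitative metric (our own function name), the PSNR above can be computed as

```python
import numpy as np

def psnr(L_hat, L):
    """PSNR (in dB) of an estimate L_hat of the tensor L, as defined above."""
    mse = np.sum((L_hat - L) ** 2)
    peak = np.max(np.abs(L)) ** 2
    return 10.0 * np.log10(L.size * peak / mse)
```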

6.2.3 Parameter Setting

For NN [30], we set the parameter $\lambda = \lambda_\iota\,\sigma\sqrt{p\,(d_1 \vee d_2)\log(d_1+d_2)}$. For SNN [22], we set the regularization parameter $\lambda = \lambda_\iota$ and choose the weights $\alpha$ with $\alpha_1 : \alpha_2 : \alpha_3 = 1 : 1 : 1$. For TNN [25], we set $\lambda = \lambda_\iota\,\sigma\sqrt{p\,d_3(d_1 \vee d_2)\log(d_1d_3+d_2d_3)}$. For the proposed T2NN, we set the regularization parameter $\lambda = \lambda_\iota\,\sigma\sqrt{p\,d_3(d_1 \vee d_2)\log(d_1d_3+d_2d_3)}$ and choose the weights $\gamma = 0.5$ and $\alpha$ with $\alpha_1 : \alpha_2 : \alpha_3 = 1 : 1 : 10$. The factor $\lambda_\iota$ is then tuned in $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$ for each norm, and we choose the value yielding the highest PSNR in most cases during the parameter tuning phase.

6.2.4 Experiments on Video Data

We first conduct noisy video completion on four widely used YUV videos: Akiyo, Carphone, Grandma, and Mother-daughter. Owing to computational limitation, we simply use the first 30 frames of the Y components of all the videos and obtain four tensors of size 144 × 17 × 30. We first report the averaged PSNR and SSIM values obtained by four norms for quantitative comparison in Table 1 and then give visual examples in Figure 3 when 95% of the tensor entries are missing for qualitative evaluation. A demo of the source code is available at https://github.com/pingzaiwang/T2NN-demo.


TABLE 1. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the YUV videos.


FIGURE 3. Visual results obtained by four norms for noisy tensor completion with 95% missing entries on the YUV-video dataset. The first to fourth rows correspond to the videos Akiyo, Carphone, Grandma, and Mother-daughter, respectively. The sub-plots from (A) to (F): (A) a frame of the original video, (B) the observed frame, (C) the frame recovered by NN [30], (D) the frame recovered by SNN [22], (E) the frame recovered by the vanilla TNN [25], and (F) the frame recovered by our T2NN.

6.2.5 Experiments on Hyperspectral Data

We then carry out noisy tensor completion on subsets of the two representative hyperspectral datasets described as follows:

Indian Pines: The dataset was collected by the AVIRIS sensor in 1992 over the Indian Pines test site in North-western Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands. We use the first 30 bands in the experiments due to the limitation of computing resources.

Salinas A: The data were acquired by the AVIRIS sensor over the Salinas Valley, California, in 1998 and consist of 224 bands over a spectral range of 400–2500 nm. This dataset has a spatial extent of 86 × 83 pixels with a resolution of 3.7 m. We also use the first 30 bands in the experiments.

The averaged PSNR and SSIM values are given in Table 2 for quantitative comparison. We also show visual examples in Figure 4 when 85% of the tensor entries are missing for qualitative evaluation.


TABLE 2. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the hyperspectral datasets.


FIGURE 4. Visual results obtained by four norms for noisy tensor completion with 85% missing entries on the hyperspectral dataset (gray data shown with pseudo-color). The first and second rows correspond to Indian Pines and Salinas A, respectively. The sub-plots from (A) to (F): (A) a frame of the original data, (B) the observed frame, (C) the frame recovered by NN [30], (D) the frame recovered by SNN [22], (E) the frame recovered by the vanilla TNN [25], and (F) the frame recovered by our T2NN.

6.2.6 Experiments on Seismic Data

We use a seismic data tensor of size 512 × 512 × 3, which is extracted from the test data "seismic.mat" of a toolbox for seismic data processing from the Center of Geophysics, Harbin Institute of Technology, China. For quantitative comparison, we present the PSNR and SSIM values for two sampling schemes in Table 3.


TABLE 3. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the Seismic dataset.

6.2.7 Summary and Analysis of Experimental Results

According to the experimental results on three types of real tensor data shown in Table 1, Table 2, Table 3, and Figure 3, the summary and analysis are presented as follows:

1) In all the cases, tensor norms (SNN, TNN, and T2NN) perform better than the matrix norm (NN). It can be explained that tensor norms can honestly preserve the multi-way structure of tensor data such that the rich inter-modal and intra-modal correlations of the data can be exploited to impute the missing values, whereas the matrix norm can only handle two-way structure and thus fails to model the multi-way structural correlations of the tensor data.

2) In most cases, TNN outperforms SNN, which is consistent with the results reported in [14, 17, 25]. One explanation is that the video, hyperspectral, and seismic data used all possess stronger low-rankness in the spectral domain than in the original domain, which can be successfully captured by TNN.

3) In most cases, the proposed T2NN performs best among the four norms. We attribute the promising performance to the capability of T2NN to simultaneously exploit low-rankness in both the spectral and original domains.

7 Conclusion and Discussions

7.1 Conclusion

Due to its definition solely in the spectral domain, the popular TNN may be incapable of exploiting low-rankness in the original domain. To remedy this weakness, a hybrid tensor norm named the "Tubal + Tucker" Nuclear Norm (T2NN) was first defined as the weighted sum of TNN and SNN to model both spectral and original-domain low-rankness. It was further used to formulate a penalized least squares estimator for tensor recovery from noisy linear observations. Upper bounds on the estimation error were established in both deterministic and non-asymptotic senses to analyze the statistical performance of the proposed estimator. An ADMM-based algorithm was also developed to efficiently compute the estimator. The effectiveness of the proposed model was demonstrated through experimental results on both synthetic and real datasets.

7.2 Limitations of the Proposed Model and Possible Solutions

Generally speaking, the proposed estimator has the following two drawbacks due to the adoption of T2NN:

Sample inefficiency: The analysis of [24, 28] indicates that for tensor recovery from a small number of observations, T2NN cannot provide essentially lower sample complexity than TNN.

Computational inefficiency: Compared to TNN, T2NN is more time-consuming since it involves computing both TNN and SNN.

We list several directions that this work can be extended to overcome the above drawbacks.

For sample inefficiency: First, inspired by the attempt of adopting the “best” norm (e.g., Eq. 8 in [28]), the following model can be considered:

minLmaxLtnnLtnn,maxk=1,2,3L(k)L(k)s.t.  yX(L)2ϵ(65)

for a certain noise level ϵ ≥ 0. Although Model (65) has a significantly higher accuracy and lower sample complexity according to the analysis in [28], it is impractical because it requires Ltnn and L(k) (k = 1, 2, 3), which are unknown in advance. Motivated by [39], a more practical model is given as follows:

minLk=13exp(αkL(k))+exp(βLtnn)s.t.  yX(L)2ϵ,

where β > 0 is a regularization parameter.

For computational inefficiency: To improve the efficiency of the proposed T2NN-based models, we can use more efficient solvers of Problem (15) by adopting the factorization strategy [40, 41] or sampling-based approaches [42].

7.3 Extensions to the Proposed Model

In this subsection, we discuss possible extensions of the proposed model to general K-order (K > 3) tensors, general spectral domains, robust tensor recovery, and multi-view learning, respectively.

Extensions to K-order (K > 3) tensors: Currently, the proposed T2NN is defined solely for 3-order tensors, and it cannot be directly applied to tensors of more than 3 orders like color videos. For general K-order tensors, it is suggested to replace the tubal nuclear norm in the definition of T2NN with orientation invariant tubal nuclear norm [5], which is defined to exploit multi-orientational spectral low-rankness for general higher-order tensors.

Extensions to general spectral and original domains: This paper considers the DFT-based tensor product for spectral low-rank modeling. Recently, the DFT based t-product has been generalized to the *L-product defined via any invertible linear transform [43], under which the tubal nuclear norm is also extended to *L-tubal nuclear norm [44] and *L-Spectral k-support norm [7]. It is natural to generalize the proposed T2NN by changing the tubal nuclear norm to *L-tubal nuclear norm or *L-Spectral k-support norm for further extensions. It is also interesting to consider other tensor decompositions for original domain low-rankness modeling such as CP, TT, and TR as future work.

Extensions to robust tensor recovery: In many real applications, the tensor signal may also be corrupted by gross sparse outliers. Motivated by [5], the proposed T2NN can also be used in resisting sparse outliers for robust tensor recovery as follows:

$$\min_{\mathcal{L},\,\mathcal{S}} \; \frac{1}{2}\|\mathbf{y} - \mathcal{X}(\mathcal{L}+\mathcal{S})\|_{2}^{2} + \lambda \|\mathcal{L}\|_{\mathrm{t2nn}} + \mu \|\mathcal{S}\|_{1},$$

where $\mathcal{S} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ denotes the tensor of sparse outliers, the tensor ℓ1-norm ‖⋅‖1 is applied to encourage sparsity in $\mathcal{S}$, and λ, μ > 0 are regularization parameters.
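A minimal sketch of two proximal operators that an ADMM solver for this robust model would alternate between is given below: entrywise soft-thresholding for the sparse term and tubal singular value thresholding for the TNN part of T2NN. The SNN part additionally requires mode-wise matrix SVT and a least-squares update, which are omitted here, and the thresholds depend on the normalization convention of the TNN.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau*||.||_1: entrywise soft-thresholding,
    used for the sparse-outlier tensor S."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def tubal_svt(T, tau):
    """Tubal singular value thresholding: proximal operator of tau*TNN,
    obtained by soft-thresholding the singular values of each Fourier-domain
    frontal slice."""
    T_hat = np.fft.fft(T, axis=2)
    out = np.empty_like(T_hat)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(T_hat[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(out, axis=2))
```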

Extensions to multi-view learning: Due to its superiority in modeling multi-linear correlations of multi-modal data, TNN has been successfully applied to multi-view self-representations for clustering [45, 46]. Our proposed T2NN can also be utilized for clustering by straightforwardly replacing TNN in the formulation of multi-view learning models (e.g., Eq. 9 in [45]).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://sites.google.com/site/subudhibadri/fewhelpfuldownloads, https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html, https://rslab.ut.ac.ir/documents/81960329/82035173/SalinasA_corrected.mat, https://github.com/sevenysw/MathGeo2018.

Author Contributions

Conceptualization and methodology—YL and AW; software—AW; formal analysis—YL, AW, GZ, and QZ; resources—YL, GZ, and QZ; writing: original draft preparation—YL, AW, GZ, and QZ; project administration and supervision—GZ, and QZ; and funding acquisition—AW, GZ, and QZ. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 61872188, 62073087, 62071132, 62103110, 61903095, U191140003, and 61973090, in part by the China Postdoctoral Science Foundation under Grant 2020M672536, and in part by the Natural Science Foundation of Guangdong Province under Grants 2020A1515010671, 2019B010154002, and 2019B010118001.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

AW is grateful to Prof. Zhong Jin in Nanjing University of Science and Technology for his long-time and generous support in both research and life. In addition, he would like to thank the Jin family in Zhuzhou for their kind understanding in finishing the project of tensor learning in these years.

Footnotes

1. The Fourier version T̃ is obtained by performing the 1D-DFT on all tubes of T, i.e., T̃ = fft(T,[],3) ∈ ℂ^{d1×d2×d3} in MATLAB.

References

1. Guo C, Modi K, Poletti D. Tensor-Network-Based Machine Learning of Non-Markovian Quantum Processes. Phys Rev A (2020) 102:062414.

2. Ma X, Zhang P, Zhang S, Duan N, Hou Y, Zhou M, et al. A Tensorized Transformer for Language Modeling. Adv Neural Inf Process Syst (2019) 32.

3. Meng Y-M, Zhang J, Zhang P, Gao C, Ran S-J. Residual Matrix Product State for Machine Learning. arXiv preprint arXiv:2012.11841 (2020).

4. Ran S-J, Sun Z-Z, Fei S-M, Su G, Lewenstein M. Tensor Network Compressed Sensing with Unsupervised Machine Learning. Phys Rev Res (2020) 2:033293. doi:10.1103/physrevresearch.2.033293

5. Wang A, Zhao Q, Jin Z, Li C, Zhou G. Robust Tensor Decomposition via Orientation Invariant Tubal Nuclear Norms. Sci China Technol Sci (2022) 34:6102. doi:10.1007/s11431-021-1976-2

6. Zhang X, Ng MK-P. Low Rank Tensor Completion with Poisson Observations. IEEE Trans Pattern Anal Machine Intelligence (2021). doi:10.1109/tpami.2021.3059299

7. Wang A, Zhou G, Jin Z, Zhao Q. Tensor Recovery via *L-Spectral k-Support Norm. IEEE J Sel Top Signal Process (2021) 15:522–34. doi:10.1109/jstsp.2021.3058763

8. Cui C, Zhang Z. High-Dimensional Uncertainty Quantification of Electronic and Photonic Ic with Non-Gaussian Correlated Process Variations. IEEE Trans Computer-Aided Des Integrated Circuits Syst (2019) 39:1649–61. doi:10.1109/TCAD.2019.2925340

9. Liu X-Y, Aeron S, Aggarwal V, Wang X. Low-Tubal-Rank Tensor Completion Using Alternating Minimization. IEEE Trans Inform Theor (2020) 66:1714–37. doi:10.1109/tit.2019.2959980

10. Carroll JD, Chang J-J. Analysis of Individual Differences in Multidimensional Scaling via an N-Way Generalization of “Eckart-Young” Decomposition. Psychometrika (1970) 35:283–319. doi:10.1007/bf02310791

11. Tucker LR. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika (1966) 31:279–311. doi:10.1007/bf02289464

12. Oseledets IV. Tensor-Train Decomposition. SIAM J Sci Comput (2011) 33:2295–317. doi:10.1137/090752286

13. Zhao Q, Zhou G, Xie S, Zhang L, Cichocki A. Tensor Ring Decomposition. arXiv preprint arXiv:1606.05535 (2016).

14. Zhang Z, Ely G, Aeron S, Hao N, Kilmer M. Novel Methods for Multilinear Data Completion and De-Noising Based on Tensor-Svd. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014). p. 3842–9. doi:10.1109/cvpr.2014.485

15. Kilmer ME, Braman K, Hao N, Hoover RC. Third-Order Tensors as Operators on Matrices: A Theoretical and Computational Framework with Applications in Imaging. SIAM J Matrix Anal Appl (2013) 34:148–72. doi:10.1137/110837711

16. Hou J, Zhang F, Qiu H, Wang J, Wang Y, Meng D. Robust Low-Tubal-Rank Tensor Recovery from Binary Measurements. IEEE Trans Pattern Anal Machine Intelligence (2021). doi:10.1109/tpami.2021.3063527

17. Lu C, Feng J, Chen Y, Liu W, Lin Z, Yan S. Tensor Robust Principal Component Analysis with a New Tensor Nuclear Norm. IEEE Trans Pattern Anal Mach Intell (2020) 42:925–38. doi:10.1109/tpami.2019.2891760

18. Kolda TG, Bader BW. Tensor Decompositions and Applications. SIAM Rev (2009) 51:455–500. doi:10.1137/07070111x

19. Li X, Wang A, Lu J, Tang Z. Statistical Performance of Convex Low-Rank and Sparse Tensor Recovery. Pattern Recognition (2019) 93:193–203. doi:10.1016/j.patcog.2019.03.014

20. Liu J, Musialski P, Wonka P, Ye J. Tensor Completion for Estimating Missing Values in Visual Data. IEEE Trans Pattern Anal Mach Intell (2013) 35:208–20. doi:10.1109/tpami.2012.39

21. Qiu Y, Zhou G, Chen X, Zhang D, Zhao X, Zhao Q. Semi-Supervised Non-Negative Tucker Decomposition for Tensor Data Representation. Sci China Technol Sci (2021) 64:1881–92. doi:10.1007/s11431-020-1824-4

22. Tomioka R, Suzuki T, Hayashi K, Kashima H. Statistical Performance of Convex Tensor Decomposition. In: Proceedings of Annual Conference on Neural Information Processing Systems (2011). p. 972–80.

23. Boyd S, Boyd SP, Vandenberghe L. Convex Optimization. Cambridge: Cambridge University Press (2004).

24. Mu C, Huang B, Wright J, Goldfarb D. Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery. In: International Conference on Machine Learning (2014). p. 73–81.

25. Wang A, Lai Z, Jin Z. Noisy Low-Tubal-Rank Tensor Completion. Neurocomputing (2019) 330:267–79. doi:10.1016/j.neucom.2018.11.012

26. Zhou P, Lu C, Lin Z, Zhang C. Tensor Factorization for Low-Rank Tensor Completion. IEEE Trans Image Process (2018) 27:1152–63. doi:10.1109/tip.2017.2762595

27. Negahban S, Wainwright MJ. Estimation of (Near) Low-Rank Matrices with Noise and High-Dimensional Scaling. Ann Stat (2011) 2011:1069–97. doi:10.1214/10-aos850

28. Oymak S, Jalali A, Fazel M, Eldar YC, Hassibi B. Simultaneously Structured Models with Application to Sparse and Low-Rank Matrices. IEEE Trans Inform Theor (2015) 61:2886–908. doi:10.1109/tit.2015.2401574

29. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing, Vol. 1. Basel, Switzerland: Birkhäuser Basel (2013).

30. Klopp O. Noisy Low-Rank Matrix Completion with General Sampling Distribution. Bernoulli (2014) 20:282–303. doi:10.3150/12-bej486

31. Klopp O. Matrix Completion by Singular Value Thresholding: Sharp Bounds. Electron J Stat (2015) 9:2348–69. doi:10.1214/15-ejs1076

32. Vershynin R. High-Dimensional Probability: An Introduction with Applications in Data Science, Vol. 47. Cambridge: Cambridge University Press (2018).

33. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations Trends® Machine Learn (2011) 3:1–122. doi:10.1561/2200000016

34. Wang A, Wei D, Wang B, Jin Z. Noisy Low-Tubal-Rank Tensor Completion Through Iterative Singular Tube Thresholding. IEEE Access (2018) 6:35112–28. doi:10.1109/access.2018.2850324

35. Cai J-F, Candès EJ, Shen Z. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM J Optim (2010) 20:1956–82. doi:10.1137/080738970

36. He B, Yuan X. On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method. SIAM J Numer Anal (2012) 50:700–9. doi:10.1137/110836936

37. Lu C, Feng J, Lin Z, Yan S. Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (2018). p. 1948–54. doi:10.24963/ijcai.2018/347

38. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image Quality Assessment: from Error Visibility to Structural Similarity. IEEE Trans Image Process (2004) 13:600–12. doi:10.1109/tip.2003.819861

39. Zhang X, Zhou Z, Wang D, Ma Y. Hybrid Singular Value Thresholding for Tensor Completion. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014). p. 1362–8.

40. Wang A-D, Jin Z, Yang J-Y. A Faster Tensor Robust Pca via Tensor Factorization. Int J Mach Learn Cyber (2020) 11:2771–91. doi:10.1007/s13042-020-01150-2

41. Liu G, Yan S. Active Subspace: Toward Scalable Low-Rank Learning. Neural Comput (2012) 24:3371–94. doi:10.1162/neco_a_00369

42. Wang L, Xie K, Semong T, Zhou H. Missing Data Recovery Based on Tensor-Cur Decomposition. IEEE Access (2017) PP:1.

43. Kernfeld E, Kilmer M, Aeron S. Tensor-Tensor Products with Invertible Linear Transforms. Linear Algebra its Appl (2015) 485:545–70. doi:10.1016/j.laa.2015.07.021

44. Lu C, Peng X, Wei Y. Low-Rank Tensor Completion with a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019). p. 5996–6004. doi:10.1109/cvpr.2019.00615

45. Lu G-F, Zhao J. Latent Multi-View Self-Representations for Clustering via the Tensor Nuclear Norm. Appl Intelligence (2021) 2021:1–13. doi:10.1007/s10489-021-02710-x

46. Liu Y, Zhang X, Tang G, Wang D. Multi-View Subspace Clustering Based on Tensor Schatten-P Norm. In: 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA, USA: IEEE (2019). p. 5048–55. doi:10.1109/bigdata47090.2019.9006347

Keywords: tensor decomposition, tensor low-rankness, tensor SVD, tubal nuclear norm, tensor completion

Citation: Luo Y, Wang A, Zhou G and Zhao Q (2022) A Hybrid Norm for Guaranteed Tensor Recovery. Front. Phys. 10:885402. doi: 10.3389/fphy.2022.885402

Received: 28 February 2022; Accepted: 27 April 2022;
Published: 13 July 2022.

Edited by:

Peng Zhang, Tianjin University, China

Reviewed by:

Jingyao Hou, Southwest University, China
Yong Peng, Hangzhou Dianzi University, China
Jing Lou, Changzhou Institute of Mechatronic Technology, China
Guifu Lu, Anhui Polytechnic University, China

Copyright © 2022 Luo, Wang, Zhou and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andong Wang, w.a.d@outlook.com
