
ORIGINAL RESEARCH article

Front. Phys., 13 July 2022
Sec. Statistical and Computational Physics
This article is part of the Research Topic: Tensor Network Approaches for Quantum Many-body Physics and Machine Learning

A Hybrid Norm for Guaranteed Tensor Recovery

Yihao Luo1, Andong Wang1,2*, Guoxu Zhou1,3 and Qibin Zhao2
  • 1School of Automation, Guangdong University of Technology, Guangzhou, China
  • 2RIKEN AIP, Tokyo, Japan
  • 3Key Laboratory of Intelligent Detection and the Internet of Things in Manufacturing, Ministry of Education, Guangzhou, China

Benefiting from the superiority of tensor Singular Value Decomposition (t-SVD) in excavating low-rankness in the spectral domain over other tensor decompositions (like Tucker decomposition), t-SVD-based tensor learning has recently shown promising performance and become an emerging research topic in computer vision and machine learning. However, by focusing on modeling spectral low-rankness, t-SVD-based models may be insufficient to exploit low-rankness in the original domain, leading to limited performance when learning from tensor data (like videos) that are low-rank in both the original and spectral domains. To this end, we define a hybrid tensor norm dubbed the "Tubal + Tucker" Nuclear Norm (T2NN) as the sum of two tensor norms, respectively induced by t-SVD and Tucker decomposition, to simultaneously impose low-rankness in both the spectral and original domains. We further utilize the new norm for tensor recovery from linear observations by formulating a penalized least squares estimator. The statistical performance of the proposed estimator is then analyzed by establishing upper bounds on the estimation error in both deterministic and non-asymptotic manners. We also develop an efficient algorithm within the framework of the Alternating Direction Method of Multipliers (ADMM). Experimental results on both synthetic and real datasets show the effectiveness of the proposed model.

1 Introduction

Thanks to the rapid progress of computer technology, data in tensor format (i.e., multi-dimensional arrays) are emerging in computer vision, machine learning, remote sensing, quantum physics, and many other fields, triggering an increasing need for tensor-based learning theory and algorithms [1–6]. In this paper, we carry out both theoretical and algorithmic studies of tensor recovery from linear observations, which is a typical problem in tensor learning aiming to learn an unknown tensor when only a limited number of its noisy linear observations are available [7]. Tensor recovery finds applications in many industrial circumstances where the sensed or collected tensor data are polluted by unpredictable factors such as sensor failures, communication losses, occlusion by objects, shortage of instruments, and electromagnetic interference [7–9], and is thus of both theoretical and empirical significance.

In general, reconstructing an unknown tensor from only a small number of its linear observations is hopeless unless some assumptions on the underlying tensor are made [9]. The most commonly used assumption is that the underlying tensor possesses some kind of low-rankness which can significantly limit its degrees of freedom, such that the signal can be estimated from a small but sufficient number of observations [7]. However, as a higher-order extension of matrix low-rankness, tensor low-rankness has many different characterizations due to the multiple definitions of tensor rank, e.g., the CANDECOMP/PARAFAC (CP) rank [10], Tucker rank [11], Tensor Train (TT) rank [12], and Tensor Ring (TR) rank [13]. As discussed in [7] from a signal processing standpoint, the rank functions exemplified above are defined in the original domain of the tensor signal and may thus be insufficient to model low-rankness in the spectral domain. The recently proposed tensor low-tubal-rankness [14], defined within the algebraic framework of tensor Singular Value Decomposition (t-SVD) [15], complements them by exploiting low-rankness in the spectral domain defined via the Discrete Fourier Transform (DFT), and has witnessed significant performance improvements over original-domain low-rankness for tensor recovery [6, 16, 17].

Despite the popularity of low-tubal-rankness, the fact that it is defined solely in the spectral domain also naturally poses a potential limitation on its usability to some tensor data that are low-rank in both spectral and original domains. To address this issue, we propose a hybrid tensor norm to encourage low-rankness in both spectral and original domains at the same time for tensor recovery in this paper. Specifically, the contributions of this work are four-fold:

• To simultaneously exploit low-rankness in both spectral and original domains, we define a new norm named T2NN as the sum of two tensor nuclear norms induced, respectively, by the t-SVD for spectral low-rankness and Tucker decomposition for original domain low-rankness.

• Then, we apply the proposed norm to tensor recovery by formulating a new tensor least squares estimator penalized by T2NN.

• Statistically, we analyze the performance of the proposed estimator by establishing upper bounds on the estimation error in both deterministic and non-asymptotic manners.

• Algorithmically, we propose an algorithm based on ADMM to compute the estimator and evaluate its effectiveness on three different types of real data.

The rest of this paper proceeds as follows. First, the notations and preliminaries of low-tubal-rankness and low-Tucker-rankness are introduced in Section 2. Then, we define the new norm and apply it to tensor recovery in Section 3. To understand the statistical behavior of the estimator, we establish an upper bound on the estimation error in Section 4. To compute the proposed estimator, we design an ADMM-based algorithm in Section 5 with empirical performance reported in Section 6.

2 Notations and Preliminaries

Notations. We use lowercase boldface, uppercase boldface, and calligraphic letters to denote vectors (e.g., $\mathbf{v}$), matrices (e.g., $\mathbf{M}$), and tensors (e.g., $\mathcal{T}$), respectively. For any real numbers $a, b$, let $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. If the size of a tensor is not given explicitly, then it is in $\mathbb{R}^{d_1 \times d_2 \times d_3}$. We use $c, c', c_1$, etc., to denote constants whose values can vary from line to line. For notational simplicity, let $\tilde{d} = (d_1 + d_2)d_3$ and $d_{\backslash k} = d_1 d_2 d_3 / d_k$ for $k = 1, 2, 3$.

Given a matrix $\mathbf{M} \in \mathbb{C}^{d_1 \times d_2}$, its nuclear norm and spectral norm are defined as $\|\mathbf{M}\|_* \triangleq \sum_i \sigma_i$ and $\|\mathbf{M}\| \triangleq \max_i \sigma_i$, respectively, where $\{\sigma_i \mid i = 1, 2, \ldots, d_1 \wedge d_2\}$ are its singular values. Given a tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, define its $l_1$-norm and F-norm as $\|\mathcal{T}\|_1 \triangleq \|\mathrm{vec}(\mathcal{T})\|_1$ and $\|\mathcal{T}\|_F \triangleq \|\mathrm{vec}(\mathcal{T})\|_2$, respectively, where $\mathrm{vec}(\cdot)$ denotes the vectorization operation of a tensor [18]. Given $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, let $\mathcal{T}^{(i)} \triangleq \mathcal{T}(:,:,i)$ denote its $i$th frontal slice. For any two (real or complex) tensors $\mathcal{A}, \mathcal{B}$ of the same size, define their inner product as the inner product of their vectorizations, $\langle \mathcal{A}, \mathcal{B} \rangle \triangleq \langle \mathrm{vec}(\mathcal{A}), \mathrm{vec}(\mathcal{B}) \rangle$. Other notations are introduced at their first appearance.

2.1 Spectral Rankness Modeled by t-SVD

The low-tubal-rankness defined within the algebraic framework of t-SVD is a typical example to characterize low-rankness in the spectral domain. We give some basic notions about t-SVD in this section.

Definition 1 (t-product [15]). Given $\mathcal{T}_1 \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and $\mathcal{T}_2 \in \mathbb{R}^{d_2 \times d_4 \times d_3}$, their t-product $\mathcal{T} = \mathcal{T}_1 * \mathcal{T}_2 \in \mathbb{R}^{d_1 \times d_4 \times d_3}$ is a tensor whose $(i,j)$-th tube is $\mathcal{T}(i,j,:) = \sum_{k=1}^{d_2} \mathcal{T}_1(i,k,:) \bullet \mathcal{T}_2(k,j,:)$, where $\bullet$ is the circular convolution [15].

Definition 2 (tensor transpose [15]). Let $\mathcal{T}$ be a tensor of size $d_1 \times d_2 \times d_3$; then $\mathcal{T}^{\top}$ is the $d_2 \times d_1 \times d_3$ tensor obtained by transposing each of the frontal slices and then reversing the order of the transposed frontal slices 2 through $d_3$.

Definition 3 (identity tensor [15]). The identity tensor IRd×d×d3 is a tensor whose first frontal slice is the d × d identity matrix and all other frontal slices are zero.

Definition 4 (f-diagonal tensor [15]). A tensor is called f-diagonal if each frontal slice of the tensor is a diagonal matrix.

Definition 5 (Orthogonal tensor [15]). A tensor $\mathcal{Q} \in \mathbb{R}^{d \times d \times d_3}$ is orthogonal if $\mathcal{Q}^{\top} * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^{\top} = \mathcal{I}$.

Then, t-SVD can be defined as follows.

Definition 6 (t-SVD, tubal rank [15]). Any tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ has a tensor singular value decomposition

$\mathcal{T} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}, \quad (1)$

where $\mathcal{U} \in \mathbb{R}^{d_1 \times d_1 \times d_3}$ and $\mathcal{V} \in \mathbb{R}^{d_2 \times d_2 \times d_3}$ are orthogonal tensors and $\mathcal{S} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ is an f-diagonal tensor. The tubal rank of $\mathcal{T}$ is defined as the number of non-zero tubes of $\mathcal{S}$,

$\mathrm{rank}_{\mathrm{tb}}(\mathcal{T}) \triangleq \#\{\, i : \mathcal{S}(i,i,:) \neq \mathbf{0} \,\}, \quad (2)$

where $\#$ counts the number of elements in a set.

For convenience of analysis, the block-diagonal matrix of 3-way tensors is also defined.

Definition 7 (block-diagonal matrix [15]). Let $\bar{T}$ denote the block-diagonal matrix of the tensor $\tilde{\mathcal{T}}$ in the Fourier domain1, i.e.,

$\bar{T} \triangleq \mathrm{diag}\big(\tilde{\mathcal{T}}^{(1)}, \ldots, \tilde{\mathcal{T}}^{(d_3)}\big) \in \mathbb{C}^{d_1 d_3 \times d_2 d_3}. \quad (3)$

Definition 8 (tubal nuclear norm, tensor spectral norm [17]). Given TRd1×d2×d3, let T̃ be its Fourier version in Cd1×d2×d3. The Tubal Nuclear Norm (TNN) ‖⋅‖tnn of T is defined as the averaged nuclear norm of frontal slices of T̃,

$\|\mathcal{T}\|_{\mathrm{tnn}} \triangleq \frac{1}{d_3} \sum_{i=1}^{d_3} \big\|\tilde{\mathcal{T}}^{(i)}\big\|_*,$

whereas the tensor spectral norm ‖⋅‖ is the largest spectral norm of the frontal slices,

$\|\mathcal{T}\| \triangleq \max_{i \in [d_3]} \big\|\tilde{\mathcal{T}}^{(i)}\big\|.$

We can see from Definition 8 that TNN captures low-rankness in the spectral domain and is thus more suitable for tensors with spectral low-rankness. As visual data (like images and videos) often possess strong spectral low-rankness, TNN has achieved superior performance over many original domain-based nuclear norms in visual data restoration [6, 17].
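For concreteness, the quantities in Definitions 6–8 can be read off from the singular values of the Fourier-domain frontal slices. The following is a minimal NumPy sketch (the function name and tolerance are our own illustrative choices, not the implementation used in this paper):

```python
import numpy as np

def tubal_quantities(T, tol=1e-10):
    """Tubal nuclear norm, tensor spectral norm, and tubal rank of a 3-way tensor,
    computed from the frontal slices of its Fourier version (Definitions 6 and 8)."""
    d1, d2, d3 = T.shape
    Tf = np.fft.fft(T, axis=2)                        # T tilde: 1D DFT along the tubes
    # sv[i, j] = i-th largest singular value of the j-th Fourier frontal slice
    sv = np.stack([np.linalg.svd(Tf[:, :, j], compute_uv=False) for j in range(d3)], axis=1)
    tnn = sv.sum() / d3                               # averaged slice-wise nuclear norms
    spectral = sv.max()                               # largest slice-wise spectral norm
    tubal_rank = int(np.sum(sv.max(axis=1) > tol))    # number of non-zero singular tubes
    return tnn, spectral, tubal_rank
```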

2.2 Original Domain Low-Rankness Modeled by Tucker Decomposition

The low-Tucker-rankness is a classical higher-order extension of matrix low-rankness in the original domain and has been widely applied in computer vision and machine learning [19–21]. Given any $K$-way tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_K}$, its Tucker rank is defined as the following vector:

$\mathbf{r}_{\mathrm{Tucker}}(\mathcal{T}) \triangleq \big(\mathrm{rank}(T_{(1)}), \ldots, \mathrm{rank}(T_{(K)})\big) \in \mathbb{R}^{K}, \quad (4)$

where $T_{(k)} \in \mathbb{R}^{d_k \times \prod_{i \neq k} d_i}$ denotes the mode-$k$ unfolding (matrix) of $\mathcal{T}$ [18], obtained by concatenating all the mode-$k$ fibers of $\mathcal{T}$ as column vectors. We can see that the Tucker rank measures the low-rankness of all the mode-$k$ unfoldings $T_{(k)}$ in the original domain.

Through relaxing the matrix rank in Eq. 4 to its convex envelope, i.e., the matrix nuclear norm, we get a convex relaxation of the Tucker rank, called Sum of Nuclear Norms (SNN) [20], which is defined as follows:

$\|\mathcal{T}\|_{\mathrm{snn}} \triangleq \sum_{k=1}^{K} \alpha_k \big\|T_{(k)}\big\|_*, \quad (5)$

where the $\alpha_k$'s are positive constants satisfying $\sum_k \alpha_k = 1$. As a typical tensor low-rankness penalty in the original domain, SNN has found many applications in tensor recovery [19, 20, 22].
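As a small illustration (a sketch with our own helper names, not code from the paper), the mode-$k$ unfoldings and the SNN of Eq. 5 can be evaluated as follows; the ordering of unfolding columns varies across conventions, which does not affect the nuclear norm.

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding T_(k): the mode-k fibers of T arranged as columns."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def snn(T, alphas=(1/3, 1/3, 1/3)):
    """Sum of Nuclear Norms (Eq. 5): weighted nuclear norms of the three unfoldings."""
    return sum(a * np.linalg.svd(unfold(T, k), compute_uv=False).sum()
               for k, a in enumerate(alphas))
```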

3 A Hybrid Norm for Tensor Recovery

In this section, we first define a new norm to exploit low-rankness in both spectral and original domains and then use it to formulate a penalized tensor least squares estimator.

3.1 The Proposed Norm

Although TNN has shown superior performance in many tensor learning tasks, it may still be insufficient for tensors which are low-rank in both spectral and original domains due to its definition solely in the spectral domain. Moreover, it is also unsuitable for tensors which have less significant spectral low-rankness than the original domain low-rankness. Thus, it is necessary to extend the vanilla TNN such that the original domain low-rankness can also be exploited for sounder low-rank modeling.

Under the inspiration of SNN’s impressive low-rank modeling capability in the original domain, our idea is quite simple: to combine the advantages of both TNN and SNN through their weighted sum. In this line of thinking, we come up with the following hybrid tensor norm.

Definition 9 (T2NN). The hybrid norm called “Tubal + Tucker” Nuclear Norm (T2NN) of any 3-way tensor TRd1×d2×d3 is defined as the weighted sum of its TNN and SNN as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}} \triangleq \gamma\|\mathcal{T}\|_{\mathrm{tnn}} + (1-\gamma)\|\mathcal{T}\|_{\mathrm{snn}}, \quad (6)$

where $\gamma \in (0, 1)$ is a constant balancing the low-rank modeling in the spectral and original domains.

As can be seen from its definition, T2NN approaches TNN as $\gamma \to 1$ and degenerates to SNN as $\gamma \to 0$. Thus, it can be viewed as an interpolation between TNN and SNN, which provides more flexibility in low-rank tensor modeling. We also define the dual norm of T2NN (named the dual T2NN norm), which is frequently used in analyzing the statistical performance of the T2NN-based tensor estimator.

Lemma 1. The dual norm of the proposed T2NN defined as

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \triangleq \sup_{\mathcal{X}} \langle \mathcal{X}, \mathcal{T} \rangle, \quad \text{s.t. } \|\mathcal{X}\|_{\mathrm{t2nn}} \leq 1, \quad (7)$

can be equivalently formulated as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} = \inf_{\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D}} \max\Big\{ \tfrac{1}{\gamma}\|\mathcal{A}\|,\; \tfrac{1}{\alpha_1(1-\gamma)}\|B_{(1)}\|,\; \tfrac{1}{\alpha_2(1-\gamma)}\|C_{(2)}\|,\; \tfrac{1}{\alpha_3(1-\gamma)}\|D_{(3)}\| \Big\}, \quad \text{s.t. } \mathcal{A}+\mathcal{B}+\mathcal{C}+\mathcal{D} = \mathcal{T}. \quad (8)$

Proof of Lemma 1. Using the definition of T2NN, the supremum in Problem (7) can be equivalently converted to the opposite number of infimum as follows:

Tt2nn=infTX,T,s.t. γXtnn+α1(1γ)X(1)+α2(1γ)×X(2)+α3(1γ)X(3)1.(9)

By introducing a multiplier λ ≥ 0, we obtain the Lagrangian function of Problem (9),

L(X,λ)X,T+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1.

Since Slater’s condition [23] is satisfied in Problem (9), strong duality holds, which means

Tt2nn=infXsupλL(X,λ)=supλinfXL(X,λ).

Thus, we proceed by computing supλinfXL(X,λ) as follows:

supλinfXX,T+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1=(i)supλinfXX,A+B+C+D+λγXtnn+α1(1γ)X(1)+α2(1γ)X(2)+α3(1γ)X(3)1λ,whereA+B+C+D=T,=supλinfXλ+λγXtnnX,A+λα1(1γ)×X(1)X,B+λα2(1γ)X(2)X,C+λα3(1γ)X(3)X,D,whereA+B+C+D=T,=(ii)supλλ+0ifλ1γAotherwise+0ifλ1α1(1γ)B(1)otherwise+0ifλ1α2(1γ)C(2)otherwise+0ifλ1α3(1γ)D(3)otherwisewhereA+B+C+D=T,=infA+B+C+D=Tmax1γA,1α1(1γ)×B(1),1α2(1γ)C(2),1α3(1γ)D(3),

where (i) is obtained by the trick of splitting T into four auxiliary tensors A,B,C,D for simpler analysis and (ii) holds because for any positive constant α, any norm f (⋅) with dual norm f*(⋅), we have the following relationship:

infXλαf(X)X,AinfXλαf(X)γf(X)1αf(A),=infXαf(X)λ1αf(A),=0,ifλ1γf(A),,otherwise.

This completes the proof.

Although an expression of the dual T2NN norm is given in Lemma 1, it is still an optimization problem whose optimal value cannot be straightforwardly computed from the variable tensor $\mathcal{T}$. Following the tricks in [22], we instead give an upper bound on the dual T2NN norm directly in terms of $\mathcal{T}$ in the following lemma:

Lemma 2. The dual T2NN norm can be upper bounded as follows:

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \leq \frac{1}{16}\left( \frac{1}{\gamma}\|\mathcal{T}\| + \frac{1}{\alpha_1(1-\gamma)}\|T_{(1)}\| + \frac{1}{\alpha_2(1-\gamma)}\|T_{(2)}\| + \frac{1}{\alpha_3(1-\gamma)}\|T_{(3)}\| \right). \quad (10)$

Proof of Lemma 2. The proof is a direct application of the basic inequality "harmonic mean ≤ arithmetic mean" with a careful construction of the auxiliary tensors $\mathcal{A},\mathcal{B},\mathcal{C},\mathcal{D}$ in Eq. 8 as follows:

$\mathcal{A}_0 = \frac{\gamma\|\mathcal{T}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{B}_0 = \frac{\alpha_1(1-\gamma)\|T_{(1)}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{C}_0 = \frac{\alpha_2(1-\gamma)\|T_{(2)}\|^{-1}}{M}\,\mathcal{T}, \quad \mathcal{D}_0 = \frac{\alpha_3(1-\gamma)\|T_{(3)}\|^{-1}}{M}\,\mathcal{T},$

where the denominator M is given by

$M = \gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}.$

It is obvious that A0+B0+C0+D0=T. By substituting the particular setting (A0,B0,C0,D0) of (A,B,C,D) into Eq. 8, we obtain

$\|\mathcal{T}\|_{\mathrm{t2nn}}^{*} \leq \frac{1}{\gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}}. \quad (11)$

Then, by using “harmonic mean ≤ arithmetic mean” on the right-hand side of Eq. 11, we obtain

$\frac{4}{\gamma\|\mathcal{T}\|^{-1} + \alpha_1(1-\gamma)\|T_{(1)}\|^{-1} + \alpha_2(1-\gamma)\|T_{(2)}\|^{-1} + \alpha_3(1-\gamma)\|T_{(3)}\|^{-1}} \leq \frac{1}{4}\left( \frac{1}{\gamma}\|\mathcal{T}\| + \frac{1}{\alpha_1(1-\gamma)}\|T_{(1)}\| + \frac{1}{\alpha_2(1-\gamma)}\|T_{(2)}\| + \frac{1}{\alpha_3(1-\gamma)}\|T_{(3)}\| \right), \quad (12)$

which directly leads to Eq. 10.

3.2 T2NN-Based Tensor Recovery

3.2.1 The Observation Model

We use $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ to denote the underlying tensor, which is unknown. Suppose one observes $N \leq d_1 d_2 d_3$ scalars,

$y_i = \langle \mathcal{L}, \mathcal{X}_i \rangle + \sigma\xi_i, \quad i \in [N], \quad (13)$

where Xi’s are known (deterministic or random) design tensors, ξi’s are i. i.d. standard Gaussian noises, and σ is a known standard deviation constant measuring the noise level.

Let $y = (y_1, \ldots, y_N)^{\top}$ and $\xi = (\xi_1, \ldots, \xi_N)^{\top}$ denote the collections of observations and noises, respectively. Define the design operator $\mathfrak{X}(\cdot)$ with adjoint operator $\mathfrak{X}^{*}(\cdot)$ as follows:

$\forall\, \mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3},\; \mathfrak{X}(\mathcal{T}) \triangleq \big(\langle \mathcal{T}, \mathcal{X}_1\rangle, \ldots, \langle \mathcal{T}, \mathcal{X}_N\rangle\big)^{\top} \in \mathbb{R}^{N}; \qquad \forall\, z \in \mathbb{R}^{N},\; \mathfrak{X}^{*}(z) \triangleq \sum_{i=1}^{N} z_i \mathcal{X}_i \in \mathbb{R}^{d_1 \times d_2 \times d_3}. \quad (14)$

Then, the observation model (13) can be rewritten in the following compact form:

$y = \mathfrak{X}(\mathcal{L}) + \sigma\xi.$
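To make Eq. 14 concrete, the following sketch builds the design operator and its adjoint from an explicit list of design tensors (the helper names are our own; this is an illustration, not the authors' code):

```python
import numpy as np

def make_design_operator(design_tensors):
    """Returns the operator X(.) and its adjoint X*(.) of Eq. 14 for given design tensors."""
    def X(T):
        # X(T) = (<T, X_1>, ..., <T, X_N>)
        return np.array([np.vdot(Xi, T) for Xi in design_tensors])
    def X_adj(z):
        # X*(z) = sum_i z_i * X_i
        out = np.zeros_like(design_tensors[0], dtype=float)
        for zi, Xi in zip(z, design_tensors):
            out += zi * Xi
        return out
    return X, X_adj
```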

3.2.2 Two Typical Settings

With different settings of the design tensors {Xi}, we consider two classical examples in this paper:

Tensor completion. In tensor completion, the design tensors $\{\mathcal{X}_i\}$ are i. i.d. random tensor bases drawn from the uniform distribution on the canonical basis of the space of $d_1 \times d_2 \times d_3$ tensors, $\{\, e_i \circ e_j \circ e_k : (i,j,k) \in [d_1] \times [d_2] \times [d_3] \,\}$, where $e_i$ denotes the vector whose $i$th entry is 1 with all other entries 0 and $\circ$ denotes the tensor outer product [18].

Tensor compressive sensing. When X is a random Gaussian design, Model (13) is the tensor compressive sensing model with Gaussian measurements [24]. X is named a random Gaussian design when {Xi} are random tensors with i. i.d. standard Gaussian entries [22].
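The two settings differ only in how the design tensors are drawn; a minimal sketch (illustrative sizes and names of our own choosing) is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, d3, N = 10, 12, 8, 300          # illustrative dimensions and sample size

def completion_design():
    # One random canonical basis tensor e_i o e_j o e_k (uniform over all entries).
    Xi = np.zeros((d1, d2, d3))
    Xi[rng.integers(d1), rng.integers(d2), rng.integers(d3)] = 1.0
    return Xi

def gaussian_design():
    # A design tensor with i.i.d. standard Gaussian entries.
    return rng.standard_normal((d1, d2, d3))

design_tensors = [gaussian_design() for _ in range(N)]   # or [completion_design() for _ in range(N)]
```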

3.2.3 The Proposed Estimator

The goal of this paper is to recover the unknown low-rank tensor L from noisy linear observations y satisfying the observation model (13).

Inspired by the capability of the newly defined T2NN in simultaneously modeling low-rankness in both spectral and original domains, we define the T2NN penalized least squares estimator L̂ to estimate the unknown truth L,

$\widehat{\mathcal{L}} \in \arg\min_{\mathcal{L}} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\|\mathcal{L}\|_{\mathrm{t2nn}}, \quad (15)$

where the squared $l_2$-norm is adopted as the fidelity term for Gaussian noises, the proposed T2NN is used to impose both spectral and original-domain low-rankness on the solution, and $\lambda$ is a penalization parameter that balances the residual fitting accuracy and the parameter complexity (characterized by low-rankness) of the model.

Given the estimator L̂ in Eq. 15, one may naturally ask how well it can estimate the truth L and how to compute it. In the following two sections, we first study the estimation performance of L̂ by upper bounding its estimation error and then develop an ADMM-based algorithm to efficiently compute it.
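For illustration, the T2NN penalty and the objective of Eq. 15 can be evaluated as below (a self-contained sketch with our own helper names; `X` is any design operator such as the one sketched in Section 3.2.1):

```python
import numpy as np

def t2nn(T, gamma=0.5, alphas=(1/3, 1/3, 1/3)):
    """T2NN of Eq. 6: gamma * TNN + (1 - gamma) * SNN."""
    d3 = T.shape[2]
    Tf = np.fft.fft(T, axis=2)
    tnn = sum(np.linalg.svd(Tf[:, :, i], compute_uv=False).sum() for i in range(d3)) / d3
    unfold = lambda A, k: np.moveaxis(A, k, 0).reshape(A.shape[k], -1)
    snn = sum(a * np.linalg.svd(unfold(T, k), compute_uv=False).sum()
              for k, a in enumerate(alphas))
    return gamma * tnn + (1 - gamma) * snn

def objective(L, y, X, lam, **t2nn_kwargs):
    """Penalized least-squares objective of Eq. 15 for a candidate tensor L."""
    return 0.5 * np.sum((y - X(L)) ** 2) + lam * t2nn(L, **t2nn_kwargs)
```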

4 Statistical Guarantee

In this section, we first come up with a deterministic upper bound of the estimation error and then establish non-asymptotic error bounds for the special cases of tensor compressive sensing with random Gaussian design and noisy tensor completion.

First, to describe the low-rankness of L, we consider both its low-tubal-rank and low-Tucker-rank structures as follows:

• Low-tubal-rank structure: Let $r_{\mathrm{tb}}$ denote the tubal rank of $\mathcal{L}$. Suppose it has the reduced t-SVD $\mathcal{L} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{\top}$, where $\mathcal{U} \in \mathbb{R}^{d_1 \times r_{\mathrm{tb}} \times d_3}$ and $\mathcal{V} \in \mathbb{R}^{d_2 \times r_{\mathrm{tb}} \times d_3}$ are orthogonal tensors and $\mathcal{S} \in \mathbb{R}^{r_{\mathrm{tb}} \times r_{\mathrm{tb}} \times d_3}$ is f-diagonal. Then, following [25], we define the following projections of any tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$:

$\mathcal{P}_{\perp}(\mathcal{T}) \triangleq (\mathcal{I} - \mathcal{U}*\mathcal{U}^{\top}) * \mathcal{T} * (\mathcal{I} - \mathcal{V}*\mathcal{V}^{\top}) \quad\text{and}\quad \mathcal{P}(\mathcal{T}) = \mathcal{T} - \mathcal{P}_{\perp}(\mathcal{T}), \quad (16)$

where I denotes the identity tensor of appropriate dimensionality.

• Low-Tucker-rank structure: Let $\mathbf{r}_{\mathrm{tk}} = (r_1, r_2, r_3)$ denote the Tucker rank of $\mathcal{L}$, i.e., $r_k = \mathrm{rank}(L_{(k)})$. Then, we have the reduced SVD factorization $L_{(k)} = \mathbf{U}_k \mathbf{S}_k \mathbf{V}_k^{\top}$, where $\mathbf{U}_k \in \mathbb{R}^{d_k \times r_k}$ and $\mathbf{V}_k \in \mathbb{R}^{d_{\backslash k} \times r_k}$ are orthogonal and $\mathbf{S}_k \in \mathbb{R}^{r_k \times r_k}$ is diagonal. Let $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ be an arbitrary tensor. Similar to [22], we define the following two projections for any mode $k = 1, 2, 3$:

$\mathcal{P}_{k\perp}(\mathcal{T}) = (\mathbf{I} - \mathbf{U}_k\mathbf{U}_k^{\top})\, T_{(k)}\, (\mathbf{I} - \mathbf{V}_k\mathbf{V}_k^{\top}) \quad\text{and}\quad \mathcal{P}_k(\mathcal{T}) = T_{(k)} - \mathcal{P}_{k\perp}(\mathcal{T}), \quad (17)$

where I denotes the identity matrix of appropriate dimensionality.

4.1 A Deterministic Bound on the Estimation Error

Before bounding the Frobenius-norm error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F$, the particularity of the error tensor $\Delta \triangleq \widehat{\mathcal{L}} - \mathcal{L}$ is first characterized, under a certain choice of the regularization parameter $\lambda$ involving the dual T2NN norm, in the following proposition.

Proposition 1. By setting the regularization parameter $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$, we have

(I) rank inequality:

$\mathrm{rank}_{\mathrm{tb}}(\mathcal{P}(\Delta)) \leq 2r_{\mathrm{tb}} \quad\text{and}\quad \mathrm{rank}(\mathcal{P}_k(\Delta)) \leq 2r_k,\; k = 1, 2, 3, \quad (18)$

(II) sum of norms inequality:

$\gamma\|\mathcal{P}_{\perp}(\Delta)\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k\perp}(\Delta)\|_{*} \;\leq\; 3\Big(\gamma\|\mathcal{P}(\Delta)\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k}(\Delta)\|_{*}\Big), \quad (19)$

(III) an upper bound on the “observed” error:

$\|\mathfrak{X}(\Delta)\|_2^2 \;\leq\; 3\lambda\Big(\gamma\sqrt{2r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{2r_k}\Big)\|\Delta\|_F. \quad (20)$

Proof of Proposition 1. The proof is given as follows.

Proof of Part (I): According to the definition of $\mathcal{P}(\mathcal{T})$ in Eq. 16, we have

P(T)=TP(T)=UUT+TVVUUTVV,=UUT+(IUU)TVV.

Due to the facts that ranktb(AB)max{ranktb(A),ranktb(B)}, ranktb(A+B)ranktb(A)+ranktb(B) [26], and ranktb(U)=ranktb(V)=rtb, we have

ranktb(P(T))ranktb(UUT)+ranktb((IUU)TVV)2rtb.

Also, according to the definition of P(T) in Eq. 17, we have

Pk(T)=T(k)Pk(T)=UkUkT(k)+T(k)VkVkUkUkT(k)VkVk=UkUkT(k)+(IUkUk)×T(k)VkVk.

Due to the facts that rank(AB)max{rank(A),rank(B)}, rank(A+B)rank(A)+rank(B) [26], and rank(Uk)=rank(Vk)=rk, we have

rank(Pk(T))rank(UkUkT(k))+rank((IUkUk)T(k)VkVk)2rk.

Proof of Part (II) and Part (III): The optimality of L̂ to Problem Eq. 15 indicates

12yX(L̂)22+λL̂t2nn12yX(L)22+λLt2nn.

By the definition of the error tensor ΔL̂L, we can get X(L̂)=X(L)+X(Δ), which leads to

12yX(L)X(Δ)2212yX(L)22λLt2nnλL̂t2nn.

The definition that σξ=yX(L) yields

12X(Δ)22X(Δ),σξ+λ(Lt2nnL̂t2nn)X(ξ),Δ+λ(Lt2nnL̂t2nn),

where the last inequality holds due to the definition of the adjoint operator X().According to the definition and upper bound of the dual T2NN norm in Lemma 1 and Lemma 2, we obtain

12X(Δ)22σX(ξ)t2nnΔt2nn+λ(Lt2nnL̂t2nn).(21)

According to the decomposability of TNN (see the supplementary material of [25]) and the decomposability of the matrix nuclear norm [27], one has

LtnnL̂tnn=LtnnL+Δtnn=LtnnL+P(Δ)+P(Δ)LtnnL+P(Δ)tnnP(Δ)tnn=LtnnLtnn+P(Δ)tnnP(Δ)tnn=P(Δ)tnnP(Δ)tnn

and

L(k)L̂(k)=L(k)L(k)+Δ̂(k)=L(k)L(k)+Pk(Δ)+Pk(Δ)L(k)L(k)+Pk(Δ)Pk(Δ)=L(k)L(k)+Pk(Δ)Pk(Δ)=Pk(Δ)Pk(Δ).

Then, we obtain

Lt2nnL̂t2nnγP(Δ)tnn+(1γ)k=13αkPk(Δ)γP(Δ)tnn+(1γ)k=13αkPk(Δ).(22)

Using the definition of T2NN and triangular inequality yields

Δt2nnγP(Δ)tnn+(1γ)k=13αkPk(Δ)+γP(Δ)tnn+(1γ)k=13αkPk(Δ).(23)

Further using the setting $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$ yields Part (III),

12X(Δ)22(i)3λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)3λ2γP(Δ)tnn+(1γ)k=13αkPk(Δ)(ii)3λ2γ2rtbP(Δ)F+(1γ)k=13αk2rkPk(Δ)F(iii)3λ2γ2rtbΔF+(1γ)k=13αk2rkΔF=3λ2γ2rtb+(1γ)k=13αk2rkΔF,

where by combining (i) and $\|\mathfrak{X}(\Delta)\|_2^2 \geq 0$, Part (II) can be directly proved; inequality (ii) holds due to the compatibility inequalities of TNN and the matrix nuclear norm, i.e., $\|\mathcal{T}\|_{\mathrm{tnn}} \leq \sqrt{\mathrm{rank}_{\mathrm{tb}}(\mathcal{T})}\,\|\mathcal{T}\|_F$ [25] and $\|T\|_* \leq \sqrt{\mathrm{rank}(T)}\,\|T\|_F$ [27]; and inequality (iii) holds because one can easily verify that $\|\mathcal{P}(\Delta)\|_F^2 = \|\Delta\|_F^2 - \|\mathcal{P}_{\perp}(\Delta)\|_F^2 \leq \|\Delta\|_F^2$ [25] and $\|\mathcal{P}_k(\Delta)\|_F^2 = \|\Delta_{(k)}\|_F^2 - \|\mathcal{P}_{k\perp}(\Delta)\|_F^2 \leq \|\Delta_{(k)}\|_F^2 = \|\Delta\|_F^2$ [27].

Note that inequality (20) gives an upper bound on $\|\mathfrak{X}(\Delta)\|_2$, which can be seen as the "observed" error. However, we are more concerned with upper bounds on the error itself, $\|\Delta\|_F$, rather than its observed version. The following assumption builds a bridge between $\|\mathfrak{X}(\Delta)\|_2$ and $\|\Delta\|_F$.

Assumption 1 (RSC condition). The observation operator X() is said to satisfy the Restricted Strong Convexity (RSC) condition with parameter κ if the following inequality holds:

$\|\mathfrak{X}(\mathcal{T})\|_2^2 \geq \kappa\|\mathcal{T}\|_F^2, \quad (24)$

for any $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ belonging to the restricted direction set,

$\mathbb{C} \triangleq \Big\{\, \mathcal{T} \;\Big|\; \gamma\|\mathcal{P}_{\perp}(\mathcal{T})\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k\perp}(\mathcal{T})\|_{*} \leq 3\Big(\gamma\|\mathcal{P}(\mathcal{T})\|_{\mathrm{tnn}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\|\mathcal{P}_{k}(\mathcal{T})\|_{*}\Big) \Big\}. \quad (25)$

Then, a straightforward combination of Proposition 1 and Assumption 1 leads to a deterministic bound on the estimation error.

Theorem 1. By setting the regularization parameter $\lambda \geq 2\sigma\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$, we have the following error bound for any solution $\widehat{\mathcal{L}}$ to Problem (15):

$\|\mathcal{L} - \widehat{\mathcal{L}}\|_F \leq \frac{3\sqrt{2}}{\kappa}\,\lambda\Big(\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big). \quad (26)$

Note that we do not require any information about the distribution of the noise $\xi$ in Theorem 1, which indicates that Theorem 1 provides a deterministic bound for general noise types. The bound on the right-hand side of Eq. 26 is in terms of the quantity

$\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k},$

which serves as a measure of structural complexity, reflecting the natural intuition that a more complex structure causes a larger error. The result is consistent with the results for sum-of-norms-based estimators in [5, 22, 24, 28]. A more general analysis in [24, 28] indicates that the performance of sum-of-norms-based estimators is determined by all the structural complexities of a simultaneously structured signal, just as shown by the proposed bound (26).

4.2 Tensor Compressive Sensing

In this section, we consider tensor compressive sensing from random Gaussian design where {Xi}’s are random tensors with i. i.d. standard Gaussian entries [22]. First, the RSC condition holds in random Gaussian design as shown in the following lemma.

Lemma 3 (RSC of random Gaussian design). If X():Rd1×d2×d3RN is a random Gaussian design, then a version of the RSC condition is satisfied with probability at least 1–2 exp(−N/32) as follows:

X(Δ)N4ΔF116d1d3+d2d3γ+k=13dk+d\kαk(1γ)Δt2nn,(27)

for any tensor ΔRd1×d2×d3 in the restricted direction set C whose definition is given in Eq. 25.Proof of Lemma 3. The proof is analogous to that of Proposition 1 in [27]. The difference lies in how we lower bound the right hand side of (H.7) in [27], i.e.,

EinfΘR(t)supuSN1Yu,Θ,(27a)

where SN1vRNv2=1, R(t)={ΘRd1×d2×d3ΘF=1,Θt2nnt}, and

Yu,Θg,u+G,Θ,

where random vector gRN and random tensor GRd1×d2×d3 are independent with i. i.d. N(0,1) entries.We bound the quantity in Eq. 27 as follows:

EinfΘR(t)supuSN1Yu,Θ=EsupuSN1g,u+EinfΘR(t)G,Θ=Eg2EsupΘR(t)G,Θ=12NtEGt2nn,(28)

where Gt2nn can be bounded according to Lemma 4. The rest of the proof follows that of Proposition 1 in [27].The remaining bound on X(ξ)t2nn is shown in the following Lemma.

Lemma 4 (bound on X(ξ)t2nn). Let X:Rd1×d2×d3RN be a random Gaussian design. With high probability, the quantity X(ξ)t2nn is concentrated around its mean, which can be bounded as follows:

E[X(ξ)t2nn]cNd1d3+d2d3γ+k=13dk+dkαk(1γ).(29)

Proof. Since ξm’s are i. i.d. N(0,1) variables, we have

ξ22N(29a)

with high probability according to Proposition 8.1 in [29].For k = 1, 2, 3, let X(ξ) (k) be the mode-k unfolding of random tensor X(ξ). A direct use of Lemma C.1 in [27] leads to

EX(ξ)(k)c0Ndk+d\k(30)

with high probability. A similar argument of Lemma C.1 in [27] also yields

EX(ξ)c1N(d1d3+d2d3)(31)

with high probability. Combining Eqs 30, 31, we can complete the proof.Then, the non-asymptotic error bound is obtained finally as follows.

Theorem 2 (non-asymptotic error bound). Under the random Gaussian design setup, there are universal constants c3, c4, and c5 such that for a sample size N greater than

c3d1d3+d2d3γ+k=13dk+d\kαk(1γ)2×γrtb+(1γ)k=13αkrk2

and any solution to Problem (15) with regularization parameter

λ=c4σNd1d3+d2d3γ+k=13dk+d\kαk(1γ),

then we have

$\|\Delta\|_F^2 \leq c_5\,\sigma^2\,\frac{\Big(\frac{\sqrt{d_1d_3+d_2d_3}}{\gamma} + \sum_{k=1}^{3}\frac{\sqrt{d_k+d_{\backslash k}}}{\alpha_k(1-\gamma)}\Big)^2 \Big(\gamma\sqrt{r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big)^2}{N}, \quad (32)$

which holds with high probability.

To understand the proposed bound, we consider a three-way cubical tensor $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with regularization weights $\gamma = (1-\gamma)\alpha_1 = (1-\gamma)\alpha_2 = (1-\gamma)\alpha_3 = 1/4$. Then, the bound in Eq. 32 simplifies to the following element-wise error:

$\frac{\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2}{d^3} \leq O\!\left(\sigma^2\,\frac{1}{N}\Big(\sqrt{r_{\mathrm{tb}}/d} + \sum_{k=1}^{3}\sqrt{r_k/d}\Big)^2\right), \quad (33)$

which means the estimation error is controlled by the tubal rank and the Tucker rank of $\mathcal{L}$ simultaneously. From the right-hand side of Eq. 33, it can be seen that the more observations (i.e., the larger $N$), the smaller the error; it also reflects that larger tensors with more complex structures lead to larger errors. This interpretation is consistent with our intuition. Equation 33 also indicates that the sample size $N$ should satisfy

$N \geq \Omega\!\left(\Big(\sqrt{r_{\mathrm{tb}}} + \sum_{k=1}^{3}\sqrt{r_k}\Big)^2 d^2\right) \quad (34)$

for approximate tensor sensing.

Another interesting result is that by setting the noise level $\sigma = 0$ in Eq. 33, the upper bound becomes 0, which means the proposed estimator can exactly recover the unknown truth $\mathcal{L}$ in the noiseless setting.

4.3 Noisy Tensor Completion

For noisy tensor completion, we consider a slightly modified estimator,

$\widehat{\mathcal{L}} \in \arg\min_{\|\mathcal{L}\|_{\infty} \leq a} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\|\mathcal{L}\|_{\mathrm{t2nn}}, \quad (35)$

where $a > 0$ is a known constant constraining the magnitude of the entries of $\mathcal{L}$. The constraint $\|\mathcal{L}\|_{\infty} \leq a$ is very mild because real signals all have limited magnitude; e.g., the intensity of pixels in visible-light images cannot be greater than 255. The constraint also provides theoretical convenience in excluding "spiky" tensors while controlling the identifiability of $\mathcal{L}$. Similar "non-spiky" constraints are also considered in related work [6, 16, 30].

We consider noisy tensor completion under uniform sampling in this section.

Assumption 2 (uniform sampling scheme). The design tensors {Xi} are i.i.d. random tensor bases drawn from uniform distribution Π on the set,

$\big\{\, e_i \circ e_j \circ e_k : (i,j,k) \in [d_1] \times [d_2] \times [d_3] \,\big\}.$

Recall that Proposition 1 in Section 4.1 gives an upper bound on the “observed part” of the estimation error X(Δ)F. As our goal is to establish a bound on ‖Δ‖F, we then connect X(Δ)F with ‖Δ‖F by quantifying the probability of the following RSC property of the sampling operator X:

1NX(Δ)F212d1d2d3ΔF2aninterceptterm,

when the error tensor Δ belongs to some set mC(β,mr) defined as

$\mathbb{C}(\beta,\mathbf{r}) \triangleq \Big\{\, \Delta \in \mathbb{R}^{d_1 \times d_2 \times d_3} \;\Big|\; \|\Delta\|_{\infty} \leq 1,\; \frac{\|\Delta\|_F^2}{d_1 d_2 d_3} \geq \beta,\; \|\Delta\|_{\mathrm{t2nn}} \leq \Big(\gamma\sqrt{r} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{r_k}\Big)\|\Delta\|_F \,\Big\}, \quad (36)$

where $\beta = \frac{64\log\tilde{d}}{N\log(6/5)}$ is an F-norm tolerance parameter and $\mathbf{r} = (r, r_1, r_2, r_3)$ is a rank parameter whose values will be specified in the sequel.

Lemma 5 (RSC condition under uniform sampling). For any ΔC(β,r), it holds with probability at least 1(d1d3+d2d3)1 that

1NX(Δ)F2ΔF22d1d2d344d1d2d3N2EX(ϵ)t2nn2×γr+(1γ)k=13αkrk2,(37)

where $e$ is the base of the natural logarithm and the entries $\epsilon_i$ of the vector $\epsilon \in \mathbb{R}^N$ are i. i.d. Rademacher random variables.

Before proving Lemma 5, we first define a subset of $\mathbb{C}(\beta,\mathbf{r})$ by upper bounding the F-norm of any element $\Delta$ in it,

B(r,T)Δ:ΔC(β,r),ΔF2d1d2d3T,

and a quantity

ZTsupΔB(r,T)X(Δ)22NΔF2d1d2d3,

which is the maximal absolute deviation of N1X(Δ)22 from its expectation (d1d2d3)1ΔF2 in B(r,T). Lemma 6 shows the concentration behavior of ZT.

Lemma 6 (concentration of ZT). There exists a constant c0 such that

PZT512T44d1d2d3N2EX(ϵ)2γr+(1γ)×kαkrk2exp(c0NT2).

Proof of Lemma 6. The proof is similar to that of Lemma 10 in [31]. The difference lies in the step of symmetrization arguments. Note that for any ΔB(r,T), it holds that

Δtnnd1d2d3Tγr+(1γ)kαkrk,

which indicates

E[ZT]8EsupΔB(r,T)X(ϵ),Δ8EsupΔB(r,T)1d3X(ϵ)Δtnn8d1d2d3TE[X(ϵ)]γr+(1γ)kαkrk.

Then, Lemma 5 can be proved by using the peeling argument [30].

Proof of Lemma 5. For any positive integer $l$, we define disjoint subsets of $\mathbb{C}(\beta,\mathbf{r})$ as

D(r,l){Δ:ΔC(β,r),βρl1ΔF2d1d2d3βρl}

with constants ρ=65 and β=64logd̃Nlogρ. Let D = d1d2d3 for simplicity, and define the event

EΔC(β,r),s.t.X(Δ)22NΔF2DΔF22D+44DN2E2[X(ϵ)]γr+(1γ)kαkrk2

and its sub-events for any lN+,

ElΔD(r,l),s.t.X(Δ)22NΔF2D512βρl+44DN2E2[X(ϵ)]γr+(1γ)kαkrk2.

Note that Lemma 6 implies that

P[El]=PΔC(r,l)ZT512βρl+44DN2E2[X(ϵ)]×γr+(1γ)kαkrk2exp(c0Nβ2ρ2l).

Thus, we have

P[E]l=1P[El]l=1exp(c0Nβ2ρ2l)l=1exp(2c0Nβ2llogρ)exp(c0pβ2logρ)1exp(c0pβ2logρ).

Recall that $\beta = \frac{64\log\tilde{d}}{N\log(6/5)}$; then $\mathbb{P}[\mathcal{E}] \leq 2/\tilde{d}$, which leads to the result of Lemma 5.

Based on the RSC condition in Lemma 5, we are able to give an upper bound on the estimation error $\|\Delta\|_F$ in the following proposition.

Proposition 2. With parameter λ2σX(ξ)t2nn, the estimation error satisfies

ΔF2d1d2d3maxc1d1d3d3N2λ2+a2E[X(ϵ)]2×γrtb+(1γ)kαkrk2,c2a2logd̃N(38)

with probability at least $1 - 2(d_1d_3+d_2d_3)^{-1}$.

Proof of Proposition 2. A direct consequence of property (II) in Proposition 1 and the triangular inequality is that the error tensor $\Delta$ satisfies

$\|\Delta\|_{\mathrm{t2nn}} \leq \Big(\gamma\sqrt{32\,r_{\mathrm{tb}}} + (1-\gamma)\sum_{k=1}^{3}\alpha_k\sqrt{32\,r_k}\Big)\|\Delta\|_F. \quad (39)$

Since $\|\widehat{\mathcal{L}}\|_{\infty} \leq a$ and $\|\mathcal{L}\|_{\infty} \leq a$, we also have $\|\Delta\|_{\infty} \leq \|\widehat{\mathcal{L}}\|_{\infty} + \|\mathcal{L}\|_{\infty} \leq 2a$. Let $\mathbf{r} = (r_{\mathrm{tb}}, r_1, r_2, r_3)$ denote the rank complexity of the underlying tensor $\mathcal{L}$. By discussing whether the tensor $\Delta/(2a)$ is in the set $\mathbb{C}(\beta, 32\mathbf{r})$, we consider the following cases.

Case 1: If $\Delta/(2a) \notin \mathbb{C}(\beta, 32\mathbf{r})$, then from the definition of the set $\mathbb{C}(\beta,\mathbf{r})$, we have

ΔF2d1d2d34a264logd̃log(6/5)N.(40)

Case 2: If $\Delta/(2a) \in \mathbb{C}(\beta, 32\mathbf{r})$, then by Proposition 1 and Lemma 5, we have

Δ2aF22D4432DN2E2X(ϵ)×γrtb+(1γ)kαkrk232λNγrtb+(1γ)kαkrkΔ2aF(41)

with probability at least $1 - 2(d_1d_3+d_2d_3)^{-1}$. By performing some algebra (as in the proof of Theorem 3 in [30]), we have

ΔF2d1d2d3Cd1d2d3N2λ2+a2E2[X(ϵ)]×γrtb+(1γ)kαkrk2.(42)

Combining Case 1 and Case 2, we obtain the result of Proposition 2.

According to Proposition 2 and Lemma 5, it remains to bound $\|\mathfrak{X}^{*}(\xi)\|_{\mathrm{t2nn}}^{*}$ and $\mathbb{E}[\|\mathfrak{X}^{*}(\epsilon)\|_{\mathrm{t2nn}}^{*}]$. The following lemmas give their bounds, respectively. As the noise variables $\{\xi_i\}$ are i. i.d. standard Gaussian, they belong to the sub-exponential family [32], and thus there exists a constant $\varrho$ as the smallest number satisfying [30]

maxiNEe|ξi|ϱe.(43)

Suppose the sample complexity N in noisy tensor completion satisfies

Nmax{d1d3d2d3,maxk(dkd\k)}N2(d1d2)ϱ2log2(ϱd1d2)log(d1d3+d2d3)Nmaxk32(dkd\k)ϱ2log2(ϱdkd\k)log(dk+d\k)N2(d1d2)log2(d1d2)log(d1d3+d2d3)Nmaxk32(dkd\k)log2(dkd\k)log(dk+d\k).(44)

Then, we have the following Lemma 7 and Lemma 8 to bound X(ξ)t2nn and E[X(ϵ)t2nn].

Lemma 7. Under the sample complexity of noisy tensor completion in Eq. 44, it holds with probability at least 1(d1d3+d2d3)1k(dk+d\k)1 that

X(ξ)t2nnCϱN1γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k,(45)

where Cϱ is a constant dependent on the ϱ that is defined in Eq. 43.Proof of Lemma 7. The proof can be straightforwardly obtained by adopting the upper bound of the dual T2NN norm in Lemma 2 and Lemma 5 in the supplementary material of [25], and Lemma 5 in [30] as follows:

• First, Lemma 5 in the supplementary material of [25] shows that letting Nd1d3d2d3 and N2(d1d2)ϱ2log2(ϱd1d2)log(d1d3+d2d3), then it holds with probability at least 1(d1d3+d2d3)1 that

X(ξ)CϱN(d1d2)1log(d1d3+d2d3).(46)

• For k = 1, 2, 3, let X(ξ) (k) be the mode-k unfolding of random tensor X(ξ). Then, Lemma 5 in [30] indicates that letting Ndk ∨ (d\k) and N2(dk(d\k))ϱ2log2(ϱdkd\k)log(dk+d\k), then it holds with probability at least 1(dk+d\k)1 that

X(ξ)(k)CϱN(dkd\k)1log(dk+d\k).(47)

Then, combining Eq. 46 and 47 and using union bound, Eq. 45 can be obtained.

Lemma 8. Under the sample complexity of noisy tensor completion in Eq. 44, it holds that

E[X(ϵ)t2nn]CN1γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k.(48)

Proof of Lemma 8. Similar to the proof of Lemma 7, the proof can be straightforwardly obtained by adopting the upper bound of the dual T2NN norm in Lemma 2 and Lemma 6 in the supplementary material of [25], and Lemma 6 in [30].

• First, Lemma 6 in the supplementary material of [25] shows that letting Nd1d3d2d3 and N ≥ 2 (d1d2) log2 (d1d2) log (d1d3 + d2d3), then, the following inequality holds:

E[X(ϵ)]C2N(d1d2)1log(d1d3+d2d3).(49)

• For k = 1, 2, 3, let X∗(ϵ) (k) be the mode-k unfolding of random tensor X(ϵ). Then, Lemma 6 in [30] indicates that letting Ndkd\k and N ≥ 2 (dkd\k) log2 (dkd\,k) log (dk + d\k), then, the following inequality holds:

E[X(ϵ)(k)]C2N(dkd\k)1log(dk+d\k).(50)

Then, Eq. 48 can be obtained by combining Eqs. 49 and 50.

Further combining Lemma 7, Lemma 8, and Proposition 2, we arrive at an upper bound on the estimation error in the following theorem.

Theorem 3. Suppose Assumption 2 is satisfied and $\|\mathcal{L}\|_{\infty} \leq a$. Let the sample size $N$ satisfy Eq. 44. By setting

λ=CϱσN1γlog(d1d3+d2d3)d1d2+1(1γ)k=131αklog(dk+d\k)dkd\k,(51)

the estimation error of any estimator L̂ defined in Problem (35) can be upper bounded as follows:

L̂LF2d1d2d3c2maxa2logd̃N,d1d2d3(σ2a2)N×γr+(1γ)kαkrk21γlog(d1d3+d2d3)d1d2+1(1γ)×k=131αklog(dk+d\k)dkd\k2.(52)

with probability at least $1 - 3(d_1d_3+d_2d_3)^{-1} - \sum_k (d_k+d_{\backslash k})^{-1}$.

To understand the proposed bound in Theorem 3, we consider a three-way cubical tensor $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with regularization weights $\gamma = (1-\gamma)\alpha_1 = (1-\gamma)\alpha_2 = (1-\gamma)\alpha_3 = 1/4$. Then, the bound in Eq. 52 simplifies to the following element-wise error:

L̂LF2d3Od3N(σa)2rd+k=13rkd2logd,(53)

which means the estimation error is controlled by the tubal rank and Tucker rank of L simultaneously. Equation 53 also indicates that the sample size N should satisfy

NΩr+k=13rk2d2logd(54)

for approximate tensor completion.

5 Optimization Algorithm

The ADMM framework [33] is applied to solve the proposed model. Adding auxiliary variables K and T1,T2,T3 to Problem (15) yields an equivalent formulation,

$\min_{\mathcal{L},\mathcal{K},\{\mathcal{T}^k\}_k} \; \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \lambda(1-\gamma)\sum_{k=1}^{3}\alpha_k\big\|T^{k}_{(k)}\big\|_{*} \quad \text{s.t. } \mathcal{K} = \mathcal{L};\; \mathcal{T}^{k} = \mathcal{L},\; k = 1, 2, 3. \quad (55)$

To solve Problem (55), an ADMM-based algorithm is proposed. First, the augmented Lagrangian is

$L_{\rho}(\mathcal{L},\mathcal{K},\{\mathcal{T}^k\}_k,\mathcal{A},\{\mathcal{B}^k\}_k) = \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \langle\mathcal{A},\,\mathcal{K}-\mathcal{L}\rangle + \frac{\rho}{2}\|\mathcal{K}-\mathcal{L}\|_F^2 + \sum_{k=1}^{3}\Big(\lambda(1-\gamma)\alpha_k\big\|T^{k}_{(k)}\big\|_{*} + \langle\mathcal{B}^k,\,\mathcal{T}^k-\mathcal{L}\rangle + \frac{\rho}{2}\|\mathcal{T}^k-\mathcal{L}\|_F^2\Big), \quad (56)$

where tensors A and {Bk}k are the dual variables.

The primal variables L,K, and Tk can be divided into two blocks: The first block has one tensor variable L, whereas the second block consists of four variables K and Tk’s. We use the minimization scheme of ADMM to update the two blocks alternatively after the tth iteration (t = 0, 1, ⋯):

Update the first block $\mathcal{L}$: We update $\mathcal{L}$ by solving the following $\mathcal{L}$-subproblem with all the other variables fixed:

$\mathcal{L}^{t+1} = \arg\min_{\mathcal{L}} L_{\rho}\big(\mathcal{L},\mathcal{K}^t,\{(\mathcal{T}^k)^t\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{L}} \frac{1}{2}\big\|y - \mathfrak{X}(\mathcal{L})\big\|_2^2 + \frac{\rho}{2}\big\|\mathcal{K}^t + \rho^{-1}\mathcal{A}^t - \mathcal{L}\big\|_F^2 + \sum_{k=1}^{3}\frac{\rho}{2}\big\|(\mathcal{T}^k)^t + \rho^{-1}(\mathcal{B}^k)^t - \mathcal{L}\big\|_F^2.$

By taking derivative with respect to L and setting the derivative to zero, we obtain the following equation:

$\mathfrak{X}^{*}\big(\mathfrak{X}(\mathcal{L}) - y\big) + \rho\big(\mathcal{L} - \mathcal{K}^t - \rho^{-1}\mathcal{A}^t\big) + \sum_{k=1}^{3}\rho\big(\mathcal{L} - (\mathcal{T}^k)^t - \rho^{-1}(\mathcal{B}^k)^t\big) = 0.$

Solving the above equation yields

$\mathcal{L}^{t+1} = \big(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I}\big)^{-1}\Big(\mathfrak{X}^{*}(y) + \rho\mathcal{K}^t + \mathcal{A}^t + \sum_{k=1}^{3}\big(\rho(\mathcal{T}^k)^t + (\mathcal{B}^k)^t\big)\Big), \quad (57)$

where I() is the identity operator.
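In the tensor-completion setting, $\mathfrak{X}^{*}\mathfrak{X}$ is diagonal (it simply counts how often each entry is observed), so Eq. 57 reduces to an element-wise update. A minimal sketch under that assumption follows; the variable names (`idx` for the observed indices, `Ts`/`Bs` for the three auxiliary and dual tensors) are our own.

```python
import numpy as np

def update_L_completion(y, idx, K, A, Ts, Bs, rho):
    """L-update of Eq. 57 specialized to tensor completion (element-wise inversion)."""
    shape = K.shape
    count = np.zeros(shape)
    np.add.at(count, tuple(idx.T), 1.0)            # diagonal of X*X: observation multiplicities
    Xadj_y = np.zeros(shape)
    np.add.at(Xadj_y, tuple(idx.T), y)             # X*(y)
    rhs = rho * K + A + sum(rho * Tk + Bk for Tk, Bk in zip(Ts, Bs))
    return (Xadj_y + rhs) / (count + 4.0 * rho)    # (X*X + 4*rho*I)^{-1} applied entry-wise
```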

Update the second block (K,{Tk}): We update K and {Tk} in parallel by keeping all the other variables fixed. First, K is updated by solving the K-subproblem,

$\mathcal{K}^{t+1} = \arg\min_{\mathcal{K}} L_{\rho}\big(\mathcal{L}^{t+1},\mathcal{K},\{(\mathcal{T}^k)^t\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{K}} \lambda\gamma\|\mathcal{K}\|_{\mathrm{tnn}} + \langle\mathcal{A}^t,\,\mathcal{K}-\mathcal{L}^{t+1}\rangle + \frac{\rho}{2}\|\mathcal{K}-\mathcal{L}^{t+1}\|_F^2 = \mathrm{Prox}_{\rho^{-1}\lambda\gamma\|\cdot\|_{\mathrm{tnn}}}\big(\mathcal{L}^{t+1} - \rho^{-1}\mathcal{A}^t\big), \quad (58)$

where Proxτtnn() is the proximal operator of TNN given in Lemma 9.

Then, Tk is updated by solving the Tk-subproblem (k = 1, 2, 3),

$(\mathcal{T}^k)^{t+1} = \arg\min_{\mathcal{T}^k} L_{\rho}\big(\mathcal{L}^{t+1},\mathcal{K}^t,\{\mathcal{T}^k\}_k,\mathcal{A}^t,\{(\mathcal{B}^k)^t\}_k\big) = \arg\min_{\mathcal{T}^k} \lambda\alpha_k(1-\gamma)\big\|T^{k}_{(k)}\big\|_{*} + \langle(\mathcal{B}^k)^t,\,\mathcal{T}^k-\mathcal{L}^{t+1}\rangle + \frac{\rho}{2}\|\mathcal{T}^k-\mathcal{L}^{t+1}\|_F^2 = \mathcal{F}_k\Big(\mathrm{Prox}_{\rho^{-1}\lambda\alpha_k(1-\gamma)\|\cdot\|_{*}}\big(L^{t+1}_{(k)} - \rho^{-1}(B^{k}_{(k)})^t\big)\Big), \quad (59)$

where $\mathcal{F}_k(\cdot): \mathbb{R}^{d_k \times d_{\backslash k}} \to \mathbb{R}^{d_1 \times d_2 \times d_3}$ is the folding operator that reshapes a mode-$k$ matricization back to its original tensor format and $\mathrm{Prox}_{\tau\|\cdot\|_{*}}(\cdot)$ is the proximal operator of the matrix nuclear norm given in Lemma 10.

Lemma 9 (proximal operator of TNN [34]). Let tensor T0Rd1×d2×d3 with t-SVD T0=USV, where URd1×r×d3 and VRd2×r×d3 are orthogonal tensors and SRr×r×d3 is the f-diagonal tensor of singular tubes. Then, the proximal operator of function ‖⋅‖tnn at point T0 with parameter τ can be computed as follows:

$\mathrm{Prox}_{\tau\|\cdot\|_{\mathrm{tnn}}}(\mathcal{T}_0) \triangleq \arg\min_{\mathcal{T}} \frac{1}{2}\|\mathcal{T}_0 - \mathcal{T}\|_F^2 + \tau\|\mathcal{T}\|_{\mathrm{tnn}} = \mathcal{U} * \mathrm{ifft3}\big(\max(\mathrm{fft3}(\mathcal{S}) - \tau,\, 0)\big) * \mathcal{V}^{\top},$

where fft3() and ifft3() denote the operations of fast DFT and fast inverse DFT on all the tubes of a given tensor, respectively.
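Lemma 9 amounts to soft-thresholding the singular values of the Fourier-domain frontal slices; a minimal NumPy sketch (our own naming) is

```python
import numpy as np

def prox_tnn(T0, tau):
    """Proximal operator of the tubal nuclear norm (Lemma 9)."""
    d1, d2, d3 = T0.shape
    Tf = np.fft.fft(T0, axis=2)                    # go to the Fourier domain
    Xf = np.empty_like(Tf)
    for i in range(d3):
        U, s, Vh = np.linalg.svd(Tf[:, :, i], full_matrices=False)
        Xf[:, :, i] = (U * np.maximum(s - tau, 0.0)) @ Vh   # soft-threshold singular values
    return np.real(np.fft.ifft(Xf, axis=2))        # back to the original domain
```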

Lemma 10 (proximal operator of the matrix nuclear norm [35]). Let tensor T0Rd1×d2 with SVD T0 = USV, where URd1×r and VRd2×r are orthogonal matrices and SRr×r is a diagonal matrix of singular values. Then, the proximal operator of function ‖⋅‖ at point T0 with parameter τ can be computed as follows:

$\mathrm{Prox}_{\tau\|\cdot\|_{*}}(T_0) \triangleq \arg\min_{T} \frac{1}{2}\|T_0 - T\|_F^2 + \tau\|T\|_{*} = U\max(S - \tau,\, 0)V^{\top}.$
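Lemma 10 is the standard singular value thresholding operator; for completeness, a one-function sketch (our own naming):

```python
import numpy as np

def prox_nuclear(M0, tau):
    """Proximal operator of the matrix nuclear norm (Lemma 10): singular value thresholding."""
    U, s, Vh = np.linalg.svd(M0, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vh
```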

Update the dual variables (A,{Bk}). We use dual ascending [33] to update (A,{Bk}) as follows:

$\mathcal{A}^{t+1} = \mathcal{A}^t + \rho\big(\mathcal{K}^{t+1} - \mathcal{L}^{t+1}\big), \qquad (\mathcal{B}^k)^{t+1} = (\mathcal{B}^k)^t + \rho\big((\mathcal{T}^k)^{t+1} - \mathcal{L}^{t+1}\big),\; k = 1, 2, 3. \quad (60)$

Termination Condition. Given a tolerance ϵ > 0, check the termination condition of primal variables

XtXtϵ,XL,K,{Tk},(61)

and convergence of constraints

KtLtϵ,and(Tk)tLtϵ,k=1,2,3.(62)

The ADMM-based algorithm is described in Algorithm 1.

Algorithm 1. ADMM for Problem(55)


Computational complexity analysis: We analyze the computational complexity as follows.

• By precomputing $(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I})^{-1}$ and $\mathfrak{X}^{*}(y)$, which costs $O(d_1^3d_2^3d_3^3 + Nd_1^2d_2^2d_3^2)$, the cost of updating $\mathcal{L}$ is $O(d_1^2d_2^2d_3^2)$.

• Updating $\mathcal{K}$ and the $\mathcal{T}^k$'s involves computing the proximal operators of TNN and the matrix nuclear norm, which costs $O\big(d_1d_2d_3(d_1 \wedge d_2 + \log d_3 + \sum_{k=1}^{3} d_k \wedge d_{\backslash k})\big)$.

• Updating A and {Bk} (k = 1, 2, 3) costs O (d1d2d3).

Overall, supposing the iteration number is T, the total computational complexity will be

O(d13d23d33+Td12d22d32+Td1d2d3(d1d2+logd3+k=13dkd\k)),(63)

which is very expensive for large tensors. In some special cases (like tensor completion), where each $\langle\mathcal{X}_i, \mathcal{L}\rangle$ acts on a single entry of $\mathcal{L}$, $(\mathfrak{X}^{*}\mathfrak{X} + 4\rho\,\mathcal{I})^{-1}$ and $\mathfrak{X}^{*}(y)$ can be computed in $O(d_1d_2d_3)$. Hence, the total complexity of Algorithm 1 drops to

OTd1d2d3(min{d1,d2}+logd3+k=13dkd\k).(64)

Convergence analysis: We then discuss the convergence of Algorithm 1 as follows.

Theorem 4 (convergence of Algorithm 1). For any positive constant $\rho$, if the unaugmented Lagrangian function $L_0(\mathcal{L},\mathcal{K},\{\mathcal{T}^k\},\mathcal{A},\{\mathcal{B}^k\})$ has a saddle point, then the iterates of Algorithm 1 satisfy the residual convergence, objective convergence, and dual variable convergence (defined in [33]) for Problem (55) as $t \to \infty$.

Proof of Theorem 4. The key idea is to rewrite Problem (55) as a standard two-block ADMM problem. For notational simplicity, let

f(u)=12yX(L)22,g(v)=λγKtnn+λ(1γ)k=13αkT(k)k,

with u, v, w, and A defined as follows:

uvec(L)Rd1d2d3,vvec(K)vec(T1)vec(T2)vec(T3)R4d1d2d3,wvec(A)vec(B1)vec(B2)vec(B3)R4d1d2d3,AIDIDIDIDR4d1d2d3×d1d2d3,

where vec (⋅) denotes the operation of tensor vectorization (see [18]).It can be verified that f (⋅) and g (⋅) are closed, proper convex functions. Then, Problem(55) can be re-written as follows:

minu,vf(u)+g(v)s.t.  Auv=0.

According to the convergence analysis in [33], we have

objective convergence:limtf(ut)+g(vt)=f+g,dual variable convergence:limtwt=w,constraint convergence:limtAutvt=0,

where f, g are the optimal values of f(u), g(v), respectively. Variable w is a dual optimal point defined as

w=vec(A)vec(B1)vec(B2)vec(B3),

where (A,{Bk}k) are the dual variables in a saddle point (L,K,{(Tk)},A,{(Bk)}) of the unaugmented LagrangianL0(L,K,{Tk},A,{Bk}). Since there are only equality constraints in the convex problem(55), strong duality holds naturally as a corollary of Slater’s condition [23], which further indicates that the unaugmented Lagrangian L0(L,K,{Tk},A,{Bk}) has a saddle point. Moreover, according to the analysis in [36], the convergence rate of general ADMM-based algorithms is O (1/T), where T denotes the iteration number. In this way, the convergence behavior of Algorithm 1 is analyzed.

6 Experimental Results

In this section, we first conduct experiments on synthetic datasets to validate the theory for tensor compressed sensing and then evaluate the effectiveness of the proposed T2NN on three types of real data for noisy tensor completion. MATLAB implementations of the algorithms are deployed on a PC running UOS system with an AMD 3 GHz CPU and a RAM of 40 GB.

6.1 Tensor Compressed Sensing

Our theoretical results on tensor compressed sensing are validated on synthetic data in this subsection. Motivated by [7], we consider a constrained T2NN minimization model that is equivalent to Model (15) for ease of parameter selection. For performance evaluation, the proposed T2NN is also compared with TNN-based tensor compressed sensing [37]. First, the underlying tensor $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ and its compressed observations $\{y_i\}$ are synthesized by the following two steps:

Step 1: Generate $\mathcal{L}$ that is low-rank in both spectral and original domains. Given positive integers $d_1, d_2, d_3$, and $r \leq \min\{d_1, d_2, d_3\}$, we first generate $\mathcal{T} \in \mathbb{R}^{d_1 \times d_2 \times r}$ by $\mathcal{T} = \mathcal{G}_1 * \mathcal{G}_2$, where $\mathcal{G}_1 \in \mathbb{R}^{d_1 \times 1 \times r}$ and $\mathcal{G}_2 \in \mathbb{R}^{1 \times d_2 \times r}$ are tensors with i. i.d. standard Gaussian entries. Then, let $\mathcal{L} = \mathcal{T} \times_3 \mathbf{G}$, where $\times_3$ is the tensor mode-3 product [18] and $\mathbf{G} \in \mathbb{R}^{r \times d_3}$ is a matrix with i. i.d. standard Gaussian entries. Our extensive numerical results show that, with high probability, the tubal rank and Tucker rank of $\mathcal{L}$ are all equal to $r$, that is, $\mathrm{rank}_{\mathrm{tb}}(\mathcal{L}) = r$ and $\mathrm{rank}(L_{(k)}) = r$, $k = 1, 2, 3$.

Step 2: Generate $N$ compressed observations $\{y_i\}$. Given a positive integer $N$, we first generate $N$ design tensors $\{\mathcal{X}_i\}$ with i. i.d. standard Gaussian entries. Then, $N$ noise variables $\{\xi_i\}$ are generated as i. i.d. standard Gaussian variables. The standard deviation is set as $\sigma = c\sigma_0$, where $\sigma_0 = \|\mathcal{L}\|_F/\sqrt{d_1d_2d_3}$ and $c$ denotes the noise level. Finally, $\{y_i\}$ are formed according to the observation model (13). The goal of tensor compressed sensing is to reconstruct the unknown $\mathcal{L}$ from its noisy compressed observations $\{y_i\}$.
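A minimal sketch of Steps 1 and 2 (illustrative sizes, with our own helper `t_product`; not the MATLAB code used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, c = 16, 2, 0.05                      # illustrative size, rank proxy, and noise level
N = 2 * r * d ** 2                         # illustrative number of observations

def t_product(A, B):
    """t-product computed in the Fourier domain (slice-wise matrix products of the FFTs)."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ipk,pjk->ijk', Af, Bf), axis=2))

# Step 1: a tensor that is low-rank in both domains.
G1 = rng.standard_normal((d, 1, r))
G2 = rng.standard_normal((1, d, r))
T = t_product(G1, G2)                      # d x d x r tensor
G = rng.standard_normal((r, d))
L = np.einsum('ijr,rm->ijm', T, G)         # mode-3 product with G, giving a d x d x d tensor

# Step 2: N noisy Gaussian measurements y_i = <L, X_i> + sigma * xi_i.
sigma0 = np.linalg.norm(L) / np.sqrt(d ** 3)
Xs = rng.standard_normal((N, d, d, d))
y = np.einsum('nijk,ijk->n', Xs, L) + c * sigma0 * rng.standard_normal(N)
```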

For simplicity, we consider cubic tensors, i.e., d1 = d2 = d3 = d, and choose the parameter of T2NN by γ = 1/4, α1 = α2 = α3 = 1/3. Recall that the underlying tensor LRd×d×d generated by the above Step 1 has the tubal rank and Tucker rank all equal to r with high probability. We consider tensors with dimensionality d ∈ {16, 20, 24} and rank proxy r ∈ {2, 3}. Then, if the proposed main theorem for tensor compressed sensing (i.e., Theorem 2) is correct, the following two phenomena should be observed:

(1) Phenomenon 1: In the noiseless setting, i.e., σ = 0, if the observation number N is larger than C0rd2 for a sufficiently large constant C0, then the estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$ can be zero, which means exact recovery. Let N0 = rd2 be a unit measure of the sample complexity. Then, by increasing the observation number N gradually from 0, we should observe a phase transition of the estimation error in the noiseless setting: if N/N0 < C0, the estimation error is relatively "large"; once N/N0 ≥ C0, the error drops dramatically to 0.

(2) Phenomenon 2: In the noisy case, the estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$ scales linearly with the variance σ2 of the random noise once the observation number N ≥ C0N0.

To check whether Phenomenon 1 occurs, we conduct tensor compressed sensing by setting the noise variance σ2 = 0. We gradually increase the normalized observation number N/N0 from 0.25 to 5. For each setting of d, r, and N/N0, we repeat the experiment 10 times and report the averaged estimation error $\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2$. For both TNN [37] and the proposed T2NN, we plot the curves of the estimation error (in logarithm) versus the normalized observation number N/N0 for $\mathcal{L} \in \mathbb{R}^{d \times d \times d}$ with rank proxy r = 2 in Figure 1. It can be seen that Phenomenon 1 occurs for the proposed T2NN: when N/N0 < 1.75, the estimation error is relatively "large"; once N/N0 ≥ 1.75, the error drops dramatically to 0. The same phenomenon also occurs for TNN, with a phase transition point near 3.5. Thus, the sample complexity for exact tensor compressed sensing with T2NN is lower than that with TNN, indicating the superiority of the proposed T2NN. Since similar phenomena have also been observed for tensors of other sizes and rank proxies, we simply omit them.


FIGURE 1. Estimation error in logarithm vs. the normalized observation number N/N0 for tensor compressed sensing of underlying tensors of size 16×16×16 and rank proxy r =2. The proposed T2NN is compared with TNN [37].

For the validation of Phenomenon 2, we consider the noisy settings with normalized sample complexity N/N0 = 3.5, which is nearly the phase transition point of TNN and much greater than that of T2NN. We gradually increase the noise level c = σ/σ0 from 0.025 to 0.25. For each different setting of d, r, and c, we repeat the experiments 10 times and report the averaged estimation error L̂LF2. For both TNN [37] and the proposed T2NN, we plot the curves of estimation error in logarithm versus the (squared) noise level σ2/σ02 for LRd×d×d with rank proxy r = 2 in Figure 2. It can be seen that Phenomenon 2 also occurs for the proposed T2NN: The estimation error scales approximately linearly with the (squared) noise level. The same phenomenon can also be observed for TNN with a higher estimation error than T2NN, indicating T2NN is more accurate than TNN. We omit the results for tensors of other sizes and rank proxies because the error curves are so similar to Figure 2.


FIGURE 2. Estimation error vs. the (squared) noise level σ2/σ02 for tensor compressed sensing of underlying tensors of size 16×16×16 and rank proxy r =2. The proposed T2NN is compared with TNN [37].

6.2 Noisy Tensor Completion

This subsection evaluates the effectiveness of the proposed T2NN through performance comparison with the matrix nuclear norm (NN) [30], SNN [22], and TNN [25] by carrying out noisy tensor completion on three different types of visual data, including video data, hyperspectral images, and seismic data.

6.2.1 Experimental Settings

Given the tensor data $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$, the goal is to recover it from its partial noisy observations. We consider uniform sampling with ratio p ∈ {0.05, 0.1, 0.15}, that is, {95%, 90%, 85%} of the entries of a tensor are missing. The noise is i. i.d. Gaussian $\mathcal{N}(0, \sigma^2)$ with σ = 0.05σ0, where $\sigma_0 = \|\mathcal{L}\|_F/\sqrt{d_1d_2d_3}$ is the rescaled magnitude of the tensor $\mathcal{L} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$.

6.2.2 Performance Evaluation

The effectiveness of algorithms is measured by the Peak Signal Noise Ratio (PSNR) and structural similarity (SSIM) [38]. Specifically, the PSNR of an estimator L̂ is defined as

$\mathrm{PSNR} \triangleq 10\log_{10}\frac{d_1d_2d_3\|\mathcal{L}\|_{\infty}^2}{\|\widehat{\mathcal{L}} - \mathcal{L}\|_F^2},$

for the underlying tensor LRd1×d2×d3. The SSIM is computed via

$\mathrm{SSIM} \triangleq \frac{\big(2\mu_{\mathcal{L}}\mu_{\widehat{\mathcal{L}}} + (0.01\bar\omega)^2\big)\big(2\sigma_{\mathcal{L},\widehat{\mathcal{L}}} + (0.03\bar\omega)^2\big)}{\big(\mu_{\mathcal{L}}^2 + \mu_{\widehat{\mathcal{L}}}^2 + (0.01\bar\omega)^2\big)\big(\sigma_{\mathcal{L}}^2 + \sigma_{\widehat{\mathcal{L}}}^2 + (0.03\bar\omega)^2\big)},$

where $\mu_{\mathcal{L}}, \mu_{\widehat{\mathcal{L}}}, \sigma_{\mathcal{L}}, \sigma_{\widehat{\mathcal{L}}}, \sigma_{\mathcal{L},\widehat{\mathcal{L}}}$, and $\bar\omega$ denote the local means, standard deviations, cross-covariance, and dynamic range of the magnitudes of the tensors $\mathcal{L}$ and $\widehat{\mathcal{L}}$, respectively. Larger PSNR and SSIM values indicate higher quality of the estimator $\widehat{\mathcal{L}}$. In each setting, we test each tensor for 10 trials and report the averaged PSNR (in dB) and SSIM values.
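As a small sketch of the quantitative metric (our own function name), the PSNR above can be computed as

```python
import numpy as np

def psnr(L_hat, L):
    """PSNR (in dB) of an estimate L_hat of the tensor L, as defined above."""
    mse = np.sum((L_hat - L) ** 2)
    peak = np.max(np.abs(L)) ** 2
    return 10.0 * np.log10(L.size * peak / mse)
```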

6.2.3 Parameter Setting

For NN [30], we set the parameter $\lambda = \lambda_\iota\,\sigma\sqrt{p\,(d_1 \vee d_2)\log(d_1+d_2)}$. For SNN [22], we set the regularization parameter $\lambda = \lambda_\iota$ and choose the weights $\alpha$ with $\alpha_1 : \alpha_2 : \alpha_3 = 1 : 1 : 1$. For TNN [25], we set $\lambda = \lambda_\iota\,\sigma\sqrt{p\,d_3(d_1 \vee d_2)\log(d_1d_3+d_2d_3)}$. For the proposed T2NN, we set the regularization parameter $\lambda = \lambda_\iota\,\sigma\sqrt{p\,d_3(d_1 \vee d_2)\log(d_1d_3+d_2d_3)}$ and choose the weights $\gamma = 0.5$ and $\alpha$ with $\alpha_1 : \alpha_2 : \alpha_3 = 1 : 1 : 10$. The factor $\lambda_\iota$ is then tuned in $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$ for each norm, and we choose the value yielding the highest PSNR in most cases during the parameter tuning phase.

6.2.4 Experiments on Video Data

We first conduct noisy video completion on four widely used YUV videos: Akiyo, Carphone, Grandma, and Mother-daughter. Owing to computational limitation, we simply use the first 30 frames of the Y components of all the videos and obtain four tensors of size 144 × 17 × 30. We first report the averaged PSNR and SSIM values obtained by four norms for quantitative comparison in Table 1 and then give visual examples in Figure 3 when 95% of the tensor entries are missing for qualitative evaluation. A demo of the source code is available at https://github.com/pingzaiwang/T2NN-demo.


TABLE 1. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the YUV videos.


FIGURE 3. Visual results obtained by four norms for noisy tensor completion with 95% missing entries on the YUV-video dataset. The first to fourth rows correspond to the videos Akiyo, Carphone, Grandma, and Mother-daughter, respectively. The sub-plots from (A) to (F): (A) a frame of the original video, (B) the observed frame, (C) the frame recovered by NN [30], (D) the frame recovered by SNN [22], (E) the frame recovered by the vanilla TNN [25], and (F) the frame recovered by our T2NN.

6.2.5 Experiments on Hyperspectral Data

We then carry out noisy tensor completion on subsets of the two representative hyperspectral datasets described as follows:

Indian Pines: The dataset was collected by the AVIRIS sensor in 1992 over the Indian Pines test site in North-western Indiana and consists of 145 × 145 pixels and 224 spectral reflectance bands. We use the first 30 bands in the experiments due to the limitation of computing resources.

Salinas A: The data were acquired by the AVIRIS sensor over the Salinas Valley, California, in 1998 and consist of 224 bands over a spectral range of 400–2500 nm. This dataset has a spatial extent of 86 × 83 pixels with a resolution of 3.7 m. We also use the first 30 bands in the experiments.

The averaged PSNR and SSIM values are given in Table 2 for quantitative comparison. We also show visual examples in Figure 4 when 85% of the tensor entries are missing for qualitative evaluation.


TABLE 2. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the hyperspectral datasets.


FIGURE 4. Visual results obtained by four norms for noisy tensor completion with 85% missing entries on the hyperspectral dataset (gray data shown with pseudo-color). The first and second rows correspond to Indian Pines and Salinas A, respectively. The sub-plots from (A) to (F): (A) a frame of the original data, (B) the observed frame, (C) the frame recovered by NN [30], (D) the frame recovered by SNN [22], (E) the frame recovered by the vanilla TNN [25], and (F) the frame recovered by our T2NN.

6.2.6 Experiments on Seismic Data

We use a seismic data tensor of size 512 × 512 × 3, which is extracted from the test data "seismic.mat" of a toolbox for seismic data processing from the Center of Geophysics, Harbin Institute of Technology, China. For quantitative comparison, we present the PSNR and SSIM values for two sampling schemes in Table 3.


TABLE 3. PSNR and SSIM values obtained by four norms (NN [30], SNN [22], TNN [25], and our T2NN) for noisy tensor completion on the Seismic dataset.

6.2.7 Summary and Analysis of Experimental Results

According to the experimental results on three types of real tensor data shown in Table 1, Table 2, Table 3, and Figure 3, the summary and analysis are presented as follows:

1) In all the cases, tensor norms (SNN, TNN, and T2NN) perform better than the matrix norm (NN). It can be explained that tensor norms can honestly preserve the multi-way structure of tensor data such that the rich inter-modal and intra-modal correlations of the data can be exploited to impute the missing values, whereas the matrix norm can only handle two-way structure and thus fails to model the multi-way structural correlations of the tensor data.

2) In most cases, TNN outperforms SNN, which is consistent with the results reported in [14, 17, 25]. One explanation is that the video, hyperspectral, and seismic data used all possess stronger low-rankness in the spectral domain than in the original domain, which can be successfully captured by TNN.

3) In most cases, the proposed T2NN performs best among the four norms. We attribute the promising performance to the capability of T2NN to simultaneously exploit low-rankness in both the spectral and original domains.

7 Conclusion and Discussions

7.1 Conclusion

Due to its definition solely in the spectral domain, the popular TNN may be incapable of exploiting low-rankness in the original domain. To remedy this weakness, a hybrid tensor norm named the "Tubal + Tucker" Nuclear Norm (T2NN) was first defined as the weighted sum of TNN and SNN to model both spectral and original-domain low-rankness. It was further used to formulate a penalized least squares estimator for tensor recovery from noisy linear observations. Upper bounds on the estimation error were established in both deterministic and non-asymptotic senses to analyze the statistical performance of the proposed estimator. An ADMM-based algorithm was also developed to efficiently compute the estimator. The effectiveness of the proposed model was demonstrated through experimental results on both synthetic and real datasets.

7.2 Limitations of the Proposed Model and Possible Solutions

Generally speaking, the proposed estimator has the following two drawbacks due to the adoption of T2NN:

Sample inefficiency: The analysis of [24, 28] indicates that for tensor recovery from a small number of observations, T2NN cannot provide essentially lower sample complexity than TNN.

Computational inefficiency: Compared to TNN, T2NN is more time-consuming since it involves computing both TNN and SNN.

We list several directions that this work can be extended to overcome the above drawbacks.

For sample inefficiency: First, inspired by the attempt of adopting the “best” norm (e.g., Eq. 8 in [28]), the following model can be considered:

minLmaxLtnnLtnn,maxk=1,2,3L(k)L(k)s.t.  yX(L)2ϵ(65)

for a certain noise level ϵ ≥ 0. Although Model (65) has a significantly higher accuracy and lower sample complexity according to the analysis in [28], it is impractical because it requires Ltnn and L(k) (k = 1, 2, 3), which are unknown in advance. Motivated by [39], a more practical model is given as follows:

minLk=13exp(αkL(k))+exp(βLtnn)s.t.  yX(L)2ϵ,

where β > 0 is a regularization parameter.

For computational inefficiency: To improve the efficiency of the proposed T2NN-based models, we can use more efficient solvers of Problem (15) by adopting the factorization strategy [40, 41] or sampling-based approaches [42].

7.3 Extensions to the Proposed Model

In this subsection, we discuss possible extensions of the proposed model to general K-order (K > 3) tensors, general spectral domains, robust tensor recovery, and multi-view learning, respectively.

Extensions to K-order (K > 3) tensors: Currently, the proposed T2NN is defined solely for 3-order tensors, and it cannot be directly applied to tensors of more than 3 orders like color videos. For general K-order tensors, it is suggested to replace the tubal nuclear norm in the definition of T2NN with orientation invariant tubal nuclear norm [5], which is defined to exploit multi-orientational spectral low-rankness for general higher-order tensors.

Extensions to general spectral and original domains: This paper considers the DFT-based tensor product for spectral low-rank modeling. Recently, the DFT based t-product has been generalized to the *L-product defined via any invertible linear transform [43], under which the tubal nuclear norm is also extended to *L-tubal nuclear norm [44] and *L-Spectral k-support norm [7]. It is natural to generalize the proposed T2NN by changing the tubal nuclear norm to *L-tubal nuclear norm or *L-Spectral k-support norm for further extensions. It is also interesting to consider other tensor decompositions for original domain low-rankness modeling such as CP, TT, and TR as future work.

Extensions to robust tensor recovery: In many real applications, the tensor signal may also be corrupted by gross sparse outliers. Motivated by [5], the proposed T2NN can also be used in resisting sparse outliers for robust tensor recovery as follows:

$$\min_{\mathcal{L},\,\mathcal{S}} \; \frac{1}{2}\|\mathbf{y} - \mathcal{X}(\mathcal{L}+\mathcal{S})\|_{2}^{2} + \lambda \|\mathcal{L}\|_{\mathrm{t2nn}} + \mu \|\mathcal{S}\|_{1},$$

where $\mathcal{S} \in \mathbb{R}^{d_1 \times d_2 \times d_3}$ denotes the tensor of sparse outliers, the tensor ℓ1-norm ‖⋅‖1 is applied to encourage sparsity in $\mathcal{S}$, and λ, μ > 0 are regularization parameters.
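A minimal sketch of two proximal operators that an ADMM solver for this robust model would alternate between is given below: entrywise soft-thresholding for the sparse term and tubal singular value thresholding for the TNN part of T2NN. The SNN part additionally requires mode-wise matrix SVT and a least-squares update, which are omitted here, and the thresholds depend on the normalization convention of the TNN.

```python
import numpy as np

def soft_threshold(X, tau):
    """Proximal operator of tau*||.||_1: entrywise soft-thresholding,
    used for the sparse-outlier tensor S."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def tubal_svt(T, tau):
    """Tubal singular value thresholding: proximal operator of tau*TNN,
    obtained by soft-thresholding the singular values of each Fourier-domain
    frontal slice."""
    T_hat = np.fft.fft(T, axis=2)
    out = np.empty_like(T_hat)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(T_hat[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    return np.real(np.fft.ifft(out, axis=2))
```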

Extensions to multi-view learning: Due to its superiority in modeling multi-linear correlations of multi-modal data, TNN has been successfully applied to multi-view self-representations for clustering [45, 46]. Our proposed T2NN can also be utilized for clustering by straightforwardly replacing TNN in the formulation of multi-view learning models (e.g., Eq. 9 in [45]).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://sites.google.com/site/subudhibadri/fewhelpfuldownloads, https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html, https://rslab.ut.ac.ir/documents/81960329/82035173/SalinasA_corrected.mat, https://github.com/sevenysw/MathGeo2018.

Author Contributions

Conceptualization and methodology—YL and AW; software—AW; formal analysis—YL, AW, GZ, and QZ; resources—YL, GZ, and QZ; writing: original draft preparation—YL, AW, GZ, and QZ; project administration and supervision—GZ, and QZ; and funding acquisition—AW, GZ, and QZ. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 61872188, 62073087, 62071132, 62103110, 61903095, U191140003, and 61973090, in part by the China Postdoctoral Science Foundation under Grant 2020M672536, and in part by the Natural Science Foundation of Guangdong Province under Grants 2020A1515010671, 2019B010154002, and 2019B010118001.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

AW is grateful to Prof. Zhong Jin in Nanjing University of Science and Technology for his long-time and generous support in both research and life. In addition, he would like to thank the Jin family in Zhuzhou for their kind understanding in finishing the project of tensor learning in these years.

Footnotes

1. The Fourier version T̃ is obtained by performing the 1D-DFT on all tubes of T, i.e., T̃ = fft(T,[],3) ∈ ℂ^{d1×d2×d3} in MATLAB.

References

1. Guo C, Modi K, Poletti D. Tensor-Network-Based Machine Learning of Non-Markovian Quantum Processes. Phys Rev A (2020) 102:062414.

2. Ma X, Zhang P, Zhang S, Duan N, Hou Y, Zhou M, et al. A Tensorized Transformer for Language Modeling. Adv Neural Inf Process Syst (2019) 32.

3. Meng Y-M, Zhang J, Zhang P, Gao C, Ran S-J. Residual Matrix Product State for Machine Learning. arXiv preprint arXiv:2012.11841 (2020).

4. Ran S-J, Sun Z-Z, Fei S-M, Su G, Lewenstein M. Tensor Network Compressed Sensing with Unsupervised Machine Learning. Phys Rev Res (2020) 2:033293. doi:10.1103/physrevresearch.2.033293

5. Wang A, Zhao Q, Jin Z, Li C, Zhou G. Robust Tensor Decomposition via Orientation Invariant Tubal Nuclear Norms. Sci China Technol Sci (2022) 34:6102. doi:10.1007/s11431-021-1976-2

6. Zhang X, Ng MK-P. Low Rank Tensor Completion with Poisson Observations. IEEE Trans Pattern Anal Machine Intelligence (2021). doi:10.1109/tpami.2021.3059299

7. Wang A, Zhou G, Jin Z, Zhao Q. Tensor Recovery via *L-Spectral k-Support Norm. IEEE J Sel Top Signal Process (2021) 15:522–34. doi:10.1109/jstsp.2021.3058763

8. Cui C, Zhang Z. High-Dimensional Uncertainty Quantification of Electronic and Photonic Ic with Non-Gaussian Correlated Process Variations. IEEE Trans Computer-Aided Des Integrated Circuits Syst (2019) 39:1649–61. doi:10.1109/TCAD.2019.2925340

9. Liu X-Y, Aeron S, Aggarwal V, Wang X. Low-Tubal-Rank Tensor Completion Using Alternating Minimization. IEEE Trans Inform Theor (2020) 66:1714–37. doi:10.1109/tit.2019.2959980

10. Carroll JD, Chang J-J. Analysis of Individual Differences in Multidimensional Scaling via an N-Way Generalization of “Eckart-Young” Decomposition. Psychometrika (1970) 35:283–319. doi:10.1007/bf02310791

11. Tucker LR. Some Mathematical Notes on Three-Mode Factor Analysis. Psychometrika (1966) 31:279–311. doi:10.1007/bf02289464

12. Oseledets IV. Tensor-Train Decomposition. SIAM J Sci Comput (2011) 33:2295–317. doi:10.1137/090752286

13. Zhao Q, Zhou G, Xie S, Zhang L, Cichocki A. Tensor Ring Decomposition. arXiv preprint arXiv:1606.05535 (2016).

14. Zhang Z, Ely G, Aeron S, Hao N, Kilmer M. Novel Methods for Multilinear Data Completion and De-Noising Based on Tensor-Svd. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014). p. 3842–9. doi:10.1109/cvpr.2014.485

15. Kilmer ME, Braman K, Hao N, Hoover RC. Third-Order Tensors as Operators on Matrices: A Theoretical and Computational Framework with Applications in Imaging. SIAM J Matrix Anal Appl (2013) 34:148–72. doi:10.1137/110837711

16. Hou J, Zhang F, Qiu H, Wang J, Wang Y, Meng D. Robust Low-Tubal-Rank Tensor Recovery from Binary Measurements. IEEE Trans Pattern Anal Machine Intelligence (2021). doi:10.1109/tpami.2021.3063527

17. Lu C, Feng J, Chen Y, Liu W, Lin Z, Yan S. Tensor Robust Principal Component Analysis with a New Tensor Nuclear Norm. IEEE Trans Pattern Anal Mach Intell (2020) 42:925–38. doi:10.1109/tpami.2019.2891760

18. Kolda TG, Bader BW. Tensor Decompositions and Applications. SIAM Rev (2009) 51:455–500. doi:10.1137/07070111x

19. Li X, Wang A, Lu J, Tang Z. Statistical Performance of Convex Low-Rank and Sparse Tensor Recovery. Pattern Recognition (2019) 93:193–203. doi:10.1016/j.patcog.2019.03.014

20. Liu J, Musialski P, Wonka P, Ye J. Tensor Completion for Estimating Missing Values in Visual Data. IEEE Trans Pattern Anal Mach Intell (2013) 35:208–20. doi:10.1109/tpami.2012.39

21. Qiu Y, Zhou G, Chen X, Zhang D, Zhao X, Zhao Q. Semi-Supervised Non-Negative Tucker Decomposition for Tensor Data Representation. Sci China Technol Sci (2021) 64:1881–92. doi:10.1007/s11431-020-1824-4

22. Tomioka R, Suzuki T, Hayashi K, Kashima H. Statistical Performance of Convex Tensor Decomposition. In: Proceedings of Annual Conference on Neural Information Processing Systems (2011). p. 972–80.

23. Boyd S, Boyd SP, Vandenberghe L. Convex Optimization. Cambridge: Cambridge University Press (2004).

24. Mu C, Huang B, Wright J, Goldfarb D. Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery. In: International Conference on Machine Learning (2014). p. 73–81.

25. Wang A, Lai Z, Jin Z. Noisy Low-Tubal-Rank Tensor Completion. Neurocomputing (2019) 330:267–79. doi:10.1016/j.neucom.2018.11.012

26. Zhou P, Lu C, Lin Z, Zhang C. Tensor Factorization for Low-Rank Tensor Completion. IEEE Trans Image Process (2018) 27:1152–63. doi:10.1109/tip.2017.2762595

27. Negahban S, Wainwright MJ. Estimation of (Near) Low-Rank Matrices with Noise and High-Dimensional Scaling. Ann Stat (2011) 2011:1069–97. doi:10.1214/10-aos850

28. Oymak S, Jalali A, Fazel M, Eldar YC, Hassibi B. Simultaneously Structured Models with Application to Sparse and Low-Rank Matrices. IEEE Trans Inform Theor (2015) 61:2886–908. doi:10.1109/tit.2015.2401574

29. Foucart S, Rauhut H. A Mathematical Introduction to Compressive Sensing, Vol. 1. Basel, Switzerland: Birkhäuser Basel (2013).

30. Klopp O. Noisy Low-Rank Matrix Completion with General Sampling Distribution. Bernoulli (2014) 20:282–303. doi:10.3150/12-bej486

31. Klopp O. Matrix Completion by Singular Value Thresholding: Sharp Bounds. Electron J Stat (2015) 9:2348–69. doi:10.1214/15-ejs1076

32. Vershynin R. High-Dimensional Probability: An Introduction with Applications in Data Science, Vol. 47. Cambridge: Cambridge University Press (2018).

33. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations Trends® Machine Learn (2011) 3:1–122. doi:10.1561/2200000016

34. Wang A, Wei D, Wang B, Jin Z. Noisy Low-Tubal-Rank Tensor Completion Through Iterative Singular Tube Thresholding. IEEE Access (2018) 6:35112–28. doi:10.1109/access.2018.2850324

35. Cai J-F, Candès EJ, Shen Z. A Singular Value Thresholding Algorithm for Matrix Completion. SIAM J Optim (2010) 20:1956–82. doi:10.1137/080738970

36. He B, Yuan X. On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method. SIAM J Numer Anal (2012) 50:700–9. doi:10.1137/110836936

37. Lu C, Feng J, Lin Z, Yan S. Exact Low Tubal Rank Tensor Recovery from Gaussian Measurements. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (2018). p. 1948–54. doi:10.24963/ijcai.2018/347

38. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image Quality Assessment: from Error Visibility to Structural Similarity. IEEE Trans Image Process (2004) 13:600–12. doi:10.1109/tip.2003.819861

39. Zhang X, Zhou Z, Wang D, Ma Y. Hybrid Singular Value Thresholding for Tensor Completion. In: Twenty-Eighth AAAI Conference on Artificial Intelligence (2014). p. 1362–8.

40. Wang A-D, Jin Z, Yang J-Y. A Faster Tensor Robust Pca via Tensor Factorization. Int J Mach Learn Cyber (2020) 11:2771–91. doi:10.1007/s13042-020-01150-2

41. Liu G, Yan S. Active Subspace: Toward Scalable Low-Rank Learning. Neural Comput (2012) 24:3371–94. doi:10.1162/neco_a_00369

42. Wang L, Xie K, Semong T, Zhou H. Missing Data Recovery Based on Tensor-Cur Decomposition. IEEE Access (2017) PP:1.

43. Kernfeld E, Kilmer M, Aeron S. Tensor-Tensor Products with Invertible Linear Transforms. Linear Algebra its Appl (2015) 485:545–70. doi:10.1016/j.laa.2015.07.021

44. Lu C, Peng X, Wei Y. Low-Rank Tensor Completion with a New Tensor Nuclear Norm Induced by Invertible Linear Transforms. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019). p. 5996–6004. doi:10.1109/cvpr.2019.00615

45. Lu G-F, Zhao J. Latent Multi-View Self-Representations for Clustering via the Tensor Nuclear Norm. Appl Intelligence (2021) 2021:1–13. doi:10.1007/s10489-021-02710-x

46. Liu Y, Zhang X, Tang G, Wang D. Multi-View Subspace Clustering Based on Tensor Schatten-P Norm. In: 2019 IEEE International Conference on Big Data (Big Data). Los Angeles, CA, USA: IEEE (2019). p. 5048–55. doi:10.1109/bigdata47090.2019.9006347

Keywords: tensor decomposition, tensor low-rankness, tensor SVD, tubal nuclear norm, tensor completion

Citation: Luo Y, Wang A, Zhou G and Zhao Q (2022) A Hybrid Norm for Guaranteed Tensor Recovery. Front. Phys. 10:885402. doi: 10.3389/fphy.2022.885402

Received: 28 February 2022; Accepted: 27 April 2022;
Published: 13 July 2022.

Edited by:

Peng Zhang, Tianjin University, China

Reviewed by:

Jingyao Hou, Southwest University, China
Yong Peng, Hangzhou Dianzi University, China
Jing Lou, Changzhou Institute of Mechatronic Technology, China
Guifu Lu, Anhui Polytechnic University, China

Copyright © 2022 Luo, Wang, Zhou and Zhao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andong Wang, w.a.d@outlook.com
