
ORIGINAL RESEARCH article

Front. Neurosci., 06 January 2023
Sec. Neural Technology
This article is part of the Research Topic Neural Signals Acquisition and Intelligent Analysis.

A multimodal fusion method for Alzheimer’s disease based on DCT convolutional sparse representation

Guo Zhang1,2†, Xixi Nie3†, Bangtao Liu2, Hong Yuan2, Jin Li2, Weiwei Sun4*, Shixin Huang1,5*
  • 1School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
  • 2School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
  • 3Chongqing Key Laboratory of Image Cognition, College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
  • 4School of Optoelectronic Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China
  • 5Department of Scientific Research, The People’s Hospital of Yubei District of Chongqing City, Yubei, China

Introduction: The medical information contained in magnetic resonance imaging (MRI) and positron emission tomography (PET) has driven the development of intelligent diagnosis of Alzheimer’s disease (AD) and of multimodal medical imaging. To address the severe energy loss, low contrast, and spatial inconsistency of fused images produced by traditional multimodal medical image fusion methods based on sparse representation, we propose a multimodal fusion algorithm for Alzheimer’s disease based on discrete cosine transform (DCT) convolutional sparse representation.

Methods: The algorithm first performs a multi-scale DCT decomposition of the source medical images and uses the sub-images at each scale as training images. Sub-dictionaries at the different scales are obtained by solving the dictionary learning problem with the alternating direction method of multipliers (ADMM), yielding the corresponding sparse coefficients. The high-frequency and low-frequency sub-image coefficients are then fused using an improved L1-norm rule combined with an improved spatial frequency, the novel sum-modified spatial frequency (NMSF), and the inverse DCT is applied to the fused coefficients to obtain the final fused image.

Results and discussion: Extensive experiments show that the proposed method performs well in contrast enhancement and in the retention of texture and contour information.

1. Introduction

Alzheimer’s disease (AD) is a common neurodegenerative disease of the elderly with an insidious onset and no cure. Clinically, AD is characterized by dementia symptoms such as cognitive decline and memory loss (Dubois et al., 2021). Advanced multimodal neuroimaging techniques, such as magnetic resonance imaging (MRI) (Thung et al., 2014; Liu M. et al., 2016; Lian et al., 2018; Fan et al., 2019) and positron emission tomography (PET) (Chetelat et al., 2003; Liu M. et al., 2017), use different imaging mechanisms to reflect the location of organs or lesions from different angles and can clearly show human tissue structure or the metabolism and blood flow of organs. The medical information they provide is complementary and irreplaceable, which offers good prospects for the early diagnosis of AD (Perrin et al., 2009; Mohammadi-Nejad et al., 2017; Wang et al., 2021).

Medical image fusion comprises image decomposition, fusion rules, and image reconstruction. The main purpose of image decomposition is to extract feature information from the image, and the effectiveness of feature extraction determines the quality of the fusion result. Current image fusion algorithms can be divided into three categories. The first is fusion based on wavelet and pyramid transforms (Da Cunha et al., 2006; Yang et al., 2010; Miao et al., 2011; Liu S. et al., 2017; Liu X. et al., 2017); among these, the Laplacian pyramid transform is the most robust to the sampling operator. Wang and Shang (2020) proposed a fast image fusion method based on the discrete cosine transform (DCT), which decomposes each source image into a base layer and a detail layer for fusion and optimizes the computation of the base layer to better preserve image structure. In addition, the non-subsampled shearlet transform (NSST) (Kong et al., 2014) is widely used in AD diagnosis because of its translation invariance and multi-directionality. The second category is fusion based on edge-preserving filtering (Farbman et al., 2008; Xu et al., 2011; He et al., 2012; Hu and Li, 2012; Zhang et al., 2014; Kou et al., 2015). This approach filters the image, smoothing away fine detail while retaining its strong edge structure, and decomposes the input image into a smooth layer, which contains the main energy information, and a detail layer, which contains texture features. The third category comprises feature selection methods based on sparse learning; for example, the alternating direction method of multipliers (ADMM) algorithm (Liu and Yan, 2011) organizes the whole learning and decomposition process into vectors and iterates with a sliding window to achieve convergence.

Sparse representation (SR) is a widely used image representation theory. It exploits the natural sparsity of signals, in keeping with the physiological characteristics of the human visual system, and is widely used in image classification (He et al., 2019), image recognition (Liu H. et al., 2016), image feature extraction (Liu et al., 2014), and multimodal image fusion (Zhu et al., 2016). Fusion methods based on SR and dictionary learning grew out of compressed sensing theory (Donoho, 2006) and are generally better than most traditional fusion methods (Zhang and Patel, 2017). They usually represent the source image as a linear combination of an overcomplete dictionary and sparse coefficients; because the resulting weighting coefficients are sparse, the significant information of the source image can be represented by a small number of non-zero elements. In SR-based methods, sparse coding is usually performed on local image blocks. Yang and Li (2009) first introduced SR into image fusion; their method uses a sliding-window technique to make the fusion process robust to noise and registration errors. Because adjacent image blocks overlap, each pixel receives multiple values, and ideally these values should be equal so that overlapping blocks remain consistent (Gu et al., 2015). However, sparse coding is performed independently on each block, the correlation between image pixels is ignored, and the values for each pixel therefore differ. Moreover, most fusion methods aggregate and average these values to obtain the final value of each pixel, which causes image details to be smoothed or even lost during fusion (Rong et al., 2017). Yin et al. (2016) obtained a joint dictionary by using the source images as training data and then fused the images with a maximum weighted multi-norm fusion rule, but the problem of missing details remained. Zong and Qiu (2017) proposed a fusion method based on classified image blocks, which uses histogram of oriented gradients (HOG) features to classify image blocks and build sub-dictionaries; although the loss of detail was reduced, some details were still inevitably smoothed. Zhang and Levine (2016) proposed an improved multi-task robust SR fusion method combined with spatial context information. Like most SR-based methods, it encodes local image blocks rather than the whole image, so detail preservation can still be poor. In addition, fusion methods based on sparse coding usually use only one dictionary to represent the different morphological parts of the source image, which easily causes loss of image information.

Therefore, we propose a multimodal fusion method for Alzheimer’s disease based on DCT convolutional SR to address the above problems. It was evaluated on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (Veitch et al., 2022), and its effectiveness was verified experimentally.

The contributions of this paper are threefold:

1. An improved multi-scale DCT decomposition method is proposed. The M × N image is first divided into 8 × 8 blocks, and the DCT is applied to each block separately. The DCT coefficients of each block are normalized and their low-order coefficients are computed, and the ratio of the energy of the high-order DCT coefficients to that of the low-order coefficients is used as the focus evaluation function. This addresses the energy loss and contrast reduction of the fused image.

2. A convolutional SR method is proposed to solve the spatial inconsistency of multimodal image fusion. The high-frequency and low-frequency components obtained from the multi-scale decomposition are combined, and improved fusion rules that couple spatial frequency with the L1 norm are adopted according to the characteristics of AD multimodal images.

3. To address the limited detail preservation of SR-based medical image fusion methods and the limited expressive power of a single dictionary, multiple sub-dictionaries are constructed to enhance the detail texture and contours of the fused image. The fused detail-layer image is then combined with the fused base-layer image during reconstruction to obtain the final fused image, preserving the fused AD medical information.

2. Materials and methods

In image fusion, the key issues are how to extract the low- and high-frequency coefficients and how to define their fusion criteria. First, the DCT is used to decompose the source medical images at multiple scales; the DCT coefficients of each image block are normalized, and their low-order coefficients are computed, with the ratio of the energy of the high-order DCT coefficients to that of the low-order coefficients used as the focus evaluation function. Then, the sub-images at each scale are convolutionally sparse coded to obtain the sparse coefficients of the different sub-images. The high-frequency sub-image coefficients are fused by combining the improved L1 norm with the novel sum-modified spatial frequency (NMSF), and the low-frequency sub-images are fused by combining the improved L1 norm with regional energy. Finally, the fused low-frequency and high-frequency sub-bands are transformed by the inverse DCT to obtain the final fused image. The principle of the DCT-based image fusion algorithm is shown in Figure 1.

Figure 1. Flowchart of the image fusion algorithm based on the discrete cosine transform (DCT).

2.1. DCT decomposition

2.1.1. Low-frequency component

The information most important for vision is concentrated in the low frequencies of the image. Low frequencies represent slow variation between image pixels; they correspond to the large flat areas that describe the main part of the image and are a comprehensive measure of its overall intensity. To maintain the visibility of the image, the low-frequency part is preserved, since changes to it can alter the image substantially. The low-frequency coefficients of the DCT-based fused image are obtained by weighted averaging. Assuming there are p multi-exposure images, the fusion is defined as:

$$G(i,j)=\sum_{k=1}^{p} w_{k}\,G_{k}(i,j) \tag{1}$$
$$\sum_{k=1}^{p} w_{k}=1 \tag{2}$$
$$w_{1}=w_{2}=\cdots=w_{p}=1/p \tag{3}$$

where Gk(i,j) is the low-frequency coefficients extracted from the source image after DCT transformation; G(i,j) is the fused low-frequency coefficients; and wk is the weighting factor.

2.1.2. High-frequency component

The high-frequency coefficients correspond to detailed information in the image, such as edges, and are extracted from the 8 × 8 blocked image after the DCT. The standard deviation C(i,j) of the high-frequency coefficients D in the (2k + 1)×(2k + 1) neighborhood centered on pixel (i,j) is calculated as:

$$C(i,j)=\sqrt{\frac{\sum_{m=i-k}^{i+k}\sum_{n=j-k}^{j+k}\left[D(m,n)-\bar{M}(m,n)\right]^{2}}{d-1}} \tag{4}$$

where d is the number of pixels in the (2k + 1)×(2k + 1) region; D(m,n) is the value of the high-frequency coefficient at point (m,n); and M̄(m,n) is the average value of the pixels in the region, which is defined as:

$$\bar{M}(i,j)=\frac{1}{d}\sum_{m=i-k}^{i+k}\sum_{n=j-k}^{j+k} D(m,n) \tag{5}$$

Let the regional standard deviations of the high-frequency coefficients of the p multi-exposure images be [C1(i,j), C2(i,j), …, Cp(i,j)]. The weight corresponding to each extracted high-frequency coefficient is then:

$$w_{k}(i,j)=\frac{C_{k}(i,j)}{\sum_{k=1}^{p} C_{k}(i,j)} \tag{6}$$

The weights of the p multi-exposure images are compared, and the fused high-frequency coefficient D(i,j) is taken as the high-frequency coefficient corresponding to the largest weighting factor:

$$w_{k}(i,j)=\max\left[w_{1}(i,j),w_{2}(i,j),\ldots,w_{k}(i,j),\ldots,w_{p}(i,j)\right] \tag{7}$$
$$D(i,j)=D_{k}(i,j) \tag{8}$$
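For concreteness, the following is a minimal sketch (not the authors' code) of the fusion rules in Eqs. (1)-(8): a blockwise DCT, averaging of the low-frequency coefficients, and selection of the high-frequency coefficients by regional standard deviation. The block size, the neighborhood radius k, and the treatment of each block's DC coefficient as its low-frequency part are illustrative assumptions, and the image dimensions are assumed to be multiples of the block size.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import uniform_filter

def block_dct(img, block=8):
    """Orthonormal 2-D DCT applied to each non-overlapping block."""
    out = np.zeros_like(img, dtype=float)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            out[i:i+block, j:j+block] = dctn(img[i:i+block, j:j+block], norm='ortho')
    return out

def block_idct(coef, block=8):
    """Inverse of block_dct."""
    out = np.zeros_like(coef, dtype=float)
    for i in range(0, coef.shape[0], block):
        for j in range(0, coef.shape[1], block):
            out[i:i+block, j:j+block] = idctn(coef[i:i+block, j:j+block], norm='ortho')
    return out

def regional_std(x, k=3):
    """Standard deviation over a (2k+1) x (2k+1) neighborhood, cf. Eqs. (4)-(5)."""
    size = 2 * k + 1
    mean = uniform_filter(x, size)          # local mean M(i, j)
    mean_sq = uniform_filter(x * x, size)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

def fuse_dct(images, block=8, k=3):
    """Fuse p registered source images with the low/high-frequency DCT rules."""
    coefs = [block_dct(im, block) for im in images]
    # Low-frequency part: the DC coefficient of each block, fused by averaging
    # with equal weights w_k = 1/p (Eqs. 1-3).
    low_mask = np.zeros((block, block), dtype=bool)
    low_mask[0, 0] = True
    low_mask = np.tile(low_mask, (images[0].shape[0] // block,
                                  images[0].shape[1] // block))
    fused = np.zeros_like(coefs[0])
    fused[low_mask] = np.mean([c[low_mask] for c in coefs], axis=0)
    # High-frequency part: keep, per position, the coefficient whose regional
    # standard deviation (and hence weight, Eq. 6) is largest (Eqs. 7-8).
    stds = np.stack([regional_std(c, k) for c in coefs])
    choice = np.argmax(stds, axis=0)
    high = np.choose(choice, np.stack(coefs))
    fused[~low_mask] = high[~low_mask]
    return block_idct(fused, block)
```

With two registered source images, fuse_dct([img_a, img_b]) returns the fused image, and the same routine extends to p exposures.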

2.2. Sparse representation

The medical image fusion method consists of the following four parts: (1) multi-scale dictionary learning, in which sub-dictionaries are trained on the sub-images at different scales used as training images; (2) convolutional sparse coding with the dictionaries at different scales to find the convolutional sparse coefficients; (3) low-frequency sub-band coefficient fusion, in which the low-frequency sub-images are fused according to the corresponding fusion rule; and (4) high-frequency sub-band coefficient fusion, in which the high-frequency sub-images at different scales are fused.

2.2.1. Multi-scale dictionary learning

The source images A and B are first decomposed by an l-level DCT to obtain their decomposition coefficients {HAl,k,LA} and {HBl,k,LB}, respectively, where HAl,k and HBl,k denote the high-frequency sub-band coefficients of source images A and B at decomposition scale l and orientation k, and LA and LB are the low-frequency sub-band coefficients of images A and B. The sub-band images at each scale are used as training images to train the corresponding convolutional sparse sub-dictionaries, and the different sub-dictionaries capture the features of the sub-images at different scales. Finally, the low-frequency and high-frequency dictionaries are formed by combining the sub-dictionaries across scales. The first-scale high-frequency images HAl,k and HBl,k of the source images are used as the training images {ym}m=1M, and the corresponding convolutional sparse dictionary learning model is built as follows:

$$\arg\min_{d,x}\ \frac{1}{2}\sum_{m=1}^{M}\left\| y_{m}-\sum_{k=1}^{K} d_{k} * x_{m,k}\right\|_{2}^{2}+\lambda\sum_{m=1}^{M}\sum_{k=1}^{K}\left\| x_{m,k}\right\|_{1} \tag{9}$$

where xm,k is the sparse coefficient map corresponding to the mth training image; dk is the corresponding filter; * denotes the two-dimensional convolution operation; λ is the regularization parameter; and ||⋅||1 denotes the ℓ1 norm, i.e., the sum of the absolute values of the elements.

(1) Dictionary update phase. By keeping the sparse coefficients constant, each filter is optimally updated with the following equation:

$$\arg\min_{d_{k}}\ \frac{1}{2}\sum_{m=1}^{M}\left\| y_{m}-\sum_{k=1}^{K} d_{k} * x_{m,k}\right\|_{2}^{2} \tag{10}$$

To optimize the filter in the discrete Fourier domain, the filter dk needs to be zero-padded to the same size as xm,k. Taking into account the normalization of dk together with the zero padding, the formula is as follows:

$$\arg\min_{\{d_{m}\},\{g_{m}\}}\ \frac{1}{2}\sum_{k}\left\|\sum_{m} x_{m,k} * d_{m}-s_{k}\right\|_{2}^{2}+\sum_{m}\iota_{C_{\mathrm{PN}}}(g_{m}) \tag{11}$$

In the ADMM formulation, CPN = {x ∈ R^N : (I − PP^T)x = 0, ||x||2 = 1} is the constraint set, the indicator function is defined as ιS(X) = 0 if X ∈ S and ∞ if X ∉ S, and gm is an auxiliary variable introduced to facilitate the derivation. The updated convolution filter dk is then obtained.

(2) Convolutional sparse coefficient update phase. Update the coefficients by keeping the filter unchanged:

$$\arg\min_{x_{m,k},\,z_{m,k}}\ \frac{1}{2}\sum_{m=1}^{M}\left\| y_{m}-\sum_{k=1}^{K} d_{k} * x_{m,k}\right\|_{2}^{2}+\lambda\sum_{m=1}^{M}\sum_{k=1}^{K}\left\| z_{m,k}\right\|_{1} \tag{12}$$

where, zm,k is the introduced auxiliary variable. We obtain the updated convolutional sparse coefficients by alternating iterative solutions.

As shown in Figure 2, the dictionary and the convolutional sparse coefficients are updated alternately until a predetermined maximum number of iterations or a set parameter threshold is reached, and the convolutional sub-dictionary D1 for the first scale of the high-frequency sub-images is output. The second and third scales of the high-frequency sub-images are dictionary-learned separately to obtain the convolutional sub-dictionaries D2 and D3. Combining the high-frequency sub-dictionaries yields the high-frequency dictionary Dh = [D1,D2,D3], and dictionary learning on the low-frequency sub-images yields the low-frequency dictionary Dl. A simplified sketch of this alternating procedure is given below.

Figure 2. Flow chart of multi-scale dictionary learning.
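As a concrete illustration of this alternating scheme, the sketch below minimizes the objective of Eq. (9) for a set of same-sized training sub-images. It is a simplified stand-in rather than the paper's implementation: the two ADMM sub-problems are replaced by FFT-domain proximal-gradient (ISTA) steps with circular boundary conditions, and the filter size, number of filters, step size, and λ are illustrative values.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def learn_conv_dict(imgs, n_filters=16, fsz=8, lam=0.05, iters=50, step=0.1, seed=0):
    """Learn a convolutional sub-dictionary from a list of same-sized images."""
    rng = np.random.default_rng(seed)
    H, W = imgs[0].shape
    d = rng.standard_normal((n_filters, fsz, fsz))
    d /= np.linalg.norm(d.reshape(n_filters, -1), axis=1)[:, None, None]
    x = np.zeros((len(imgs), n_filters, H, W))          # coefficient maps x_{m,k}
    Y = np.stack([np.fft.fft2(im) for im in imgs])      # training images, Fourier domain

    def filters_fft(filters):
        """Zero-pad the filters to image size and transform them."""
        D = np.zeros((n_filters, H, W))
        D[:, :fsz, :fsz] = filters
        return np.fft.fft2(D)

    for _ in range(iters):
        Df = filters_fft(d)
        # Coefficient update (filters fixed): one ISTA step per image, cf. Eq. (12).
        Xf = np.fft.fft2(x)
        R = (Df[None] * Xf).sum(axis=1) - Y              # residual in the Fourier domain
        grad = np.real(np.fft.ifft2(np.conj(Df)[None] * R[:, None]))
        x = soft(x - step * grad, step * lam)
        # Dictionary update (coefficients fixed): gradient step, crop to the
        # filter support and renormalize, cf. Eqs. (10)-(11).
        Xf = np.fft.fft2(x)
        R = (Df[None] * Xf).sum(axis=1) - Y
        gD = np.real(np.fft.ifft2((np.conj(Xf) * R[:, None]).sum(axis=0)))
        d = d - step * gD[:, :fsz, :fsz]
        d /= np.maximum(np.linalg.norm(d.reshape(n_filters, -1), axis=1),
                        1e-12)[:, None, None]
    return d, x
```

Running this routine once per scale (on the first-, second-, and third-scale high-frequency sub-images and on the low-frequency sub-images) yields D1, D2, D3, and Dl; stacking the high-frequency sub-dictionaries, for example with np.concatenate, gives Dh = [D1, D2, D3].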

2.2.2. Convolutional sparse coding

To better capture the detailed texture information of medical images and to reduce the influence of artifacts, the learned high-frequency dictionary Dh and low-frequency dictionary Dl are used to perform convolutional sparse coding on the decomposition coefficients {HAl,k,LA} of source image A, with total variation (TV) regularization incorporated into the convolutional sparse coding model. The formula is as follows:

$$\arg\min_{\{x_{k}\}}\ \frac{1}{2}\left\| s-\sum_{k} d_{k} * x_{k}\right\|_{2}^{2}+\lambda\sum_{k}\left\| x_{k}\right\|_{1}+\lambda_{1}\,TV\!\left(\sum_{k=1}^{K} d_{k} * x_{k}\right) \tag{13}$$

where TV(X) = ||g0 * X||1 + ||g1 * X||1, and g0 and g1 are the filters used to compute the gradients along the rows and columns of the image, respectively. The sparse coefficients {xm,LA, xm,l,k,HA} and {xm,LB, xm,l,k,HB} of the sub-band coefficients of source images A and B are obtained by optimal solution in the discrete Fourier domain, where m indexes the filters and convolutional sparse coefficient maps, L denotes the low-frequency image, H denotes the high-frequency image, and l and k denote the scale and orientation of the corresponding sub-bands, respectively.
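For reference, the short sketch below evaluates the TV term and the full coding objective of Eq. (13) for a given dictionary and set of coefficient maps, using horizontal and vertical difference filters as g0 and g1. It only makes the objective explicit; how the TV term is split inside the Fourier-domain solver is not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

g0 = np.array([[1.0, -1.0]])        # row-direction (horizontal) gradient filter
g1 = np.array([[1.0], [-1.0]])      # column-direction (vertical) gradient filter

def tv_penalty(d, x):
    """TV(sum_k d_k * x_k) = ||g0 * r||_1 + ||g1 * r||_1 for the reconstruction r."""
    r = sum(fftconvolve(x[k], d[k], mode='same') for k in range(d.shape[0]))
    return (np.abs(fftconvolve(r, g0, mode='same')).sum()
            + np.abs(fftconvolve(r, g1, mode='same')).sum())

def csc_objective(s, d, x, lam, lam1):
    """Value of the TV-regularized coding objective of Eq. (13) for one sub-band s."""
    r = sum(fftconvolve(x[k], d[k], mode='same') for k in range(d.shape[0]))
    return (0.5 * np.sum((s - r) ** 2)
            + lam * np.abs(x).sum()
            + lam1 * tv_penalty(d, x))
```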

2.2.3. Low-frequency coefficient fusion rules

After DCT decomposition, the energy information of the image is contained in the low-frequency sub-bands LA and LB of the source images A and B, which carry basic information such as the contours and brightness of the image. The averaging strategy generally used for low-frequency coefficient fusion tends to reduce image contrast. Fusing with the Max-L1 rule of SR avoids this contrast reduction but leads to spatial inconsistency in the image. Because the regional energy better reflects the energy and salient features of the image, and averaging the L1 norm of the convolutional sparse coefficients effectively reduces the effect of misalignment, a combination of regional energy and the averaged L1 norm is used to fuse the low-frequency sparse coefficients. The fused low-frequency sub-band image is then reconstructed as:

$$L_{F}=\sum_{m=1}^{M} d_{m}^{l} * x_{m,L_{F}} \tag{14}$$
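A minimal sketch of this rule follows, with the regional energy computed by a box filter, the activity level taken as the locally averaged L1 norm of the coefficient maps, and the fused coefficients reconstructed as in Eq. (14). The window size and the multiplicative combination of the two measures are illustrative assumptions rather than the paper's exact formulation; xA and xB denote the low-frequency coefficient maps of the two source images and Dl the learned low-frequency dictionary.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import fftconvolve

def activity_l1(x, win=5):
    """Averaged L1 activity map: local mean of sum_m |x_m(i, j)|."""
    return uniform_filter(np.abs(x).sum(axis=0), win)

def region_energy(img, win=5):
    """Regional energy: local sum of squared intensities over a win x win window."""
    return uniform_filter(img ** 2, win) * win * win

def fuse_low(xA, xB, LA, LB, Dl, win=5):
    """Pointwise choice of the coefficient set with the larger combined measure."""
    wA = region_energy(LA, win) * activity_l1(xA, win)
    wB = region_energy(LB, win) * activity_l1(xB, win)
    mask = (wA >= wB).astype(float)                   # 1 where source A dominates
    xF = mask[None] * xA + (1.0 - mask)[None] * xB    # fused coefficients x_{m,LF}
    # Reconstruction, Eq. (14): LF = sum_m d_m^l * x_{m,LF}
    return sum(fftconvolve(xF[m], Dl[m], mode='same') for m in range(Dl.shape[0]))
```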

2.2.4. High-frequency coefficient fusion rules

The high-frequency sub-bands HAl,k and HBl,k of the source images A and B contain a large amount of information such as the texture details of the images. Fusion based on convolutional SR performs well in preserving detail information, and the improved spatial frequency reflects the gradient changes of the image texture well. Therefore, the improved spatial frequency combined with the averaged L1-norm strategy is used to fuse the high-frequency sparse coefficients, and the fused high-frequency sub-band images are reconstructed as:

$$H_{F}^{l,k}=\sum_{m=1}^{M} d_{m}^{h} * x_{m,l,k,H_{F}} \tag{15}$$
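A corresponding sketch of the high-frequency rule is given below. It mirrors the low-frequency sketch but uses a windowed spatial frequency as the activity measure; the plain spatial-frequency definition is used here, so the paper's NMSF refinement is only approximated.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.signal import fftconvolve

def local_spatial_frequency(img, win=5):
    """Windowed spatial frequency sqrt(RF^2 + CF^2) used as the activity measure."""
    rf = np.zeros_like(img)
    cf = np.zeros_like(img)
    rf[:, 1:] = (img[:, 1:] - img[:, :-1]) ** 2       # row-direction differences
    cf[1:, :] = (img[1:, :] - img[:-1, :]) ** 2       # column-direction differences
    return np.sqrt(uniform_filter(rf + cf, win))

def fuse_high(xA, xB, HA, HB, Dh, win=5):
    """Pointwise selection driven by spatial frequency and the averaged L1 activity."""
    actA = uniform_filter(np.abs(xA).sum(axis=0), win)
    actB = uniform_filter(np.abs(xB).sum(axis=0), win)
    wA = local_spatial_frequency(HA, win) * actA
    wB = local_spatial_frequency(HB, win) * actB
    mask = (wA >= wB).astype(float)
    xF = mask[None] * xA + (1.0 - mask)[None] * xB
    # Reconstruction, Eq. (15): HF = sum_m d_m^h * x_{m,HF}
    return sum(fftconvolve(xF[m], Dh[m], mode='same') for m in range(Dh.shape[0]))
```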

2.2.5. Multimodal medical image fusion

Spatial inconsistency in multimodal images arises when the maximum-L1 fusion rule is used in traditional SR-based fusion methods. We therefore decompose the source images with the DCT and train different sub-dictionaries for features at different scales. A rule combining regional energy with the activity-level coefficients is used to fuse the low-frequency component coefficients, and an improved rule combining spatial frequency with the activity-level coefficients is used to fuse the high-frequency component coefficients, avoiding reduced contrast, blurred details, and inadequate information extraction. The specific steps are as follows (a high-level sketch tying them together is given after the list):

(1) The DCT decomposition of source images A and B is performed to obtain the respective decomposition coefficients {HAl,k,LA} and {HBl,k,LB}.

(2) In the dictionary learning stage, the images at the different scales corresponding to the multimodal source images are used as training sets, and the sub-dictionaries D0, D1, D2, and D3 corresponding to each scale are derived. The low-frequency dictionary is Dl = D0, and the high-frequency dictionary is Dh = [D1,D2,D3].

(3) Sparse coding stage. Convolutional sparse coding is performed on the sub-images of different orientations at each scale to obtain the corresponding convolutional sparse coefficients {xm,LA,xm,l,k,HA} and {xm,LB,xm,l,k,HB}.

(4) Low-frequency component fusion stage. The regional energies EA and EB of LA and LB and the activity level maps α¯A and α¯B are calculated, and the fused convolutional sparse coefficients xm,LF are obtained. Finally, the fused coefficients are convolved with the low-frequency dictionary to reconstruct the low-frequency sub-band image LF.

(5) High-frequency component fusion stage. The fused convolutional sparse coefficients C are obtained by calculating the improved spatial frequencies of HAl,k and HBl,k. Then the high-frequency sub-band images HFl,k are obtained by convolutional fusion with the high-frequency dictionary Dh.

(6) Finally, the fused image F is obtained by performing the inverse DCT on the fused sub-band images {HFl,k,LF}.
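The following high-level driver ties steps (1)-(6) together. It assumes the helper functions sketched in the earlier blocks (learn_conv_dict, fuse_low, and fuse_high) are in scope, replaces the l-level blockwise DCT decomposition with a single whole-image DCT split for brevity, and uses a few ISTA steps for the coding stage. It is illustrative glue under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_split(img, cut=32):
    """Single-level whole-image DCT split: low-order vs. remaining coefficients."""
    c = dctn(img, norm='ortho')
    low = np.zeros_like(c)
    low[:cut, :cut] = c[:cut, :cut]
    return idctn(low, norm='ortho'), idctn(c - low, norm='ortho')

def sparse_code(s, D, lam=0.05, iters=30, step=0.1):
    """A few ISTA steps for the coding stage with the dictionary D held fixed."""
    M, fsz = D.shape[0], D.shape[1]
    H, W = s.shape
    Dpad = np.zeros((M, H, W))
    Dpad[:, :fsz, :fsz] = D
    Df, Sf = np.fft.fft2(Dpad), np.fft.fft2(s)
    x = np.zeros((M, H, W))
    for _ in range(iters):
        Rf = (Df * np.fft.fft2(x)).sum(axis=0) - Sf
        grad = np.real(np.fft.ifft2(np.conj(Df) * Rf[None]))
        v = x - step * grad
        x = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return x

def fuse_pair(A, B, lam=0.05):
    LA, HA = dct_split(A)                                  # (1) DCT decomposition
    LB, HB = dct_split(B)
    Dl, _ = learn_conv_dict([LA, LB], lam=lam)             # (2) per-component dictionaries
    Dh, _ = learn_conv_dict([HA, HB], lam=lam)
    xLA, xLB = sparse_code(LA, Dl, lam), sparse_code(LB, Dl, lam)   # (3) sparse coding
    xHA, xHB = sparse_code(HA, Dh, lam), sparse_code(HB, Dh, lam)
    LF = fuse_low(xLA, xLB, LA, LB, Dl)                    # (4) low-frequency fusion
    HF = fuse_high(xHA, xHB, HA, HB, Dh)                   # (5) high-frequency fusion
    return LF + HF                                         # (6) reconstruction
```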

3. Results and discussion

3.1. Data set and training parameter settings

(1) Experimental settings

All our experiments are conducted on a computer with an Intel Core i7-10750H CPU at 2.60 GHz, 16 GB of RAM, and an NVIDIA GeForce RTX 3090 Ti GPU. We train the convolutional sparse and low-rank dictionaries with the sliding step size set to 1, the sliding window size set to 8×8, the dictionary size set to 64×512, the error tolerance set to 0.03, and the decomposition level set to 3.

(2) Data sets and comparison methods

To validate the performance of the proposed method, we selected 136 sets of aligned AD brain medical images (256×256 pixels) as the source images to be fused. All image slices were obtained from the Harvard Whole Brain Atlas database (Johnson and Becker, 2001), and the three AD medical image types comprised 42 sets of CT-MRI images, 42 sets of MRI-PET images, and 52 sets of MRI-SPECT images. Four algorithms were adopted for comparison: the nonsubsampled contourlet transform (NSCT) (Li and Wang, 2011), NSST (Kong et al., 2014), guided filtering fusion (GFF) (Li et al., 2013), and Laplacian redecomposition (ReLP) (Li et al., 2020).

(3) Objective evaluation metrics

We selected 10 metrics for objective evaluation and analysis: mutual information (MI) (Xydeas and Petrovic, 2000), the natural image quality evaluator (NIQE) (Mittal et al., 2012), average gradient (AG) (Du et al., 2017), edge intensity (EI) (Wang et al., 2012), the tone-mapped image quality index (TMQI) (Yeganeh and Wang, 2012), spatial frequency (SF) (Eskicioglu and Fisher, 1995), standard deviation (SD) (Liu et al., 2011), root mean square error (RMSE) (Zhang et al., 2018), the gradient similarity measure (GSM) (Liu et al., 2011), and visual information fidelity (VIF) (Sheikh and Bovik, 2006). SF measures the richness of image detail and reflects its sharpness; a larger value means richer detail. SD measures the contrast of the image; a larger value indicates better contrast. RMSE measures the difference between the fused image and the source image; a smaller value indicates that the fused image is closer to the source. GSM measures the gradient similarity between images; a larger value indicates that the gradient information of the fused image is closer to the source. NIQE measures the distance between a natural-image model statistic and the distorted-image statistic; a smaller value indicates less distortion. VIF quantifies the visual information present in the fused image; larger values indicate better fusion. AG is the average gradient, used to characterize the contrast and texture variation of the image. EI reflects the sharpness of the edges. TMQI measures the salient brightness and contrast features between the reference image and the fused image as well as the structural fidelity of the fused image. Larger values of MI, SF, AG, EI, and TMQI indicate better fusion.
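Several of these metrics have simple, widely used definitions; the sketch below gives reference implementations of SF, SD, AG, and RMSE as described above. It is provided for clarity and is not the exact code used to produce the tables.

```python
import numpy as np

def spatial_frequency(img):
    """SF = sqrt(RF^2 + CF^2) over the whole image."""
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))   # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))   # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

def standard_deviation(img):
    """SD, a measure of global contrast."""
    return float(np.std(img))

def average_gradient(img):
    """AG: mean magnitude of the local intensity gradient."""
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0)))

def rmse(fused, source):
    """Root mean square error between the fused image and a source image."""
    return float(np.sqrt(np.mean((fused - source) ** 2)))
```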

3.2. CT-MRI fusion results comparison

Figure 3 shows seven fused images randomly selected from the 42 sets of CT-MRI fusion results. The images fused by the NSCT and GFF algorithms are too dark, the images fused by NSST are not only darker but also distorted, and the images fused by the ReLP algorithm have better brightness but insufficient texture detail. Our fusion algorithm performs best in terms of brightness, detail texture, and edge contours.

Figure 3. CT-MRI fusion images obtained by the five fusion methods. (A) Computed tomography (CT) source image; (B) magnetic resonance imaging (MRI) source image; (C) nonsubsampled contourlet transform (NSCT) fusion result; (D) non-subsampled shearlet transform (NSST) fusion result; (E) guided filtering fusion (GFF) result; (F) Laplacian redecomposition (ReLP) fusion result; (G) fusion result of the proposed method.

The mean values of the objective evaluation metrics for the fusion results of the different methods are given in Table 1, where bold indicates the best-ranking method for each metric. NSCT has a low SD, indicating insufficient image contrast, and NSST has the lowest SF, with poor image detail. ReLP, like our method, shows little distortion in terms of RMSE and GSM. Our proposed method performs better in terms of color retention, contrast, and detail retention, and achieves the optimum.

Table 1. Average values of index evaluation of different fusion methods for CT-MRI.

3.3. MRI-PET fusion results comparison

Figures 4, 5 show results for the 42 sets of MRI-PET images from the MRI-PET datasets Paras1 and Paras2, respectively, from which we randomly selected eight fused images. From Figure 4, it can be seen that NSCT and GFF show severe distortion, while the fused images of the NSST and ReLP algorithms are too dark and lose detail information. In Figure 5, NSCT and NSST have dark luminance and GFF is distorted, while ReLP and our fusion algorithm give better visual effects.

Figure 4. MRI-PET fusion images obtained by the five methods in Paras1. (A) Magnetic resonance imaging (MRI) source image; (B) positron emission tomography (PET) source image; (C) nonsubsampled contourlet transform (NSCT) fusion result; (D) non-subsampled shearlet transform (NSST) fusion result; (E) guided filtering fusion (GFF) result; (F) Laplacian redecomposition (ReLP) fusion result; (G) fusion result of the proposed method.

Figure 5. MRI-PET fusion images of the five methods in Paras2. (A) Positron emission tomography (PET) source image; (B) magnetic resonance imaging (MRI) source image; (C) nonsubsampled contourlet transform (NSCT) fusion result; (D) non-subsampled shearlet transform (NSST) fusion result; (E) guided filtering fusion (GFF) result; (F) Laplacian redecomposition (ReLP) fusion result; (G) fusion result of the proposed method.

Table 2 compares the 42 sets of fused images on the MRI-PET dataset; our proposed algorithm has the best mean values of the objective evaluation metrics, obtaining higher contrast, sharper edges, and finer details. The subjective results of the NSCT and GFF fused images were not satisfactory, with both showing more color distortion, and NSST showed abnormal brightness. ReLP performed better and was close to our average values. The multi-metric objective evaluation is thus consistent with the conclusions of the subjective analysis, and our proposed algorithm significantly outperforms the other algorithms on average. In summary, we have a more comprehensive advantage over existing algorithms in the evaluation of objective metrics.

Table 2. Average values of index evaluation of different fusion methods for MRI-PET.

3.4. MRI-SPECT fusion results comparison

Figures 6, 7 compare fusion results on the MRI-SPECT datasets Paras1 and Paras2, respectively, using the 52 sets of AD MRI-SPECT images. It can be seen from the figures that NSCT shows severe distortion, the NSST fused images are too dark, GFF shows brightness anomalies, and ReLP does not perform well in terms of detail texture. Our fusion algorithm performs best in brightness, detail texture, and edge contours.

Figure 6. MRI-SPECT fusion images of the five methods in Paras1. (A) Single photon emission computed tomography (SPECT) source image; (B) magnetic resonance imaging (MRI) source image; (C) nonsubsampled contourlet transform (NSCT) fusion result; (D) non-subsampled shearlet transform (NSST) fusion result; (E) guided filtering fusion (GFF) result; (F) Laplacian redecomposition (ReLP) fusion result; (G) fusion result of the proposed method.

Figure 7. MRI-SPECT fusion images of the five methods in Paras2. (A) Single photon emission computed tomography (SPECT) source image; (B) magnetic resonance imaging (MRI) source image; (C) nonsubsampled contourlet transform (NSCT) fusion result; (D) non-subsampled shearlet transform (NSST) fusion result; (E) guided filtering fusion (GFF) result; (F) Laplacian redecomposition (ReLP) fusion result; (G) fusion result of the proposed method.

Table 3 compares the 52 sets of fused images on the MRI-SPECT dataset. Our proposed DCT multi-scale decomposition obtains sharper edges and finer details, and the improved NMSF fusion rule obtains better brightness and contrast, demonstrating the superiority of our method over the other algorithms.

Table 3. Average values of index evaluation of different fusion methods for MRI-SPECT.

To compare the methods more comprehensively, we measured the running times of all methods on the same pair of 256×256 images. Figure 8 shows line graphs of the average running times of our method and the four comparison methods. The ReLP method has the longest running time and GFF the shortest, while our fusion method is faster than most of the other algorithms, ranking second. However, medical image fusion is used to assist diagnosis and treatment, and the effectiveness of the proposed method has been demonstrated by both objective and subjective evaluations. The proposed method therefore guarantees the quality of the fusion results within an acceptable time cost.

Figure 8. Comparison of running time.

3.5. Comparison of subjective evaluation

Computed tomography (CT) or MRI unimodal imaging can no longer meet the demand for precise diagnosis in neurosurgery; only a multimodal imaging technique that can clearly, visually, and holistically show AD brain atrophy and its relationship to the surrounding cerebral vessels, nerves, and brain tissue can support the development of precision neurosurgery. Quality assessment of multimodal fusion also requires medical expertise. We therefore invited six chief neurosurgeons, each with more than 5 years of experience, and randomly selected a test sample of 10 groups, each group including five fusion images. The subjective evaluation used the double-stimulus continuous quality scale (DSCQS), covering contrast, detail, pathological invariance, and acceptability, with scores from 1 (worst) to 5 (best), where 1 indicates a non-diagnostic image and 5 a diagnostic image of good quality. Pathological invariance was scored as 0 (changed) or 1 (unchanged). Table 4 shows the ratings of the six clinicians, with the optimal values shown in bold.

Table 4. Subjective quality evaluation of different fusion algorithms.

Table 4 presents the physicians' subjective evaluations of the CT-MRI and MRI-SPECT fusion results. NSCT and GFF were rated low because of insufficient contrast and brightness, and GFF received the worst acceptability evaluation for distortion. ReLP was very close to our method across the four evaluation criteria. Our algorithm performs best in edge detail, luminance, contrast, and spatial coherence, and received the best physician evaluation.

4. Conclusion

Multimodal neuroimaging data have high dimensionality and complexity, and seeking efficient methods to extract valuable features from complex datasets is the focus of current research. To address the shortcomings of AD multimodal fusion images such as reduced contrast, blurred details, and color distortion, we propose a multimodal fusion algorithm for Alzheimer’s disease based on DCT convolutional SR. A multi-scale DCT decomposition of the source medical images yields the base layer, local average energy layer, and texture layer of the input images, and the sub-images at different scales are used as training images. The sub-dictionaries at different scales are solved optimally with the ADMM algorithm, convolutional sparse coding is performed, and the sub-image coefficients are fused with a combination of improved L1-norm and improved NMSF rules before the inverse DCT is applied to obtain the multimodal fused image. Experiments on medical images of three modality pairs, CT-MRI, MRI-PET, and MRI-SPECT, demonstrate that the algorithm produces sharper edge details and better color and spatial consistency than the other algorithms, showing that it outperforms existing state-of-the-art methods. In the future, we will use deep learning models for multimodal medical image classification and prediction and apply them to the early clinical diagnosis of AD.

Data availability statement

The original contributions presented in this study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Ethics statement

Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions

GZ and XN: investigation, methodology, software, validation, visualization, and writing—original draft. BL, JL, and HY: investigation, methodology, software, and supervision. WS and SH: conceptualization, data curation, formal analysis, funding acquisition, methodology, project administration, resources, supervision, validation, and writing—review and editing.

Funding

This work was supported by the Doctoral Innovative Talents Project of Chongqing University of Posts and Telecommunications (BYJS202107 and BYJS202112), the open project program of Chongqing Key Laboratory of Photo-Electric Functional Materials [K(2022)215], Postdoctoral Science Foundation of Chongqing (cstc2021jcyj-bsh0218), Special financial aid to post-doctor research fellow of Chongqing (2011010006445227), the National Natural Science Foundation of China (U21A20447 and 61971079), the Basic Research and Frontier Exploration Project of Chongqing (cstc2019jcyjmsxmX0666), Chongqing technological innovation and application development project (cstc2021jscx-gksbx0051), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-k202000604), the Innovative Group Project of the National Natural Science Foundation of Chongqing (cstc2020jcyj-cxttX0002), and the Regional Creative Cooperation Program of Sichuan (2020YFQ0025).

Acknowledgments

We thank the School of Optoelectronic Engineering of Chongqing University of Posts and Telecommunications for their assistance in the research.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Chetelat, G., Desgranges, B., De La Sayette, V., Viader, F., Eustache, F., and Baron, J. C. (2003). Mild cognitive impairment: Can FDG-PET predict who is to rapidly convert to Alzheimer’s disease? Neurology 60, 1374–1377. doi: 10.1212/01.wnl.0000055847.17752.e6


Da Cunha, A. L., Zhou, J., and Do, M. N. (2006). The nonsubsampled contourlet transform: Theory, design, and applications. IEEE Trans. Image Process. 15, 3089–3101. doi: 10.1109/TIP.2006.877507


Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52, 1289–1306. doi: 10.1109/TIT.2006.871582


Du, J., Li, W., and Xiao, B. (2017). Anatomical-functional image fusion by information of interest in local Laplacian filtering domain. IEEE Trans. Image Process. 26, 5855–5866. doi: 10.1109/TIP.2017.2745202


Dubois, B., Villain, N., Frisoni, G. B., Rabinovici, G. D., Sabbagh, M., Cappa, S., et al. (2021). Clinical diagnosis of Alzheimer’s disease: Recommendations of the international working group. Lancet Neurol. 20, 484–496. doi: 10.1016/S1474-4422(21)00066-1


Eskicioglu, A. M., and Fisher, P. S. (1995). Image quality measures and their performance. IEEE Trans. Commun. 43, 2959–2965.


Fan, J., Cao, X., Yap, P. T., and Shen, D. (2019). BIRNet: Brain image registration using dual-supervised fully convolutional networks. Med. Image Anal. 54, 193–206. doi: 10.1016/j.media.2019.03.006


Farbman, Z., Fattal, R., Lischinski, D., and Szeliski, R. (2008). Edge-preserving decompositions for multi-scale tone and detail manipulation. ACM Trans. Graphic. 27, 1–10. doi: 10.1145/1360612.1360666


Gu, S., Zuo, W., Xie, Q., Meng, D., Feng, X., and Zhang, L. (2015). “Convolutional sparse coding for image super-resolution,” in Proceedings of the international conference on computer vision, Santiago, Chile (Manhattan, NY: IEEE), 1823–1831.


He, K., Sun, J., and Tang, X. (2012). Guided image filtering. IEEE Trans. Pattern Anal. 35, 1397–1409. doi: 10.1109/TPAMI.2012.213


He, Y., Li, G., Liao, Y., Sun, Y., Kong, J., Jiang, G., et al. (2019). Gesture recognition based on an improved local sparse representation classification algorithm. Cluster Comput. 22, 10935–10946. doi: 10.1007/s10586-017-1237-1


Hu, J., and Li, S. (2012). The multiscale directional bilateral filter and its application to multisensor image fusion. Inform. Fusion 13, 196–206. doi: 10.1016/j.inffus.2011.01.002


Johnson, K. A., and Becker, J. A. (2001). The whole brain atlas. Available online at: https://pesquisa.bvsalud.org/portal/resource/pt/lis-LISBR1.1-5215


Kong, W., Zhang, L., and Lei, Y. (2014). Novel fusion method for visible light and infrared images based on NSST–SF–PCNN. Infrared Phys. Techn. 65, 103–112. doi: 10.1016/j.infrared.2014.04.003


Kou, F., Chen, W., Wen, C., and Li, Z. (2015). Gradient domain guided image filtering. IEEE Trans. Image Process. 24, 4528–4539. doi: 10.1109/TIP.2015.2468183


Li, S., Hong, R., and Wu, X. (2008). “A novel similarity based quality metric for image fusion,” in Proceedings of international conference on audio, language and image processing, Shanghai, China (Manhattan, NY: IEEE), 167–172.


Li, S., Kang, X., and Hu, J. (2013). Image fusion with guided filtering. IEEE Trans. Image process. 22, 2864–2875. doi: 10.1109/TIP.2013.2244222


Li, T., and Wang, Y. (2011). Biological image fusion using a NSCT based variable-weight method. Inform. Fusion 12, 85–92. doi: 10.1016/j.inffus.2010.03.007


Li, X., Guo, X., Han, P., Wang, X., Li, H., and Luo, T. (2020). Laplacian redecomposition for multimodal medical image fusion. IEEE Trans. Instrum. Meas. 69, 6880–6890. doi: 10.1109/TIM.2020.2975405


Lian, C., Zhang, J., Liu, M., Zong, X., Hung, S. C., Lin, W., et al. (2018). Multi-channel multi-scale fully convolutional network for 3D perivascular spaces segmentation in 7T MR images. Med. Image Anal. 46, 106–117. doi: 10.1016/j.media.2018.02.009


Liu, A., Lin, W., and Narwaria, M. (2011). Image quality assessment based on gradient similarity. IEEE Trans. Image Process. 21, 1500–1512. doi: 10.1109/TIP.2011.2175935


Liu, G., and Yan, S. (2011). “Latent low-rank representation for subspace segmentation and feature extraction,” in Proceedings of 2011 international conference on computer vision, Barcelona, Spain (Manhattan, NY: IEEE), 1615–1622. doi: 10.1109/ICCV.2011.6126422


Liu, H., Liu, Y., and Sun, F. (2014). Robust exemplar extraction using structured sparse coding. IEEE Trans. Neur. Net. Lear. 26, 1816–1821. doi: 10.1109/TNNLS.2014.2357036


Liu, H., Yu, Y., Sun, F., and Gu, J. (2016). Visual–tactile fusion for object recognition. IEEE Trans. Autom. Sci. Eng. 14, 996–1008. doi: 10.1109/TASE.2016.2549552


Liu, M., Zhang, D., and Shen, D. (2016). Relationship induced multi-template learning for diagnosis of Alzheimer’s disease and mild cognitive impairment. IEEE Trans. Med. Imaging 35, 1463–1474. doi: 10.1109/TMI.2016.2515021


Liu, M., Zhang, J., Yap, P. T., and Shen, D. (2017). View-aligned hypergraph learning for Alzheimer’s disease diagnosis with incomplete multi-modality data. Med. Image Anal. 36, 123–134. doi: 10.1016/j.media.2016.11.002


Liu, S., Shi, M., Zhu, Z., and Zhao, J. (2017). Image fusion based on complex-shearlet domain with guided filtering. Multidim. Syst. Sign. Process. 28, 207–224. doi: 10.1007/s11045-015-0343-6


Liu, X., Mei, W., and Du, H. (2017). Structure tensor and nonsubsampled shearlet transform based algorithm for CT and MRI image fusion. Neurocomputing 235, 131–139. doi: 10.1016/j.neucom.2017.01.006


Miao, Q. G., Shi, C., Xu, P. F., Yang, M., and Shi, Y. B. (2011). A novel algorithm of image fusion using shearlets. Opt. Commun. 284, 1540–1547. doi: 10.1016/j.optcom.2010.11.048


Mittal, A., Soundararajan, R., and Bovik, A. C. (2012). Making a “completely blind” image quality analyzer. IEEE Sig. Proc. Lett. 20, 209–212. doi: 10.1109/LSP.2012.2227726


Mohammadi-Nejad, A. R., Hossein-Zadeh, G. A., and Soltanian-Zadeh, H. (2017). Structured and sparse canonical correlation analysis as a brain-wide multi-modal data fusion approach. IEEE Trans. Med. Imaging 36, 1438–1448. doi: 10.1109/TMI.2017.2681966


Perrin, R. J., Fagan, A. M., and Holtzman, D. M. (2009). Multimodal techniques for diagnosis and prognosis of Alzheimer’s disease. Nature 461, 916–922. doi: 10.1038/nature08538


Rong, Y., Xiong, S., and Gao, Y. (2017). Low-rank double dictionary learning from corrupted data for robust image classification. Pattern Recogn. 72, 419–432. doi: 10.1016/j.patcog.2017.06.038


Sheikh, H. R., and Bovik, A. C. (2006). Image information and visual quality. IEEE Trans. Image Proces. 15, 430–444. doi: 10.1109/TIP.2005.859378


Thung, K. H., Wee, C. Y., Yap, P. T., and Shen, D., and Alzheimer’s Disease Neuroimaging Initiative. (2014). Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion. NeuroImage 91, 386–400. doi: 10.1016/j.neuroimage.2014.01.033


Veitch, D. P., Weiner, M. W., Aisen, P. S., Beckett, L. A., DeCarli, C., Green, R. C., et al. (2022). Using the Alzheimer’s disease neuroimaging initiative to improve early detection, diagnosis, and treatment of Alzheimer’s disease. Alzheimer’s Dement. 18, 824–857. doi: 10.1002/alz.12422


Wang, G., Li, W., and Huang, Y. (2021). Medical image fusion based on hybrid three-layer decomposition model and nuclear norm. Comput. Biol. Med. 129:104179. doi: 10.1016/j.compbiomed.2020.104179


Wang, M., and Shang, X. (2020). A fast image fusion with discrete cosine transform. IEEE Sig. Process. Lett. 27, 990–994. doi: 10.1109/LSP.2020.2999788


Wang, Y., Du, H., Xu, J., and Liu, Y. (2012). “A no-reference perceptual blur metric based on complex edge analysis,” in Proceedings of international conference on network infrastructure and digital content, Beijing, China (Manhattan, NY: IEEE), 487–491.


Xu, L., Lu, C., Xu, Y., and Jia, J. (2011). “Image smoothing via L0 gradient minimization,” in Proceedings of the 2011 SIGGRAPH Asia conference, Beijing, China (New York, NY: ACM), 1–12.


Xydeas, C. S., and Petrovic, V. (2000). Objective image fusion performance measure. Electron. Lett. 36, 308–309. doi: 10.1049/el:20000267


Yang, B., and Li, S. (2009). Multifocus image fusion and restoration with sparse representation. IEEE T. Instrum. Meas. 59, 884–892. doi: 10.1109/TIM.2009.2026612


Yang, S., Wang, M., Jiao, L., Wu, R., and Wang, Z. (2010). Image fusion based on a new contourlet packet. Inform. Fusion 11, 78–84. doi: 10.1016/j.inffus.2009.05.001


Yeganeh, H., and Wang, Z. (2012). Objective quality assessment of tone-mapped images. IEEE Trans. Image Process. 22, 657–667. doi: 10.1109/TIP.2012.2221725


Yin, H., Li, Y., Chai, Y., Liu, Z., and Zhu, Z. (2016). A novel sparse-representation-based multi-focus image fusion approach. Neurocomputing 216, 216–229. doi: 10.1016/j.neucom.2016.07.039


Zhang, H., and Patel, V. M. (2017). Convolutional sparse and low-rank coding-based image decomposition. IEEE Trans. Image Process. 27, 2121–2133. doi: 10.1109/TIP.2017.2786469


Zhang, K., Huang, Y., and Zhao, C. (2018). Remote sensing image fusion via RPCA and adaptive PCNN in NSST domain. Int. J. Wavelets Multiresolut. Inf. Process. 16:1850037. doi: 10.1142/S0219691318500376


Zhang, Q., and Levine, M. D. (2016). Robust multi-focus image fusion using multi-task sparse representation and spatial context. IEEE Trans. Image Process. 25, 2045–2058. doi: 10.1109/TIP.2016.2524212


Zhang, Q., Shen, X., Xu, L., and Jia, J. (2014). “Rolling guidance filter,” in European conference on computer vision, Zurich, Switzerland: ECCV, eds D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars (Cham: Springer), 815–830.


Zhu, Z., Chai, Y., Yin, H., Li, Y., and Liu, Z. (2016). A novel dictionary learning approach for multi-modality medical image fusion. Neurocomputing 214, 471–482. doi: 10.1016/j.neucom.2016.06.036


Zong, J. J., and Qiu, T. S. (2017). Medical image fusion based on sparse representation of classified image patches. Biomed. Signal Proces. 34, 195–205. doi: 10.1016/j.bspc.2017.02.005


Keywords: Alzheimer, multimodal, sparse, fusion, convolutional

Citation: Zhang G, Nie X, Liu B, Yuan H, Li J, Sun W and Huang S (2023) A multimodal fusion method for Alzheimer’s disease based on DCT convolutional sparse representation. Front. Neurosci. 16:1100812. doi: 10.3389/fnins.2022.1100812

Received: 17 November 2022; Accepted: 07 December 2022;
Published: 06 January 2023.

Edited by:

Xiaomin Yang, Sichuan University, China

Reviewed by:

Kaining Han, University of Electronic Science and Technology of China, China
Teng Li, Anhui University, China

Copyright © 2023 Zhang, Nie, Liu, Yuan, Li, Sun and Huang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Weiwei Sun, sunww@cqupt.edu.cn; Shixin Huang, 15803659045@163.com

†These authors share first authorship.
