Identification of Alzheimer's EEG With a WVG Network-Based Fuzzy Learning Approach

Yu, Haitao; Zhu, Lin; Cai, Lihui; Wang, Jiang; Liu, Jing; Wang, Ruofan; Zhang, Zhiyong

doi:10.3389/fnins.2020.00641

ORIGINAL RESEARCH article

Front. Neurosci., 21 July 2020

Sec. Brain Imaging Methods

Volume 14 - 2020 | https://doi.org/10.3389/fnins.2020.00641

This article is part of the Research Topic "Combined EEG in Research and Diagnostics: Novel Perspectives and Improvements"

Haitao Yu¹*

Lin Zhu¹

Lihui Cai¹

Jiang Wang¹

Jing Liu²*

Ruofan Wang³

Zhiyong Zhang⁴

¹School of Electrical and Information Engineering, Tianjin University, Tianjin, China
²Department of Neurology, Tangshan Gongren Hospital, Tangshan, China
³School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
⁴Department of Pathology, Tangshan Gongren Hospital, Tangshan, China

A novel analytical framework combining fuzzy learning and complex network approaches is proposed for the identification of Alzheimer's disease (AD) from multichannel scalp-recorded electroencephalograph (EEG) signals. The weighted visibility graph (WVG) algorithm is first applied to transform each EEG channel into a network, from which topological parameters are extracted. Statistical analysis indicates that AD and normal subjects differ significantly in the structure of their WVG networks, so these network parameters can be used to identify AD. Taking network parameters as input features, a Takagi-Sugeno-Kang (TSK) fuzzy model is established to identify AD EEG signals. Three feature sets are considered as input vectors: a single parameter from multiple networks, multiple parameters from a single network, and multiple parameters from multiple networks. The number and order of input features in each set are optimized with various feature selection methods. Classification results demonstrate the ability of network-based TSK fuzzy classifiers and the feasibility of the three input feature sets. The highest accuracy achieved is 95.28% for a single parameter from four networks and 93.41% for three parameters from a single network. The multi-parameter, multi-network set gives the best result: the highest accuracy, 97.12%, is achieved with five features selected from four networks. The combination of network analysis and fuzzy learning can substantially improve the efficiency of AD EEG identification.

Introduction

Alzheimer's disease (AD) is an increasingly common and serious disease caused by progressive organic neurodegenerative lesions in the brain. Patients show typical clinical presentations, particularly cognitive dysfunction such as deficient episodic memory and impaired recall (Smailovic et al., 2018). Clinical diagnosis of AD currently relies on scale assessments, such as the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and activities of daily living (ADL), and on physiological detection in cerebrospinal fluid. In patients with severe AD, changes in brain structure such as encephalatrophy can be observed through brain functional imaging. Yang et al. applied magnetic resonance imaging (MRI) to detect cerebral changes in blood flow and oxygenation in AD and mild cognitive impairment (MCI) subjects and showed that it can distinguish them from normal controls (Yang et al., 2010). Hiroshi's study has demonstrated progression of atrophy mapping upstream to Braak's stages of neurofibrillary tangle deposition in AD. The main cause of organic brain lesions in AD is considered to be the loss of neurons and synapses (Brenner et al., 1988). It has been suggested that the loss of synapses and neural pathways decreases brain functional connectivity and alters the brain's electrical signals, so it is feasible to diagnose neurological disease with the electroencephalogram (EEG). EEG measures the brain's voltage fluctuations with high temporal resolution and contains rich physiological information, and there is growing evidence that it may contribute to early recognition of AD patients.

Conventional visual inspection of the EEG is one of the methods widely used in neurological assessment. Numerous previous studies have reported the disappearance of alpha EEG activity, particularly in posterior brain regions, through unaided viewing (Matsuda, 2013; Wang et al., 2015; Horvath et al., 2018). It has also been reported that visual EEG scores of AD patients correlate strongly with dementia severity (Kowalski et al., 2001). In the study of de Waal et al. (2011), AD patients with early onset were more likely to show severe diffuse slowing than those with later onset, which is consistent with the clinical manifestations of AD. In addition, studies have quantified the complexity of electrophysiological activity and reported reduced EEG complexity in AD patients (Cao et al., 2016). Changes in the AD brain are also reflected in perturbations of EEG synchronization. Because EEG signals are irregular, non-stationary, and complex, traditional visual inspection is not sufficient for AD EEG identification (Buzsaki and Draguhn, 2004; de Waal et al., 2011; Cao et al., 2016). To address this issue, complex network theory, which aims to describe the human brain from a global perspective, has been introduced into AD diagnosis (Palop et al., 2006; Nimmrich et al., 2015; Cao et al., 2016; Gao et al., 2019).

Over the past few years, more and more researchers have adopted complex network methods to characterize the dynamic features of complex systems (Zou et al., 2019). This approach combines two frontier research fields: analysis methods for non-linear time series (Hively et al., 2000; Costa et al., 2002, 2005) and complex network theory (Brown et al., 2004; Boccaletti et al., 2006). Zhang et al. constructed complex networks from the strength of temporal correlation between time series and reported that the behavior (chaotic or fractal) of a time series correlates directly with the topological structure of its network (Zhang and Small, 2006). As an effective tool for gaining insight into brain function, brain network analysis has been widely applied in AD research. The healthy brain was found to work with network properties such as small-worldness, hubness, and rich-clubs, while the AD brain operated with less optimal network topologies (Meunier et al., 2010; Blinowska and Kaminski, 2013; Martijn and van den Heuvel, 2013; Wang et al., 2014, 2016; Deng et al., 2015). Loss of small-world features (toward random network topology) can be observed in functional networks constructed from EEG and functional magnetic resonance imaging data (Stam et al., 2007; He and Evans, 2010; Tahaei et al., 2012; Reid and Evans, 2013). Numerous EEG studies have consistently demonstrated decreased functional connections in the higher frequency bands of AD patients compared to controls (Tijms et al., 2013; van Straaten et al., 2014).

Compared with other approaches for constructing complex networks from time series, visibility graph (VG) algorithms better preserve the basic features of the series. Lacasa et al. and Liu et al. converted time series into graphs and extracted topological features using graph theory methods (Lacasa et al., 2008; Liu et al., 2010). They pointed out that the irregularity of a time series can be characterized by the network topology. For instance, a periodic series is transformed into a regular lattice, while a chaotic series corresponds to a random graph. Subsequent studies introduced the VG method into EEG studies of neurological diseases and found that features extracted from VG networks can be used effectively as mathematical markers in neurodegenerative diagnosis. The VG algorithm was first applied to AD research by Ahmadlou et al., who reported that the complexity of EEGs computed from VGs can be used to distinguish AD from control EEGs (Ahmadlou et al., 2010).

The VG expresses only the existence of edges between time nodes, not their strength. Therefore, Supriya et al. proposed combining weighted edges with the horizontal visibility graph, an approach that is not applicable to all complex network graphs (Supriya et al., 2016). Addressing the limitations of the above approaches, Zhu et al. improved the weighted visibility graph (WVG) algorithm by specifying a radian function as the criterion for calculating edge weights in all kinds of complex networks, and obtained promising results in the detection of epilepsy (Zhu et al., 2014). Studies have also shown that the visibility graph method can preserve characteristics such as reduced complexity (Polikar et al., 2007; Czigler et al., 2008) and slowing of rhythm (Dauwels et al., 2011; Cao et al., 2015; McBride et al., 2015) in patients with AD. Compared with connectivity networks, WVG networks retain more structural information about the time series, which is more conducive to AD identification. Therefore, we apply the WVG method to feature extraction for Alzheimer's disease. A variety of parameters are extracted from the visibility graph and used to investigate which of them can be used for diagnosing AD.

Quantitative analysis of the complex WVG networks extracts valuable information about the time series. Machine learning approaches generally use such extracted features to train a model and then apply the model to signal detection. Traditional machine learning methods, including decision trees, random forests, k-nearest neighbor (KNN), Naive Bayes (NB), logistic regression, and so on (Siegelmann and Holzman, 2010; Hramov et al., 2019), have been widely used in the detection of neurological diseases. However, for systems with highly non-linear characteristics, models built with these methods do not characterize the real system well and perform poorly in classification. With this consideration, rule-based fuzzy models have been proposed and widely used in fields such as computer vision, natural language processing, and reinforcement learning, achieving remarkable results (Gu et al., 2017). The Takagi-Sugeno-Kang (TSK) method builds a model that uses the language of fuzzy mathematics to describe the characteristics and internal relations of fuzzy phenomena. Compared with traditional classifiers that lack transparency, TSK can be used for classification with multiple features and offers superior model interpretability, defined as the ability to understand the decision strategies of response functions in a human-interpretable manner in order to interpret internal relationships (Deng et al., 2018). In current applications of machine learning, such interpretability has received wide attention and is considered crucial.

In this paper, multiple networks are constructed from multi-channel EEG, with each EEG channel transformed into one network layer. A number of different network features are then extracted from them, which are too many to be used directly as an input feature vector. To address this problem, several feature selection approaches are used to choose features, and the influence of different selection methods on the final classification results is examined. The parameters are divided into three groups (single parameter from multi-network, multi-parameter from single network, and multi-parameter from multi-network) to observe how the classification results differ between fuzzy models trained with different types of features. The rest of the paper is structured as follows. Section Materials and Methods describes the experimental design, including data collection and subject condition, and explains the principles of the graph methods and the Takagi-Sugeno-Kang (TSK) model adopted in the paper. In section Experimental Results, we perform a statistical analysis of the results and implement AD recognition based on the proposed framework. Section Conclusion and Discussion discusses the application and advantages of the proposed model, as well as future work.

Materials and Methods

Subjects and EEG Recordings

EEG recordings were collected from AD subjects and control subjects. The AD group included 30 confirmed AD patients with Mini-Mental State Examination (MMSE) scores between 12 and 15, whose diagnoses met the National Institute on Aging-Alzheimer's Association criteria. None of them had used antipsychotic drugs, antidepressants, or dopamine blockers or consumed excessive amounts of alcohol, and none had other neurological or psychiatric disorders or any other serious illness. The AD group included 18 females and 12 males aged 74 to 78. The control group consisted of 30 healthy subjects of matched age (70 to 76 years old; 10 females and 20 males) with MMSE scores between 28 and 30. To avoid effects on EEG activity, all subjects were prohibited from using neuroactive drugs before the experiment. The data used in this paper are from our previous study (Wang et al., 2016), which was approved by the Ethics Committee of Tangshan Gongren Hospital and conducted in accordance with the Declaration of Helsinki. All subjects gave informed consent.

A 16-channel EEG monitoring system (Solar2000B) was used. The EEG channels have an input impedance of 10 MΩ and a bandwidth of 0.08–300 Hz. To obtain low-frequency signals that meet the analysis requirements, the filter pass band was set to 0.08–50 Hz. Studies have demonstrated that the EEG amplitude across different bands tends to stabilize when the scalp-electrode impedance is <10 kΩ, so the electrode impedance in our experiments was set to 3 kΩ. Sixteen electrodes were placed on the scalp according to the international 10–20 system, with the linked earlobes A1 and A2 used as reference. EEGs were recorded with a Symtop amplifier (model: UEA–B; sampling frequency: 1,024 Hz; electrode impedance: 3 kΩ).

During the experiment, the subjects stayed in a semi-dark, quiet room and were told to stay awake with their eyes closed. Each subject was recorded for at least 30 min. To eliminate the impact of nervousness, anxiety, and head movement, a 10-min segment was selected from each recording. Sharp transient artifacts caused by eye movement and muscle activity, as well as segments with voltage exceeding 150 μV, were also removed. Next, fifteen artifact-free epochs of 8 s each (15 × 8 s = 120 s) were chosen from each subject's EEG for weighted visibility graph construction.

WVG Methods

The EEG signal is the electrical signal of brain neurons measured on the surface of the cerebral cortex or scalp. It has pronounced non-stationary, non-linear, and dynamic characteristics. The VG method provides a way to study the underlying dynamics of EEG data (Lacasa et al., 2008; Deng et al., 2018). Because the VG inherits the dynamic nature of the time series from which it is built, it can describe a time series from the perspective of graph theory. The VG algorithm was originally applied in robot motion planning, architectural design, and topographic descriptions of geographical space (Lozano-Pérez and Wesley, 1979; Turner et al., 2001; Lacasa et al., 2009; Jiang et al., 2017; Zou et al., 2019). It combines the mutual visibility relationship between points and obstacles in a two-dimensional landscape with a computational geometry framework. The literature shows that the WVG can also be used in EEG data analysis to convert a non-stationary, one-dimensional time series into a two-dimensional graph for analysis. Different EEG channels reflect electrophysiological information from different brain regions, so each channel yields a single complex network, and a multi-layer network can be obtained from multi-channel EEG. The schematic diagram of constructing a brain network with the WVG method is shown in Figure 1.

Figure 1. The framework of our method for classifying AD patients from EEG signals. First, multichannel EEG signals of the two types of subjects are acquired and a preliminary analysis is performed. Second, we construct a WVG network from each EEG channel. Third, the features are extracted and ranked with a feature selection method. Finally, we combine network theory with a fuzzy rule-based system to identify the AD pattern from the selected network topological properties.

In the construction of a WVG from a univariate EEG series $\{x_{i}\}_{i = 1}^{N}$ with x_i = x(t_i), individual observations are considered as vertices. Thus, a weighted adjacency matrix W of size N × N is obtained. Nodes of the WVG network are defined by the time points {t_i}, i = 1, 2, ..., N, and each edge in the network is defined by the connection between two time points (Zou et al., 2019). Two nodes i and j are defined as connected if the criterion

\begin{array}{l} \frac{x (t_{i}) - x (t_{k})}{t_{k} - t_{i}} > \frac{x (t_{i}) - x (t_{j})}{t_{j} - t_{i}} & (1) \end{array}

is fulfilled for all time points t_k with t_i < t_k < t_j. The absolute value of the edge weight between two connected nodes is then determined as

\begin{array}{l} w_{i, j} = \arctan \frac{x (t_{i}) - x (t_{j})}{t_{i} - t_{j}}, \quad i < j & (2) \end{array}
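The construction can be illustrated with a minimal Python sketch. It assumes the standard natural visibility condition (every intermediate sample lies strictly below the straight line joining the two end samples), stores the absolute value of the arctangent as the weight, and defaults to unit sample spacing. The function name `weighted_visibility_graph` is hypothetical, and the naive triple loop is meant for short series only.

```python
import math

def weighted_visibility_graph(x, t=None):
    """Return the N x N weighted adjacency matrix (list of lists) of series x."""
    n = len(x)
    if t is None:
        t = list(range(n))  # assumption: unit spacing between samples
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            slope_ij = (x[i] - x[j]) / (t[j] - t[i])
            # Visibility: every intermediate point k must lie below the line i-j.
            visible = all(
                (x[i] - x[k]) / (t[k] - t[i]) > slope_ij
                for k in range(i + 1, j)
            )
            if visible:
                # Weight: absolute value of the arctangent of the slope.
                # Note: two visible samples with equal values get weight 0,
                # which this simple matrix cannot tell apart from "no edge".
                weight = abs(math.atan((x[i] - x[j]) / (t[i] - t[j])))
                w[i][j] = w[j][i] = weight
    return w
```

For example, in the series [3, 1, 2] the first and last samples see each other over the lower middle sample, whereas in [1, 3, 1] the middle peak blocks that edge.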

Feature Extraction and Selection

The topology of the network is quantified based on the multiple complex networks obtained with WVG method. In order to statistically analyze the characteristics of AD networks and control networks, we calculate the clustering coefficient, average weighted degree, graph index complexity, network entropy, degree distribution index, modularity, local efficiency, and average path length as eight different topological characteristics.

Clustering Coefficient

The clustering coefficient quantifies how tightly the neighbors of a node are connected to one another (Rubinov and Sporns, 2010). For a network G with N nodes, the connectivity between nodes i and j is a_i,j (a_i,j = 1 if the connection exists and a_i,j = 0 if not), and the weight of the connection is w_i,j (w_i,j ∈ [0, 1]). For a weighted network, the local clustering coefficient of node i is defined as:

\begin{array}{l} C (i) = \frac{1}{s_{i} (K_{i} - 1)} \sum_{j, h \in G} \frac{(w_{i, j} + w_{i, h})}{2} a_{i, j} a_{i, h} a_{j, h} & (3) \end{array}

where K_i is the degree of node i and s_i, the strength of node i, is defined as:

\begin{array}{l} s_{i} = \sum_{j \in G_{i}} w_{i, j} & (4) \end{array}

Here G_i denotes the set of neighbors of node i. The clustering coefficient of the whole network is then defined as:

\begin{array}{l} C = \frac{1}{N} \sum_{i \in G} C (i) & (5) \end{array}

Average Weighted Degree

The average weighted degree is an important parameter for distinguishing networks with different topologies. It is obtained by averaging the weights of the links incident on all nodes in the network (Supriya et al., 2016):

\begin{array}{l} w d = \frac{1}{N} \sum_{i \in G} s_{i} & (6) \end{array}

where s_i is defined in Equation (4).
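As an illustration, the weighted clustering coefficient and the average weighted degree can be sketched in plain Python from a weighted adjacency matrix. The sketch assumes a symmetric matrix `w` with zeros on the diagonal, treats any positive weight as an existing connection, and takes K_i to be the number of neighbors of node i. The function names are hypothetical.

```python
def node_strengths(w):
    """Strength s_i: sum of the weights of the links attached to node i."""
    return [sum(row) for row in w]

def weighted_clustering(w):
    """Local weighted clustering coefficient C(i) for every node."""
    n = len(w)
    a = [[1 if w[i][j] > 0 else 0 for j in range(n)] for i in range(n)]
    s = node_strengths(w)
    coeffs = []
    for i in range(n):
        k_i = sum(a[i])  # number of neighbors of node i
        if k_i < 2 or s[i] == 0:
            coeffs.append(0.0)  # no triangle can pass through node i
            continue
        total = 0.0
        for j in range(n):
            for h in range(n):
                # Count each closed triangle (i, j, h) over ordered pairs (j, h).
                if a[i][j] and a[i][h] and a[j][h]:
                    total += (w[i][j] + w[i][h]) / 2.0
        coeffs.append(total / (s[i] * (k_i - 1)))
    return coeffs

def network_clustering(w):
    """Network clustering coefficient: mean of the local coefficients."""
    c = weighted_clustering(w)
    return sum(c) / len(c)

def average_weighted_degree(w):
    """Average of the node strengths over all nodes."""
    s = node_strengths(w)
    return sum(s) / len(s)
```

A fully connected triangle gives a clustering coefficient of 1, while a three-node chain gives 0 because the two end nodes are not linked.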

Graph Index Complexity

Kim et al. introduced graph index complexity as a new feature for the diagnosis of patients with AD by quantifying the complexity of the graph (Kim and Wilhelm, 2008; Wang et al., 2016). With the largest eigenvalue of the adjacency matrix of a graph with n nodes denoted as λ_max (Blinowska and Kaminski, 2013), the graph index complexity is defined as follows:

\begin{array}{l} c_{λ_{max}} = 4 c (1 - c) & (7) \end{array}

where

\begin{array}{l} c = \frac{λ_{max} - 2 cos (π / (n + 1))}{n - 1 - 2 cos (π / (n + 1))} & (8) \end{array}
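A possible implementation is sketched below in plain Python. It assumes that the binary adjacency matrix is used (the bounds 2cos(π/(n + 1)) and n − 1 are the largest eigenvalues of a path and of a complete graph) and estimates λ_max by power iteration on A + I, a shift chosen here so that the iteration also converges for bipartite graphs. Both function names are hypothetical, and the pure-Python loops are slow for large matrices.

```python
import math

def largest_eigenvalue(a, iters=200):
    """Estimate the largest eigenvalue of a symmetric non-negative matrix."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # One power-iteration step with the shifted matrix (A + I).
        u = [v[i] + sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(val * val for val in u))
        v = [val / norm for val in u]
    # Rayleigh quotient v^T A v of the unshifted matrix (v has unit norm).
    av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))

def graph_index_complexity(a):
    """Graph index complexity 4c(1 - c) of a binary adjacency matrix."""
    n = len(a)
    if n < 3:
        raise ValueError("need at least 3 nodes (the bounds coincide for n = 2)")
    lam_max = largest_eigenvalue(a)
    lower = 2.0 * math.cos(math.pi / (n + 1))  # path graph
    c = (lam_max - lower) / (n - 1 - lower)    # n - 1: complete graph
    return 4.0 * c * (1.0 - c)
```

The index vanishes at both extremes (path graph and complete graph) and is positive for topologies in between, such as a star.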

Degree Distribution Index

The degree distribution P_deg(k) is often used to classify complex networks, which can be formed by counting how many nodes have each degree. In this paper, a probability distribution object is obtained by fitting the Poisson distribution to the degree distribution vector. The degree distribution P_deg(k) is defined as

\begin{array}{l} P_{deg} (k) = \frac{λ^{k}}{k!} e^{- λ} & (9) \end{array}

The degree distribution index is defined as the λ values of the fitting distribution (Stephen and Toubia, 2009).

Network Entropy

The network entropy can be computed straightforwardly based on the degree distribution as

\begin{array}{l} S = - \sum_{k} P_{deg} (k) log P_{deg} (k) & (10) \end{array}
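The two degree-based features can be sketched as follows. The sketch makes two assumptions that the text does not spell out: the Poisson fit is a maximum-likelihood fit, for which the estimate of λ is simply the mean degree, and the entropy is evaluated on the empirical degree distribution with the natural logarithm. Function names are hypothetical.

```python
import math
from collections import Counter

def node_degrees(w):
    """Degree of each node: number of non-zero entries in its row."""
    return [sum(1 for value in row if value > 0) for row in w]

def degree_distribution_index(degrees):
    """Maximum-likelihood Poisson rate, which equals the mean degree."""
    return sum(degrees) / len(degrees)

def network_entropy(degrees):
    """Shannon entropy of the empirical degree distribution P_deg(k)."""
    n = len(degrees)
    counts = Counter(degrees)
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

A network in which every node has the same degree has zero entropy, and the entropy grows as the degrees become more heterogeneous.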

Modularity

Modularity measures the quality of the clusters (communities) obtained by partitioning the network (Supriya et al., 2016). The modularity Q of this weighted network is defined as:

\begin{array}{l} Q = \frac{1}{2 m} \sum_{i, j} (a_{i, j} - \frac{k_{i} k_{j}}{2 m}) δ (C_{i}, C_{j}) & (11) \end{array}

where $m = \frac{1}{2} \sum_{i, j \in G} w_{i, j}$ is the sum of the weights of all links in the network, $k_{i} = \sum_{j \in G} w_{i, j}$ is the sum of the weights of the links attached to node i, C_i represents the community to which vertex i is assigned, and the function δ(C_i, C_j) is 1 if nodes i and j belong to the same community and 0 otherwise. In this paper, we used the Louvain method (Blondel et al., 2008) to assign nodes to communities. This method consists of two steps. In the first step, each node is tentatively added to each of its neighboring communities to determine the one that maximizes the modularity gain ΔQ. In the second step, a new network is built whose nodes are the small communities found in the first step, and whose link weights are given by the sum of the weights of the links between nodes in the corresponding two old communities. These two steps are repeated iteratively until the modularity reaches its maximum and no node moves any more. The modularity gain ΔQ is defined as (Zhaohong et al., 2013):

\begin{array}{l} Δ Q = [\frac{Σ_{in} + k_{i, in}}{2 m} - {(\frac{Σ_{tot} + k_{i}}{2 m})}^{2}] \\ - [\frac{Σ_{in}}{2 m} - {(\frac{Σ_{tot}}{2 m})}^{2} - {(\frac{k_{i}}{2 m})}^{2}] & (12) \end{array}

where Σ_in represents the sum of all the links weights inside community C, Σ_tot is the sum of the weights of the links attached to nodes in C, k_i is the sum of the weights of the links attached to node i, k_i,in is the sum of the weights of the links from i to nodes in C, and m is the sum weights of all links in the network.
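For a given assignment of nodes to communities, the modularity can be evaluated directly, as in the sketch below. It is not an implementation of the Louvain search; it only scores a partition. It assumes the weighted form of the definition, in which the link weight w_i,j takes the place of a_i,j, and a symmetric weight matrix. The function name and the `communities` list (one label per node) are hypothetical.

```python
def modularity(w, communities):
    """Weighted modularity Q of a partition given as one label per node."""
    n = len(w)
    k = [sum(row) for row in w]  # strength of each node
    two_m = sum(k)               # equals 2m for a symmetric matrix
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += w[i][j] - k[i] * k[j] / two_m
    return q / two_m
```

For two disconnected links of equal weight, assigning each link to its own community gives Q = 0.5, whereas merging everything into one community gives Q = 0.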

Local Efficiency

Local efficiency, as a node-specific measure, is defined to measure the density of the subnetwork composed of the neighborhood of the node i. Local efficiency of ith node is given as

\begin{array}{l} E_{l o c} (i) = \frac{1}{N_{G_{i}} (N_{G_{i}} - 1)} \sum_{i, j \in G, i \neq j} l_{i, j} & (13) \end{array}

where l_i,j is the shortest distance between i and j, and N_{G_i} is the number of nodes in the neighborhood of node i. The local efficiency of the network is the average of the local efficiency of all nodes:

\begin{array}{l} E_{l o c} = \frac{1}{N} \sum_{i} E_{l o c} (i) & (14) \end{array}

Average Path Length

Average path length is a vital index to measure information transmission ability of networks. It can be used to evaluate the connectivity of the global functional network, including local and remote connection. The average path L is defined as:

\begin{array}{l} L = \frac{1}{N (N - 1)} \sum_{i, j, i \neq j} l_{i, j} & (15) \end{array}
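A minimal sketch of the average path length is given below. It assumes that l_i,j is counted in hops (number of edges) and computes it by breadth-first search from every node; a weighted distance would need a different shortest-path routine. The function name is hypothetical. Visibility graphs are always connected, because consecutive samples see each other, so the disconnected case is only guarded against.

```python
from collections import deque

def average_path_length(a):
    """Mean hop distance over all ordered pairs of distinct nodes."""
    n = len(a)
    total = 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            u = queue.popleft()
            for v in range(n):
                if a[u][v] and dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("graph is not connected")
        total += sum(dist)
    return total / (n * (n - 1))
```

The matrix may be binary or weighted, since any non-zero entry is treated as an edge.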

TSK Fuzzy Model

Given an original input dataset X = {x₁, x₂, …, x_n} ∈ R^d and the corresponding class labels Y = {y₁, y₂, ..., y_n} (y_i,j = 1 when the ith sample belongs to the jth class; otherwise, y_i,j = 0), the kth fuzzy inference rule is commonly defined as

\begin{array}{l} R^{k} : \text{IF } x_{1} \text{ is } A_{1}^{k} \land x_{2} \text{ is } A_{2}^{k} \land \dots \land x_{d} \text{ is } A_{d}^{k}, \text{ THEN} \\ f_{k} (x) = β_{0}^{k} + β_{1}^{k} x_{1} + \dots + β_{d}^{k} x_{d}, \quad k = 1, \dots, K \end{array}

where $x = {[x_{1}, x_{2}, \dots, x_{d}]}^{T}$ is the input vector of each rule, K is the number of fuzzy rules, $A_{i}^{k}$ are Gaussian antecedent fuzzy sets associated with the input variable x_i of Rule k, ∧ is a fuzzy conjunction operator, f_k(x) is a linear function of the inputs, and $β_{i}^{k}$ are linear parameters.

With each rule premised on the input vector x, the output of a TSK fuzzy system is expressed as

\begin{array}{l} ỹ = \sum_{k = 1}^{K} \frac{μ_{k} (x) f_{k} (x)}{\sum_{k^{'} = 1}^{K} μ_{k^{'}} (x)} = \sum_{k = 1}^{K} {\tilde{μ}}_{k} (x) f_{k} (x) & (16) \end{array}

where

\begin{array}{l} μ_{k} (x) = \prod_{i = 1}^{d} μ_{A_{i}^{k}} (x_{i}) & (17) \end{array}

is the fuzzy membership function and

\begin{array}{l} {\tilde{μ}}_{k} (x) = \frac{μ_{k} (x)}{\sum_{k^{'} = 1}^{K} μ_{k^{'}} (x)} & (18) \end{array}

is the normalized fuzzy membership function of the antecedent parameters of the kth fuzzy rule. Here $μ_{A_{i}^{k}} (x_{i})$ is the Gaussian membership function for fuzzy set $A_{i}^{k}$, which can be expressed as

\begin{array}{l} μ_{A_{i}^{k}} (x_{i}) = \exp \left( - \frac{{(x_{i} - c_{i}^{k})}^{2}}{δ_{i}^{k}} \right) & (19) \end{array}

where $c_{i}^{k}$ is the center parameter of the kth cluster, which can be calculated with the classical fuzzy c-means (FCM) clustering algorithm (Bezdek et al., 1984):

\begin{array}{l} c_{i}^{k} = \frac{\sum_{j = 1}^{N} u_{j k} x_{j i}}{\sum_{j = 1}^{N} u_{j k}} & (20) \end{array}

and the width parameter $δ_{i}^{k}$ can be estimated by (Zhaohong et al., 2013):

\begin{array}{l} δ_{i}^{k} = \frac{h \cdot \sum_{j = 1}^{N} u_{j k} {(x_{j i} - c_{i}^{k})}^{2}}{\sum_{j = 1}^{N} u_{j k}} & (21) \end{array}

where the element u_jk ∈ [0, 1] denotes the fuzzy membership of the jth input sample x_j to the kth cluster (k = 1, 2, ..., K), and h is a constant called the scale parameter.
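Given a membership matrix, the centers and widths follow directly from the two formulas above. The sketch below assumes that the matrix `memberships` (one row per sample, one column per rule) has already been produced by an FCM run, which is not implemented here. The function name is hypothetical.

```python
def antecedent_parameters(samples, memberships, h=1.0):
    """Return (centers, widths), each indexed as [rule k][feature i]."""
    n_rules = len(memberships[0])
    n_features = len(samples[0])
    centers, widths = [], []
    for k in range(n_rules):
        u_k = [row[k] for row in memberships]  # memberships to cluster k
        total = sum(u_k)
        # Center: membership-weighted mean of each feature.
        c_k = [
            sum(u * x[i] for u, x in zip(u_k, samples)) / total
            for i in range(n_features)
        ]
        # Width: scaled membership-weighted variance of each feature.
        d_k = [
            h * sum(u * (x[i] - c_k[i]) ** 2 for u, x in zip(u_k, samples)) / total
            for i in range(n_features)
        ]
        centers.append(c_k)
        widths.append(d_k)
    return centers, widths
```

A width of zero (all mass of a cluster on identical samples) would make the Gaussian membership undefined, so in practice a small floor on the widths may be needed.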

For an input sample x_n, let

\begin{array}{l} x_{n, e} = {(1, {x_{n}}^{T})}^{T} & (22) \end{array}

\begin{array}{l} {\tilde{x}}_{n}^{k} = {\tilde{μ}}_{k} (x_{n}) x_{n, e} & (23) \end{array}

\begin{array}{l} ρ (x_{n}) = {({({\tilde{x}}_{n}^{1})}^{T}, {({\tilde{x}}_{n}^{2})}^{T}, . . ., {({\tilde{x}}_{n}^{K})}^{T})}^{T} \in R^{K (d + 1)} & (24) \end{array}

\begin{array}{l} β^{k} = {(β_{0}^{k}, β_{1}^{k}, . . ., β_{d}^{k})}^{T} & (25) \end{array}

\begin{array}{l} β_{g} = {({(β^{1})}^{T}, {(β^{2})}^{T}, . . ., {(β^{K})}^{T})}^{T} & (26) \end{array}

then the output value ỹ_n of a TSK fuzzy classifier for sample x_n can be expressed as

\begin{array}{l} ỹ_{n} = {β_{g}}^{T} ρ (x_{n}) & (27) \end{array}
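The forward pass of the classifier can be sketched in a few lines of Python: Gaussian memberships are multiplied into rule firing levels, the levels are normalized, and the augmented input is scaled by each normalized level and stacked into the mapped vector ρ(x). The sketch assumes that `centers` and `widths` are indexed as [rule][feature] and that `beta_g` is the flat consequent vector of length K(d + 1) for one output. All function names are hypothetical.

```python
import math

def normalized_firing_levels(x, centers, widths):
    """Normalized firing level of every rule for input vector x."""
    levels = []
    for c_k, d_k in zip(centers, widths):
        level = 1.0
        for x_i, c, d in zip(x, c_k, d_k):
            level *= math.exp(-((x_i - c) ** 2) / d)  # Gaussian membership
        levels.append(level)
    total = sum(levels)  # may underflow to 0 if x is far from all centers
    return [level / total for level in levels]

def hidden_mapping(x, centers, widths):
    """Mapped vector rho(x) of length K * (d + 1)."""
    x_e = [1.0] + list(x)  # augmented input (1, x^T)^T
    rho = []
    for mu_k in normalized_firing_levels(x, centers, widths):
        rho.extend(mu_k * value for value in x_e)
    return rho

def tsk_output(x, centers, widths, beta_g):
    """Output of the TSK system: inner product of beta_g and rho(x)."""
    rho = hidden_mapping(x, centers, widths)
    return sum(b * r for b, r in zip(beta_g, rho))
```

Because the antecedents are fixed once the clustering is done, the output is linear in `beta_g`, which is what allows the consequents to be learned by regression.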

Learning Algorithm

Given a training dataset $D_{S} = \{x_{i}, y_{i} \mid x_{i} \in R^{d}, y_{i} \in R^{C}, i = 1, \dots, N_{S}\}$, where C is the number of classes, the consequent parameter β_g can be learned by using generalized hidden-mapping ridge regression (GHRR) (Deng et al., 2014; Tian et al., 2019). The objective function is:

\begin{array}{l} \min_{β_{g}} J (β_{g}) = \frac{1}{2} \sum_{j = 1}^{C} \sum_{i = 1}^{N_{S}} {‖ {β_{g, j}}^{T} x_{g, i} - y_{i, j} ‖}^{2} + \frac{λ}{2} \sum_{j = 1}^{C} {β_{g, j}}^{T} β_{g, j} & (28) \end{array}

where β_g,j is the consequent parameter vector of the jth class, x_g,i = ρ(x_i) is the mapped vector of the ith training sample from Equation (24), and λ is a regularization parameter that controls the trade-off between the complexity of the classifier and the tolerance of error; it can be set manually or determined by cross-validation. The optimal consequent parameters β_g,j are obtained by setting the derivative of J with respect to each β_g,j to zero, which gives (Yu et al., 2020):

\begin{array}{l} β_{g, j} = {(λ I_{K (d + 1) \times K (d + 1)} + \sum_{i = 1}^{N_{S}} x_{g, i} {x_{g, i}}^{T})}^{- 1} \sum_{i = 1}^{N_{S}} x_{g, i} y_{i, j} & (29) \end{array}
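The closed-form solution is an ordinary ridge regression on the mapped vectors and can be sketched without external libraries. The sketch assumes that `rho_rows` holds one mapped vector per training sample and `y` the 0/1 targets of one class, and it solves the regularized normal equations by Gaussian elimination with partial pivoting instead of forming the inverse. Both function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ridge_consequents(rho_rows, y, lam):
    """Consequent vector of one class: (lam*I + sum rho rho^T)^-1 sum rho*y."""
    p = len(rho_rows[0])
    gram = [[lam if r == c else 0.0 for c in range(p)] for r in range(p)]
    rhs = [0.0] * p
    for row, target in zip(rho_rows, y):
        for r in range(p):
            rhs[r] += row[r] * target
            for c in range(p):
                gram[r][c] += row[r] * row[c]
    return solve_linear(gram, rhs)
```

With λ > 0 the system matrix is positive definite, so the solve is well posed; for C classes the function would be called once per class column of the label matrix.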

Experimental Results

The EEG of AD patients carries a large amount of information that cannot be read directly from the waveform. Research shows that the visibility graph algorithm can express this hidden information in graphical form. To verify whether the electrical features of the AD brain can be represented by the WVG, we first select the same EEG channel from an AD patient and a control subject. Two episodes with a length of 500 data points (Figures 2A,B) are extracted and converted to WVGs; the results are shown in Figures 2C,D. Studies have reported that diffuse slowing is easy to detect with the naked eye in the EEG of AD patients (Micanovic and Pal, 2014). This diffuse slowing is well-preserved in the WVG, and the WVG of the AD patient clearly shows more communities, indicating the feasibility of the WVG method for AD detection. To examine the topology of the WVG networks further, the two adjacency matrices are drawn as network structure diagrams in Figures 2E,F. The dots represent network nodes, the curves represent edges, and the shade of each curve reflects the weight of the edge. The communities in the WVG network of the normal subject are generally similar in size, and the connections are uniformly distributed. For the AD patient, the community structure is more irregular: most nodes are concentrated in a small number of communities, and the connections between communities are closer. This result indicates that the electrophysiological signals of the AD brain are more unstable, with stronger fluctuations. The single-channel analysis thus reveals that the WVG networks of AD and normal subjects differ markedly. Next, we transform all 16 channels into a multi-layer network ({y_n}, 1 ≤ n ≤ 16), with one network layer obtained from each channel, and consider which parameters should be selected to quantify this difference.

Figure 2. An example of converting EEG signals from an AD subject and a control subject into WVGs. EEG signals of the FP1 channel from the AD subject (A) and the control subject (B) with 5 s length. The adjacency matrices of the converted WVGs for the AD subject (C) and the control subject (D). Schematic diagrams of the complex networks corresponding to the WVGs of the AD subject (E) and the control subject (F).

To reduce the computing time while retaining as much information as possible, the EEG signal is divided into episodes with sliding windows of 500 data points. Since the size of the converted WVG network equals the length of the EEG series, a series of adjacency matrices of size 500 × 500 is obtained. Next, we calculate the clustering coefficient (x₁), graph index complexity (x₂), average weighted degree (x₃), network entropy (x₄), degree distribution index (x₅), modularity (x₆), local efficiency (x₇), and average path length (x₈) of each WVG network for both the AD and control groups. These parameters can be obtained from each network layer, and each of them can be considered a different feature. Since the values of different parameters differ considerably in magnitude, the calculated results are normalized to the range 0 to 1. The values of all windows were averaged for each person, and a statistical analysis was then performed across persons. As shown in Figure 3, the parameters that differ significantly between the AD group and the control group are marked with *. The clustering coefficient, local efficiency, and average shortest path length of the AD group are significantly lower than those of controls (p < 0.01). Meanwhile, the network entropy (the entropy of the degree distribution) of the AD group is higher than that of controls (p < 0.05), while the degree distribution index λ of the AD group is lower than that of controls (p < 0.05). These results demonstrate that network topological parameters can be used to detect AD.
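The windowing and normalization steps can be sketched as follows. The text does not state the window step, so the sketch assumes non-overlapping windows by default (`step` equal to `length`) and a plain min-max scaling of each parameter across all of its values. Both function names are hypothetical.

```python
def sliding_windows(signal, length=500, step=500):
    """Split a signal into windows of `length` samples taken every `step`."""
    last_start = len(signal) - length
    return [signal[s:s + length] for s in range(0, last_start + 1, step)]

def min_max_normalize(values):
    """Rescale a list of values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]
```

Samples left over at the end of the signal that do not fill a whole window are discarded by this sketch.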

Figure 3. Network parameters (averaged across subjects) of both AD networks and control networks. Error bars represent standard error across subjects. The degree of significant difference is calculated by Analysis of Variance (ANOVA) across all subjects. ^**A significant correlation (p_c ≤ 0.01 corrected for multiple comparisons across tiles). ^*A trend (p_c ≤ 0.05).

The statistical analysis shows that some of the above parameters clearly distinguish the AD group from the control group. To further verify the effect of these parameters on AD recognition, they are used as input features for training the fuzzy classifier. In each training process, we randomly select 80% of the original data to form the training dataset, which is used for ten-fold cross-validation (10-CV), with 90% (90% × 80%) utilized for model training and 10% (10% × 80%) for constructing a validation set. This procedure is repeated 10 times to cover the entire training set and to determine the optimal hyperparameters of the TSK model. The remaining 20% of the data is used as the testing set with the determined hyperparameters. For each input feature or feature vector, the classification results (accuracy, sensitivity, specificity) are averaged over 50 training runs.
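The data partition described above can be illustrated with a short sketch. It assumes a simple random shuffle without stratification by class, which the text does not specify, and the function name and seed are hypothetical.

```python
import random

def holdout_then_kfold(n_samples, test_frac=0.2, k=10, seed=0):
    """Split sample indices into a held-out test set and k cross-validation folds.

    Returns (test_idx, splits), where splits is a list of k (fit_idx, val_idx)
    pairs drawn from the remaining training indices.
    Assumption: plain random shuffling, no class stratification.
    """
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = round(n_samples * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    folds = [train_idx[i::k] for i in range(k)]  # k nearly equal folds
    splits = []
    for i in range(k):
        val_idx = folds[i]
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((fit_idx, val_idx))
    return test_idx, splits

# Usage: with 100 samples, 20 are held out and each fold validates on 8.
test_idx, splits = holdout_then_kfold(100)
```

Repeating this call with 50 different seeds and averaging the test metrics would correspond to the 50 training runs mentioned in the text.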

The construction of each WVG network is based on a single time series, so 16 WVGs are obtained from the 16-channel EEG used in this paper. These WVG networks contain different electrophysiological information from neurons in different brain regions. In existing studies, however, the parameters extracted from WVG networks constructed from the EEG of different brain regions were usually regarded as the same class of features, so the differences between brain regions were ignored. Therefore, we treat the 16 WVG networks as different networks and combine them into a multi-layer network. To verify whether the underlying dynamic information of these network layers differs, the classification is first performed with a single feature as input. Each parameter extracted from each single network layer (one layer per channel) is used as the single input feature for model training, and the classification results are shown in Table 1 with the optimal classification result in bold. For the same network parameter extracted from different network layers, the classification results differ markedly. The difference in classification accuracy of the same parameter from different networks reaches 28.39% for average weighted degree ({(x₃, y_k)}, k = 1, ..., 16), indicating that the dynamic information contained in the EEG of different brain regions does differ significantly and that parameters of different layers may be independent of each other. This finding shows that the network characteristics of the multi-network composed of WVG network layers can be used as independent input features for the classifier.

Table 1. Classification results when each single parameter from a single network layer is taken as the input feature.

Input feature vectors consisting of multiple parameters are then used for fuzzy system training. The classification is performed on the following three feature sets [as shown in Figure 1(3)]: (1) Single parameter from multi-networks: the classifier input is restricted to one parameter, which is extracted from different network layers and combined. (2) Multi-parameters from single network: for one network layer, different parameters are extracted and combined as the classifier input. (3) Multi-parameters from multi-networks: all parameters extracted from all network layers are used as distinct input features of the classifier. For each set, various feature selection methods, including Correlation-based Feature Selection (CFS) (Guyon et al., 2002), Dependence Guided Unsupervised Feature Selection (DGUFS) (Zhu et al., 2017), Fisher (Gu et al., 2012), Feature Selective Validation (FSV) (Bradley and Mangasarian, 1999), Locality-Constrained Linear Coding Feature Select (LLCFS) (Zeng and Cheung, 2011), and minimum-redundancy maximum-relevance (mRMR) (Peng et al., 2005), are used to rank the features and obtain a feature sequence. Following the obtained sequence, an increasing number of features is selected in order (i.e., the first feature, the first two features, the first three features, and so on) to compose the input vectors for TSK model training. In the feature selection process (as shown in Figure 1), the methods of the Feature Selection Library (FSLib) are adopted to determine the input feature vectors of the TSK model. All algorithms are implemented in MATLAB 2016b.
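The rank-then-truncate procedure can be sketched for one of the listed criteria. The example below uses the classical per-feature Fisher score (between-class scatter divided by within-class scatter), which is a simplification and not necessarily identical to the FSLib routine used in the study; the function names and the toy data are hypothetical.

```python
def fisher_scores(X, y):
    """Classical Fisher score of each column of X (list of rows) for labels y.

    Simplified stand-in for the FSLib Fisher routine, not a copy of it.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_all = sum(col) / len(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            var_c = sum((v - mean_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mean_c - mean_all) ** 2
            within += len(vals) * var_c
        scores.append(between / within if within > 0 else float("inf"))
    return scores

def nested_subsets(scores):
    """Rank features by descending score and return the first 1, 2, ..., d of them."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [order[:k] for k in range(1, len(order) + 1)]

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.5], [0.8, 0.4]]
y = [0, 0, 1, 1]
subsets = nested_subsets(fisher_scores(X, y))
```

Each entry of `subsets` would then be used as the column selection for one TSK training run, and the subset length with the best validation accuracy is kept.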

First, case 1 is described as an example, and the structure of the TSK model is described in detail below. As the clustering coefficient ({(x₁, y₁₃)}) reached the highest accuracy of 79.96% in Table 1, the clustering coefficient from all network layers ({(x₁, y_k)}, k = 1, ..., 16) is adopted for feature selection and multi-input classification. The feature orders are obtained with the various ranking-based feature selection algorithms. After ranking the network parameters, we choose input feature vectors of different lengths as inputs of the TSK model and calculate the classification results (accuracy, sensitivity, and specificity) for each. The optimal lengths of the input vectors and the corresponding classification results are shown in Table 2. With different feature selection methods, the length of the feature vector that gives the optimal classification result differs. In addition, the sensitivity is higher than the specificity for the feature vectors selected by the CFS and DGUFS methods, while the opposite holds for the others. This shows that changing the features used for training affects the properties of the trained model. For the set of clustering coefficients extracted from multiple networks, the Fisher method achieves the optimal classification result, so the classification process with the Fisher method is explored further.
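For reference, the three reported measures can be computed from predicted and true labels as in the sketch below. It assumes that AD is the positive class (label 1) and controls are the negative class (label 0), which the text does not state explicitly; the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumption: `positive` marks AD subjects, every other label marks controls.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity

# Usage with hypothetical labels: 3 AD subjects and 2 controls.
acc, sens, spec = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

A classifier with sensitivity above specificity, as reported for CFS and DGUFS, misses fewer AD subjects but mislabels more controls.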

Table 2. Classification results when the set of a single parameter from multiple networks is taken as the input feature vector.

With the Fisher algorithm, the order of the parameters is obtained as (x₁, y₁₃), (x₁, y₉), (x₁, y₁₂), (x₁, y₃), (x₁, y₁), (x₁, y₂), (x₁, y₁), (x₁, y₆), (x₁, y₁₀), (x₁, y₈), (x₁, y₅), (x₁, y₁₅), (x₁, y₇), (x₁, y₄), (x₁, y₁₄), (x₁, y₆). The joint distribution of the first two channels in this ranking is plotted to verify that the same network parameter, taken from WVG networks of different channels, is effective as a multi-input for classification. The result is shown in Figure 4A, where each point represents a subject. AD subjects are clearly separated from controls, which demonstrates that the clustering coefficients of channel 13 and channel 9 are effective for classifying AD and controls. These two parameters also achieve the best classification results when the clustering coefficient from a single network is used as the only input. However, the optimal parameters obtained by feature selection are not completely consistent with those that are optimal when a single parameter is used as input. This indicates that the information of a single brain region cannot be used as a direct feature to distinguish patients with AD, but that the implicit information of different brain regions can complement each other. Following the above ranking, five-rule TSK classifiers are trained with the number of classifier inputs increasing from 1 to 16, and the final classification results under cross-validation are shown in Figure 4B. As the length of the input feature vector increases, the accuracy reaches a maximum of 95.28% at four inputs and then begins to decrease.

Figure 4. (A) Joint distribution of clustering coefficient obtained from WVG network transformed from Channel 13 and Channel 9. (B) Classification results when the number of input features is from 1 to 16, which is obtained under single parameter (clustering coefficient) from multi-networks set and ordered through feature selection method.

In this part, the framework of the TSK model is described in detail based on the selected optimal feature combination. The input vector x consists of the clustering coefficients of channel 13 ((x₁, y₁₃)), channel 9 ((x₁, y₉)), channel 12 ((x₁, y₁₂)), and channel 3 ((x₁, y₃)). Membership functions can be expressed linguistically using the fuzzy descriptions “very low,” “low,” “medium,” “high,” and “very high.” For each feature, the membership functions are assigned these descriptions in ascending order of their center values. As an example, consider the clustering coefficient of channel 13. We use a Gaussian function as the membership function, and each rule obtains a set of antecedent parameters (center, standard variance): (0.3990, 0.0031) for Rule 1, (0.3956, 0.0030) for Rule 2, (0.4165, 0.0032) for Rule 3, (0.4052, 0.0031) for Rule 4, and (0.4040, 0.0030) for Rule 5. By ranking the five centers, the membership functions can be described linguistically: Rule 1 is “very low,” Rule 2 is “very high,” Rule 3 is “low,” Rule 4 is “medium,” and Rule 5 is “high.” The other three features can be fuzzified and described similarly. Therefore, with the linguistic expressions and the corresponding linear functions, the fuzzy rules can be given as follows:

R¹ : IF y₁₃ is very low ∧ y₉ is very low ∧ y₁₂ is medium ∧ y₃ is very low,

\begin{array}{l} \text{THEN } f_{1}(x) = \begin{bmatrix} 0.4975 - 0.1872 y_{13} + 0.1615 y_{9} + 0.1515 y_{12} + 0.2134 y_{3} \\ -0.1385 - 0.0959 y_{13} - 0.0738 y_{9} - 0.0514 y_{12} - 0.1147 y_{3} \end{bmatrix}, \end{array}

R² : IF y₁₃ is very high ∧ y₉ is very high ∧ y₁₂ is very low ∧ y₃ is very high,

\begin{array}{l} \text{THEN } f_{2}(x) = \begin{bmatrix} -1.99 \times 10^{-4} + 1.59 \times 10^{-5} y_{13} - 9.79 \times 10^{-5} y_{9} - 2.13 \times 10^{-5} y_{12} + 2.03 \times 10^{-4} y_{3} \\ 0.0013 + 3.34 \times 10^{-4} y_{13} + 4.12 \times 10^{-4} y_{9} + 2.65 \times 10^{-4} y_{12} + 1.15 \times 10^{-4} y_{3} \end{bmatrix}, \end{array}

R³ : IF y₁₃ is low ∧ y₉ is low ∧ y₁₂ is high ∧ y₃ is low,

\begin{array}{l} \text{THEN } f_{3}(x) = \begin{bmatrix} 0.2508 - 0.0485 y_{13} + 0.0555 y_{9} + 0.0167 y_{12} + 0.1049 y_{3} \\ -0.0039 - 0.0195 y_{13} - 0.0095 y_{9} - 0.0572 y_{12} - 0.0341 y_{3} \end{bmatrix}, \end{array}

R⁴ : IF y₁₃ is medium ∧ y₉ is high ∧ y₁₂ is very high ∧ y₃ is medium,

\begin{array}{l} \text{THEN } f_{4}(x) = \begin{bmatrix} -0.0536 - 0.0429 y_{13} - 0.0341 y_{9} - 0.0765 y_{12} - 2.05 \times 10^{-4} y_{3} \\ 0.2071 + 0.0872 y_{13} + 0.0759 y_{9} + 0.1244 y_{12} + 0.0450 y_{3} \end{bmatrix}, \end{array}

R⁵ : IF y₁₃ is high ∧ y₉ is medium ∧ y₁₂ is low ∧ y₃ is high,

\begin{array}{l} \text{THEN } f_{5}(x) = \begin{bmatrix} 0.0130 + 0.0016 y_{13} - 0.0020 y_{9} + 0.0025 y_{12} + 0.0041 y_{3} \\ 2.47 \times 10^{-4} + 0.0026 y_{13} + 0.0017 y_{9} + 0.0011 y_{12} + 4.33 \times 10^{-5} y_{3} \end{bmatrix}. \end{array}

With the fuzzy system learned from the five rules above, an example with the input [0.2098 0.2106 0.3585 0.2264] is given to further explain the testing process. The inputs of the identification process are the network features of an AD patient, and the decision output is the predicted label vector. The sum of the five rule-based outputs is f = [0.8940 0.00956]^T; the maximal element in f is then set to 1 and the others to 0 to form the decision output. Finally, the AD patient is identified from the final output y = [1 0]^T.
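The inference mechanism can be sketched generically as follows. This is an illustration under stated assumptions, not the trained model: the Gaussian membership is written as exp(-(x - c)² / (2s²)) with s treated as a standard deviation, the rule firing strengths are combined by product and normalized to sum to one, and the two-rule parameters in the usage lines are hypothetical. Only the final decision step is checked against the values f = [0.8940 0.00956]^T and y = [1 0]^T given in the text.

```python
import math

def one_hot_argmax(f):
    """Set the maximal element of f to 1 and all others to 0."""
    best = max(range(len(f)), key=lambda o: f[o])
    return [1 if o == best else 0 for o in range(len(f))]

def tsk_predict(x, centers, widths, consequents):
    """First-order TSK inference for one input vector x.

    centers, widths: K x d antecedent parameters of Gaussian memberships.
    consequents: K x n_out x (d + 1) linear coefficients, bias term first.
    Assumptions: product t-norm and normalized firing strengths.
    """
    firing = []
    for c_k, s_k in zip(centers, widths):
        mu = 1.0
        for x_i, c, s in zip(x, c_k, s_k):
            mu *= math.exp(-((x_i - c) ** 2) / (2.0 * s ** 2))
        firing.append(mu)
    total = sum(firing)
    # Fall back to equal weights if every rule underflows to zero.
    weights = [m / total for m in firing] if total > 0 else [1.0 / len(firing)] * len(firing)
    n_out = len(consequents[0])
    f = [0.0] * n_out
    for w_k, rule in zip(weights, consequents):
        for o, coeffs in enumerate(rule):
            f[o] += w_k * (coeffs[0] + sum(p * x_i for p, x_i in zip(coeffs[1:], x)))
    return f, one_hot_argmax(f)

# Hypothetical two-rule, one-input system for illustration only.
centers = [[0.0], [1.0]]
widths = [[0.5], [0.5]]
consequents = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
f_low, y_low = tsk_predict([0.0], centers, widths, consequents)
f_high, y_high = tsk_predict([1.0], centers, widths, consequents)

# Decision step applied to the output vector reported in the text.
y_paper = one_hot_argmax([0.8940, 0.00956])
```

Because each rule contributes a linear function weighted by how strongly its antecedent fires, the contribution of every rule to a given decision can be read off directly, which is the source of the interpretability discussed in the text.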

Next, multiple parameters from a single network are used together as the input set for the classifier. The classification results obtained with the various feature selection methods and the optimal lengths of the input feature vectors are shown in Table 3. The parameters selected by the FSV method form the vector with the optimal classification result, so the parameters sorted by this method are analyzed in detail. The order of features obtained through the FSV algorithm is (x₇, y₁₃), (x₁, y₁₃), (x₂, y₁₃), (x₃, y₁₃), (x₈, y₁₃), (x₅, y₁₃), (x₆, y₁₃), (x₄, y₁₃). Local efficiency ((x₇, y₁₃)) and clustering coefficient ((x₁, y₁₃)) are chosen to verify the feasibility of the classification, and their joint distribution is shown in Figure 5A. There is a clear difference between the AD and the control group. The TSK classification is applied to all feature input groups. As shown in Figure 5B, the classification accuracy reaches a maximum of 93.41% when the first three features are taken as the input vector. The optimal combination obtained by the feature ranking method is local efficiency (x₇), clustering coefficient (x₁), and graph index complexity (x₂). On its own, the graph index complexity discriminates poorly between AD and the control group, and the TSK models trained with the graph index complexity from each network layer as the single input have low classification accuracy. However, the graph index complexity can supplement the clustering coefficient and local efficiency, indicating that the redundancy between some parameters from the same network layer is small, which makes them valuable as features for model training. These classification results show that both multi-parameter and multi-network input sets can be applied to TSK classification, and that they are different types of input sets for model training.

Table 3. Classification results when the set of multiple parameters from a single network is taken as the input feature vector.

Figure 5. (A) Joint distribution of clustering coefficient and local efficiency obtained from the WVG network of Channel 13. (B) Classification results when the number of input features is from 1 to 8, which is obtained under the multi-parameters from single network (y₁₃) set and ordered through the feature selection method.

Finally, the multi-parameters from multi-networks are used for training. We apply the different feature selection methods to this input set and find the best input feature vector for each. Figure 6 shows the methods and the corresponding classification results. The brain area enclosed by the red line is the frontal lobe, the blue is the temporal lobe, the green is the parietal lobe, and the orange is the occipital lobe. The parameters selected by the different methods are most often extracted from the network layers of the frontal EEG. This suggests that information in the frontal lobe is more effective in identifying AD patients. Damage to the frontal lobe, which plays a prominent role in thinking and behavior, can lead to forgetfulness, delayed behavior, and distraction. Meanwhile, signals from other brain regions also play an important role in AD recognition, indicating that AD has a global impact on the brain. The best result for the multi-parameters from multi-networks set is obtained with the FSV method, with an accuracy of up to 97.28%. The combination is {(x₇, y₃), (x₃, y₁₃), (x₁, y₁₃), (x₃, y₁₂), (x₄, y₄)}. The accuracy with set 3 is improved compared with set 1 and set 2, indicating that it is worthwhile to treat multiple parameters extracted from multiple networks as different features.

Figure 6. Schematic diagram of the channel positions, with the frontal lobe marked within red lines, the temporal lobe in blue, the parietal lobe in green, and the occipital lobe in orange, together with the optimal parameters and corresponding classification results under different feature selection methods.

Conclusion and Discussion

This paper proposes a multi-input machine learning method that combines a fuzzy classifier and WVG to identify the EEG of AD patients. To improve the interpretability and recognition accuracy of the model, complex network theory and the TSK fuzzy system model are adopted. A WVG network layer is constructed from each single-channel EEG. The multiple parameters obtained from multiple networks can be used as independent input features for model training, and the TSK model based on fuzzy rules classifies AD EEG with better interpretability. We considered three types of classification input sets: multi-parameters from single network, single parameter from multi-networks, and multi-parameters from multi-networks. Each of these three types of inputs is used as the training set of the TSK model. The experimental results show that the fuzzy model achieves optimal performance with multi-parameters from multi-networks as the classification input set, with an accuracy of up to 97.83%. Meanwhile, the optimal number of inputs differs among the three types of input sets proposed in this paper. The best input combination in the set of multi-parameters from multi-networks consists of 5 input features.

The current clinical techniques of AD identification, mainly scale assessment, cerebrospinal fluid examination, and the observation of gray matter atrophy through brain functional imaging, struggle to provide reliable diagnostic markers. It is also difficult to find obvious organic changes in the early stage of AD. We propose an AD diagnostic model that combines the TSK fuzzy model with complex networks obtained by the WVG method, together with three different kinds of training input sets, which provides a new method for the search for AD EEG biomarkers. Compared with traditional methods, the AD identification approach proposed in this paper is easier to implement and more accurate.

EEG, which is commonly considered to have significant chaotic characteristics, cannot be well evaluated with linear analysis. The WVG method used in this paper transforms the one-dimensional time series into networks and extracts the underlying information contained in the electrophysiological activities of different brain regions. In contrast with other network construction methods such as synchronization networks, the WVG networks obtained from each EEG channel are independent of each other. Thus, more network features can be found, and effective biomarkers can be obtained from several kinds of feature sets with WVG (Zhu et al., 2014). The classification results show that the WVG method is very effective for feature extraction in AD recognition. In future work, it will be combined with multi-layer network theory to further examine the correlation between different channels by constructing a multi-layer network. In past research, we confirmed the feasibility of the multi-layer network scheme and extracted the multiplex clustering coefficient and multiplex participation coefficient (Cai et al., 2020). Future work will consider both the implicit characteristics of single channels and the information integration between multiple channels.

We propose three different kinds of feature sets and show that the optimal parameter vectors are obtained from the set of multi-parameters from multi-networks. This finding indicates that simultaneously treating different networks and different parameters as distinct features clearly helps in acquiring AD biomarkers. At the same time, the classification results show that using too many features as input does not help optimize the classification model, so it is necessary to reduce the feature dimension. Adding too many features may lead to overfitting of the learning model, and adding invalid features may even decrease the accuracy on the test set (Guyon et al., 2002). Therefore, feature selection plays an important role in improving the accuracy of fuzzy learning models.

In this paper, we built an identification model that combines feature selection approaches with machine learning. With feature selection, the number of EEG channels can be effectively reduced, which significantly reduces the difficulty of data collection. Meanwhile, with fewer parameters, it becomes easier to improve the efficiency of the AD recognition process. Compared with traditional manual diagnosis, machine learning methods offer higher reliability and improved recognition accuracy. In particular, the TSK method has higher interpretability and robustness by integrating the advantages of fuzzy rules and membership functions. There are still some limitations in our research. We used a variety of feature selection methods, but a feature selection method specifically suited to the highly interpretable TSK model still needs to be considered. Future work may focus on how to select features more efficiently and accurately to achieve higher classification accuracy.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics Committee of Tangshan Gongren Hospital. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

HY: article writing, design of methods, and article correction. LZ: article writing, processing and analysis of data, and design of methods. LC: design of methods and data analysis. JW: data analysis and article review. JL: data collection. RW: article review and correction. ZZ: data collection. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by Tianjin Natural Science Foundation (Grant No. 19JCYBJC18800), Tangshan Science and Technology Project (Grant Nos. 18130208A and 19150205E), Hebei Science and Technology Project (Grant No. 18277773D), and Natural Science Foundation of Tianjin Municipal Science and Technology Commission (Grant No. 13JCZDJC27900).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Ahmadlou, M., Adeli, H., and Adeli, A. (2010). New diagnostic EEG markers of the Alzheimer's disease using visibility graph. J. Neural Transm. 117, 1099–1109. doi: 10.1007/s00702-010-0450-3

Bezdek, J. C., Ehrlich, R., and Full, W. (1984). FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10, 191–203. doi: 10.1016/0098-3004(84)90020-7
