ORIGINAL RESEARCH article

Front. Phys., 03 July 2023
Sec. Interdisciplinary Physics

Cost-sensitive classification algorithm combining the Bayesian algorithm and quantum decision tree

Retracted

Naihua Ji1, Rongyi Bao1, Xiaoyi Mu1, Zhao Chen1, Xin Yang1 and Shumei Wang2*
  • 1School of Information and Control Engineering, Qingdao University of Technology, Qingdao, China
  • 2School of Science, Qingdao University of Technology, Qingdao, China

This study highlights the drawbacks of current quantum classifiers that limit their efficiency and data processing capabilities in big data environments. The paper proposes a global decision tree paradigm to address these issues, focusing on designing a complete quantum decision tree classification algorithm that is accurate and efficient while also considering classification costs. The proposed approach generates a suitable decision tree dynamically based on data objects and cost constraints. To handle incremental data, the Bayesian algorithm and the quantum decision tree classification algorithm are integrated, and kernel functions obtained from quantum kernel estimation (QKE) are added to a linear quantum support vector machine to construct a decision tree classifier using decision directed acyclic networks of quantum support vector machine nodes. The experimental findings demonstrate the effectiveness and adaptability of the suggested quantum classification technique. In terms of classification accuracy, speed, and practical application impact, the proposed classification approach outperforms the competition, with an accuracy difference from conventional classification algorithms of less than 1%. With improved accuracy and reduced expense as the incremental data increase, the efficiency of the suggested algorithm for incremental data classification is comparable to that of previous quantum classification algorithms. The proposed global decision tree paradigm addresses critical issues that quantum classification methods have yet to resolve, such as the inability to process incremental data and the failure to take the cost of categorization into account. By integrating the Bayesian algorithm with the quantum decision tree classification algorithm and using QKE, the proposed method achieves high accuracy and efficiency while maintaining high performance when processing incremental sequences and considering classification costs. Overall, the theoretical and experimental findings demonstrate the effectiveness of the suggested quantum classification technique, which offers a promising solution for big data classification tasks that require high accuracy and efficiency.

1 Introduction

One important classification technique is the decision tree. However, current methods have drawbacks: the fuzzy Gaussian decision tree [1], the decision tree with variable precision neighborhood similarity [2], the decision tree classification based on a sampling scheme [3], and Breiman’s random forest [4] are all centered on choosing features that maximize decision tree classification performance.

The performance of specific algorithms has been improved by several useful methods. First, in terms of algorithm design, many researchers have recently presented evolutionary decision tree induction techniques that combine evolutionary algorithms to enhance the capability of global search when greedy methods fail [5-8]. Nevertheless, research has also been conducted on the precision and interpretability of the rules. The PrismSTC algorithm was suggested by Liu and Haig [9] to learn rule sets for a single target class; compared with the Prism, Iterative Dichotomiser 3 (ID3), and C4.5 methods, PrismSTC can provide more straightforward rule-based classifiers without sacrificing accuracy. By reducing the number of rules and conditions, the classification rule mining technique suggested by Cano et al. [10] attempts to make IF–THEN classification rules more understandable and interpretable. Yang et al. [11] proposed the improved ID3 (IID3) algorithm, which uses discretization techniques to handle numerical attributes and balance functions to counteract the information-gain bias toward multi-valued attributes, in an effort to provide accurate, dependable, and quick disease prediction. A three-stage multi-criteria classification framework for spare parts management was proposed by Hu et al. [12], which employed a dominance-based rough set method to derive “IF–THEN” classification rules. Second, for classification problems involving a large number of classes, Laber et al. [13] developed an effective splitting criterion with theoretical approximation guarantees that can handle multivalued nominal attributes. For classifying stream data, Mahan et al. [14] developed a split feature selection technique based on the chi-square criterion.

These techniques produce great performance but disregard the cost of attributes. It is vital to remember that the cost of acquisition can vary greatly and play a significant role in various circumstances. For instance, classification based on decision trees can be used to aid diagnosis in individual medical services [15]. A range of medical signs can be used for supplementary diagnosis, and the costs of these measurements differ widely: measuring body temperature costs essentially nothing, while PET-CT is significantly more expensive. The diagnostic power of these signs also varies, and some prefer diagnoses that do not involve high acquisition fees. Choosing the right diagnostic signs so as to attain high diagnostic accuracy within a reasonable cost range is therefore a challenge. On the other hand, most researchers have in recent years created a variety of strategies to address the decision tree algorithm’s drawbacks and to combine it with other algorithms so as to reduce the space and time complexity of traditional algorithms. Nevertheless, for growing time series data, there is no suitable technique to improve the decision tree algorithm. The use of data mining [16,17] technology is becoming more widespread. Time series data [18,19], which include speech and financial data, are one typical example. However, time series data frequently grow with time, which reduces the efficiency with which the traditional decision tree method can classify, perform regression analysis on, and forecast this kind of data.

This study leverages the incremental learning feature of the Bayesian algorithm and the UCR time series dataset for simulation testing to solve the drawbacks of typical decision tree strategies for processing incremental time series data. The experimental findings demonstrate the strong application impact of the suggested incremental decision tree algorithm on incremental data and its strong practical performance.

In contrast, quantum information processing [20] has advanced significantly in recent years. Quantum information is a natural generalization of classical information [21-27] and provides the most precise and comprehensive description allowed by quantum mechanics. The quantum version of a classical algorithm, however, differs significantly from the classical algorithm due to its distinct characteristics. Combining quantum computing with artificial intelligence (AI) or machine learning is incredibly exciting. Recently, there has been increased interest in quantum machine learning, which examines the classical equivalents of learning tasks in quantum systems [28-31]. Quantum classification algorithms have been applied in many fields, such as the quantum K-nearest neighbors algorithm for image classification [32]. A quantum classification algorithm that combines hybrid quantum and classical classification algorithms has also been used for image classification [33]. Due to the shortcomings of conventional decision tree algorithms for incremental time series data processing, we aim to design a complete quantum decision tree classification algorithm that has high accuracy and efficiency while maintaining high performance when processing incremental sequences and considering classification costs. This paper combines a cost-aware quantum decision tree algorithm, which can process time series data efficiently, with the incremental learning property of the Bayesian algorithm. Simulation experiments are conducted on UCR time series datasets. The experimental results demonstrate both the significant application impact of the proposed quantum decision tree method on incremental data and its high practical performance.

2 Related work

2.1 Decision tree

A decision tree classifier is a tree model, which is a standard method for classification tasks. The decision tree [34-37] approach is one of the most often used algorithms in various data mining methods. The top decision node of the decision tree classifier, conceptualized as a directed acyclic graph, is often constructed first before being divided into several branches based on various criteria, as shown in Figure 1. The leaf nodes at the end of the digraph then indicate the categorized decision. Our objective is to construct decision trees from a training dataset using multiple features and labels. After construction, when given a new, unlabeled sample, we can trace its path from the top node to a leaf node. At each branch point, a specific path is chosen based on whether the sample fulfills the feature requirements. Then, the label of the leaf node is applied to the unlabeled sample.

FIGURE 1. Decision tree classifier diagram. When presented with a fresh input, it will begin at the top node and descend in accordance with its traits until it reaches a leaf node. After that, it gives that input the leaf node’s label.

Fundamental concepts and algorithms of decision trees, including the CART algorithm, ID3 algorithm, and C4.5 algorithm, have already been discussed in brief [38]. The decision tree approach has been widely used in various categorization scenarios, such as data mining, due to its respectable efficiency and interpretability.

During the creation of a decision tree, the training data are iteratively input into the tree. The characteristics that are selected depend on the information gain and entropy. Information entropy can be used to measure the uncertainty of random variables. To calculate the information entropy of a random variable, one can use the probability of the random variable as shown in the following equation:

P(X = x_i) = p_i,  i = 1, 2, …, n,
H(X) = −∑_{i=1}^{n} p_i log p_i,
H(X|Y) = ∑_{i=1}^{n} p_i H(X|Y = y_i),  (1)

where X is the random variable, p_i is its probability, H(X) is its information entropy, and H(X|Y) is its conditional information entropy given Y. The formulation makes it apparent that the uncertainty of the random variable rises as information entropy grows. When we build the decision tree, we denote the current decision tree by D. The effect of a feature on the information entropy, also known as information gain, may be represented as the difference between the entropy H(D) of the current decision tree and the conditional entropy H(D|A) of the decision tree D under the condition of the feature A. The gain is large when the new feature A substantially reduces the decision tree’s information entropy. The following equation is given:

g(D, A) = H(D) − H(D|A).  (2)
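To make Eqs. (1) and (2) concrete, the following minimal Python sketch computes the empirical entropy of a label set and the information gain of a discrete feature; the function names and toy data are our own illustration, not part of the original algorithm.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Empirical entropy H(X) = -sum_i p_i log2 p_i of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(features, labels):
    """g(D, A) = H(D) - H(D|A) for one discrete feature column."""
    n = len(labels)
    # Conditional entropy H(D|A): weighted entropy of each feature-value subset.
    cond = 0.0
    for value in set(features):
        subset = [y for x, y in zip(features, labels) if x == value]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond

# Toy example: the feature perfectly separates the two classes, so the gain equals H(D).
feature = ["a", "a", "b", "b"]
label = ["yes", "yes", "no", "no"]
print(information_gain(feature, label))  # 1.0 bit
```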

2.2 Bayesian algorithm

The Bayesian classification method is a quick, effective, and useful statistical classification technique that builds the classification process using Bayesian theory. Figure 2 displays the Naive Bayes model. The prior and posterior probabilities are first calculated using the Bayesian approach:

P(D|A) = P(A|D) P(D) / P(A),  (3)

where P(D) stands for the prior probability, P(A|D) stands for the conditional probability of observing A when condition D is satisfied, and P(D|A) stands for the posterior probability of D given that condition A is met. According to Bayesian probability theory, the posterior probability P(D|A) fluctuates when the prior probability and conditional probability change. The prior and conditional probabilities are used according to whether the variables or conditions A and D are treated as independent data or functionally dependent data. The categorization of the data is completed by predicting the posterior probability.

FIGURE 2. Simple naive Bayes algorithm classification example. At this point, naive Bayes classification is relatively simple, so for multi-feature and multi-class problems this simple form can no longer be used and Gaussian naive Bayes should be considered.

The previously mentioned Bayesian classification methods are suitable for discrete random variables or discrete data feature sets. The posterior probability can be calculated using the continuous function of the Gaussian distribution by assuming that a continuous random variable or collection of data features follows the Gaussian distribution:

P(D|A) = g(D, μ_A, σ_A) = (1/√(2πσ_A²)) exp(−(D − μ_A)²/(2σ_A²)).  (4)

Here, g(D, μ_A, σ_A) is the continuous Gaussian density function. The mean μ_A and standard deviation σ_A can then be used to compute the Bayesian categorization of the feature set.
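As an illustration of how the Gaussian likelihood in Eq. (4) combines with a prior to produce a posterior classification, consider the following sketch of a Gaussian naive Bayes scoring step; the class statistics and the feature-independence assumption shown here are illustrative and not taken from the paper.

```python
import numpy as np

def gaussian_likelihood(x, mu, sigma):
    """Eq. (4): Gaussian density g(x; mu_A, sigma_A) for a continuous feature."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def posterior_scores(x, class_stats, priors):
    """Unnormalized P(D|A) proportional to P(A|D) P(D), assuming independent features."""
    scores = {}
    for c, (mus, sigmas) in class_stats.items():
        likelihood = np.prod([gaussian_likelihood(xi, m, s) for xi, m, s in zip(x, mus, sigmas)])
        scores[c] = likelihood * priors[c]
    return scores

# Toy two-class, two-feature example (means and std-devs would be estimated from training data).
class_stats = {"A": ([0.0, 1.0], [1.0, 0.5]), "B": ([2.0, 3.0], [1.0, 0.5])}
priors = {"A": 0.6, "B": 0.4}
scores = posterior_scores([0.2, 1.1], class_stats, priors)
print(max(scores, key=scores.get))  # -> "A"
```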

2.3 Quantum classification algorithm

Recently, the development of quantum computing has become increasingly mature, and it is exciting to combine quantum algorithms with traditional machine learning. This harnesses the advantages of quantum computing to improve the efficiency and performance of traditional machine learning algorithms. For example, combining quantum generative adversarial networks with classical generative adversarial networks can enhance image generation algorithms [39].

There are two contenders for supervised QML methods in the near future: quantum neural networks (QNNs) and quantum kernel methods. QNNs use parametric quantum circuits to embed data into a quantum feature space and then train circuit parameters to minimize observable loss functions. However, the variational algorithm used by QNNs faces difficulties in optimization due to the barren plateau problem, making it challenging to train models. On the other hand, quantum kernel methods propose a non-parametric approach where the inner product of quantum-embedded data points is estimated only on a quantum device through quantum kernel estimation (QKE). Quantum SVM (QSVM), the most widely used quantum kernel method, uses QKE to build a support vector machine (SVM) model. Unlike previous works that used Grover’s algorithm with a specific input state, the proposed algorithm in this study utilizes weak classifiers called quantum decision trees (QDTs). To limit the expressivity of the quantum model and thus curb overfitting, a random low-rank Nyström approximation is performed on the kernel matrix provided to the SVM. This approximation also reduces the complexity of circuit sampling.

3 Construction of a cost-sensitive decision tree classification algorithm

3.1 Quantum decision tree

The classification technique introduced in this paper is the quantum decision tree, which has the same digraph structure as the binary decision tree shown in Figure 3. Each vertex, also known as a node, can be either a split node or a leaf node, as indicated by the colors. While the leaf node provides the classification output of the QDT, the split node divides the incoming data points into two subsets and passes them down the tree. Effective segmentation is evaluated by a drop in entropy, commonly referred to as information gain (IG). Given a labeled dataset S that has been divided into component regions S_L and S_R using the segmentation function,

IG(S; S_L, S_R) = H(S) − ∑_{i∈{L,R}} (|S_i|/|S|) H(S_i).  (5)

FIGURE 3. This tree structure, which has nodes determined by splitting functions, is itself a digraph structure. The splitting function is a support vector machine that includes a quantum kernel Nyström approximation.

Naturally, IG will rise as splits more sharply distinguish instances of various classes. When more than two categories are involved and precision-based effectiveness measurements are ineffective, it is better to leverage information gains.

The QDT divides the tree leaves based on the training set for the partition at the root node and then forecasts the class distribution based on the proportion of remaining data points that belong to each class. Mathematically, we define the prediction of leaf l as a probability distribution over the subset of training data S_l that reaches it,

ℓ(S_l; c) = (1/|S_l|) ∑_{(x,y)∈S_l} [y = c].  (6)

The Iverson bracket [p] returns 1 if the proposition p is true and 0 if it is false. During model training, a node is classified as a leaf under any of the following circumstances: the node receives only one class of training data, so additional segmentation is not required; the node is located at the maximum depth d of the tree, as chosen by the user; or the number of data points available for segmentation is fewer than a user-defined value. After training, predictions are produced by tracing an instance through the tree until it reaches a leaf node. The specific path from the root to the leaf determines the prediction of the QDT, which is given by the probability distribution defined in Eq. (6) for that leaf.
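The leaf prediction of Eq. (6) and the stopping rules above can be sketched in a few lines of Python; the names and thresholds are illustrative assumptions.

```python
from collections import Counter

def leaf_distribution(leaf_labels, classes):
    """Eq. (6): fraction of training points in the leaf belonging to each class
    (the Iverson bracket [y = c] summed over the leaf subset S_l)."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {c: counts.get(c, 0) / n for c in classes}

def is_leaf(labels, depth, max_depth, min_samples):
    """Stopping rules described in the text: pure node, maximum depth, or too few samples."""
    return len(set(labels)) == 1 or depth >= max_depth or len(labels) < min_samples

print(leaf_distribution(["cat", "cat", "dog"], classes=["cat", "dog"]))
# {'cat': 0.666..., 'dog': 0.333...} -> prediction returned when an instance reaches this leaf
```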

The performance of the split nodes is the QDT model’s key component. The split function is N_{θ_l}: S → {−1, +1}, where θ_l is the hyperparameter of the l-th node. It can be chosen manually prior to the model’s training phase, or adapted during training. The instances are partitioned as S_− = {(x, y) ∈ S | N_{θ_l}(x) = −1} and S_+ = {(x, y) ∈ S | N_{θ_l}(x) = +1}. The splitting function’s information gain is then written IG(S|N_{θ_l}) := IG(S; S_−, S_+), with each node’s goal being to maximize IG(S|N_{θ_l}). In essence, we want the tree to be able to distinguish between points so that instances of different classes arrive at different leaf nodes. It is intended that test instances that follow a particular node’s evaluation path will most likely resemble the training examples that took the same route. As a result, a good segmentation function emphasizes how the classes under the tree are divided. The splitting function must still generalize, however, because repeated splitting in deep trees can quickly result in overfitting.

We propose designating a support vector machine with a quantum kernel as the partition function. The primary objective is to generate separating hyperplanes in higher-dimensional quantum feature spaces that can effectively distinguish between examples of different classes. However, this method is expressive and can lead to overfitting. To mitigate this without limiting the potential quantum advantage, we use an approximation technique that is explained in detail in Section 3.3.

3.2 Considering cost sensitivity

The cost of a decision tree is represented by cost(T), defined as the largest total attribute acquisition cost along any root-to-leaf path; that is, the most that can be spent to classify an instance with T. The accuracy of the optimal decision tree derived from the attribute set S is denoted by Acc(S). Given a training set whose tuples carry the attribute set S and a cost budget Cost, the objective is to build a tree model from an attribute subset S_R such that cost(T_{S_R}) ≤ Cost, even when cost(T_S) > Cost, while Acc(S_R) remains as close as possible to Acc(S).

CGR(T_i) = cost(T_i) / IGR(T_i).  (7)

We assume that the sample size is not less than a preset threshold and that the class’s IGR is not zero as we recurse with additional conditional attributes and compare the values of CGR(T_i). Finally, a greedy strategy chooses the attribute with the lowest cost per unit of information gain ratio, i.e., the minimum CGR value, as the split attribute.

A branch is created from each value of each discrete attribute. For each continuous attribute, we search for the optimal split point; the algorithm determines the appropriate split point for continuous attributes and the corresponding IGR(T_i). Before computing CGR, we first check whether IGR equals 0: if IGR is 0, the attribute has no classification ability and can be removed. Attributes with lower CGR values are more cost effective and efficient in classification. When using CGR, attributes with low cost and good classification ability are more likely to be selected during decision tree construction. In particular, attributes with zero cost have a CGR of 0 and can be directly selected. The attribute cost is determined by the application; the information gain ratio is calculated using entropy, which is commonly employed in decision trees.

Entropy(C, S) = −∑_{i=1}^{m} p_i log p_i.  (8)

Here, m is the number of classes, C is the class attribute, and p_i is the ratio of class-i instances to the total sample size |S|. To determine the information gain ratio, which is the ratio of an attribute’s gain on the dataset S to the attribute’s SplitInfo, we must first compute the attribute’s gain: the difference between the class entropy and the attribute’s conditional term ∑_{v∈Value(t)} (|S_{t,v}|/|S|) Entropy(t_v), with split information SplitInfo(t, S) = −∑_{v} (|S_{t,v}|/|S|) log(|S_{t,v}|/|S|). Thus, the gain of attribute t on S is defined as follows:

Gain(t, S) = Entropy(C, S) − ∑_{v∈Value(t)} (|S_{t,v}|/|S|) Entropy(t_v).  (9)

Here, |S_{t,v}| denotes the number of samples in the dataset S for which attribute t takes the value v, and Value(t) ranges over all values of attribute t in S. From the gain of the attribute we then obtain the information gain ratio, GainRatio(t, S): for an attribute t on the dataset S, GainRatio(t, S) = Gain(t, S) / SplitInfo(t, S).

The algorithm selects attributes in a greedy manner based on the CGR of all attributes. Algorithm 1 shows the pseudocode for building the tree recursively using this algorithm.

Algorithm 1.

Input: training dataset S, attributes T, class attribute C

Output: the decision tree Tree

 1: if S is null then return failure

 2: end if

 3: if T is null then return a leaf node with the C_i class label that appears most often in S

 4:  if |S| ≤ threshold then return a leaf node with the C_i class label that appears most often in S

 5:  end if

 6: end if

 7: MinGainRatio = +∞, node = null

 8: Calculate Entropy(C, S)

 9: for each attribute t ∈ T do

 10:  Info(t, S) = 0, SplitInfo(t, S) = 0

 11:  for each value v ∈ value(t, S) do

 12:   set the subset of S with attribute t = v to be S_{t,v}

 13:   Info(t, S) += (|S_{t,v}|/|S|) · Entropy(t_v)

 14:   SplitInfo(t, S) −= (|S_{t,v}|/|S|) · log(|S_{t,v}|/|S|)

 15:  end for

 16:  Calculate GainRatio(t, S) and CostGainRatio(t, S)

 17:  if CostGainRatio(t, S) ≤ MinGainRatio then

 18:   MinGainRatio = CostGainRatio(t, S); node = t

 19:  end if

 20: end for

 21: attach node to the decision tree and recurse on each of its branches
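A compact Python sketch of the greedy attribute selection in Algorithm 1, under our reading of the CGR criterion (cost divided by gain ratio), is given below; the helper names and the toy medical data are illustrative only, not part of the original implementation.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cost_gain_ratio(rows, labels, attr, cost):
    """CGR(t) = cost(t) / GainRatio(t, S), Eqs. (7)-(9); returns None if IGR is zero."""
    n = len(labels)
    info, split_info = 0.0, 0.0
    for v in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == v]
        w = len(subset) / n
        info += w * entropy(subset)
        split_info -= w * math.log2(w)
    gain = entropy(labels) - info
    gain_ratio = gain / split_info if split_info > 0 else 0.0
    if gain_ratio == 0:
        return None                                   # attribute has no discriminative power
    return cost / gain_ratio if cost > 0 else 0.0     # zero-cost attributes get CGR = 0

def pick_split_attribute(rows, labels, costs):
    """Greedy choice of Algorithm 1: the attribute with the smallest CGR."""
    scored = {a: cost_gain_ratio(rows, labels, a, c) for a, c in costs.items()}
    scored = {a: s for a, s in scored.items() if s is not None}
    return min(scored, key=scored.get) if scored else None

rows = [{"temp": "high", "scan": "pos"}, {"temp": "high", "scan": "neg"},
        {"temp": "low", "scan": "pos"}, {"temp": "low", "scan": "neg"}]
labels = ["sick", "sick", "well", "well"]
print(pick_split_attribute(rows, labels, {"temp": 0.1, "scan": 50.0}))  # "temp"
```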

3.3 Nyström quantum kernel estimation

In this part, we embed quantum kernels in the nodes. Using an SVM with a quantum kernel, the suggested splitting function creates separating hyperplanes. We use the Nyström approximation approach to calculate the kernel matrix (also known as the Gram matrix) from the quantum kernel estimation component [40,41]. The combined process is denoted Nyström quantum kernel estimation (NQKE), whose parameters determine the basic operation of the split node.

Given the dataset S^i = {(x_j, y_j)}_{j=1}^{N_i} supplied to the i-th split node, we can take the landmark subset to be S_L^i = {(x_j, y_j)}_{j=1}^{L} without loss of generality. The inner products between elements of S^i and S_L^i are computed using the quantum kernel defined by the inner product of parameterized density matrices,

k_Φ(x, x′) = Tr[ρ_Φ(x) ρ_Φ(x′)].  (10)

The quantum feature map is determined by the spectral decomposition ρ_Φ(x) = ∑_j λ_j |Φ_j(x)⟩⟨Φ_j(x)|, with parameterized pure states |Φ_j(x)⟩. In practice, ρ_Φ(x) reduces to a pure state |Φ(x)⟩⟨Φ(x)|, and the kernel k(x, x′) = |⟨Φ(x′)|Φ(x)⟩|² = |⟨0|U†(x′)U(x)|0⟩|² can be estimated by measuring the probability of the all-zero outcome on the state U†(x′)U(x)|0⟩.
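This kernel estimate can be illustrated with a toy single-qubit statevector simulation; the actual feature map, qubit count, and circuit are not specified here, so this RY-rotation embedding is purely illustrative, and on hardware the overlap would be estimated from measurement counts rather than computed exactly.

```python
import numpy as np

def ry(theta):
    """Single-qubit Y-rotation used as a toy data-encoding gate U(x)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def feature_state(x):
    """|Phi(x)> = U(x)|0> for a one-feature, one-qubit embedding (illustrative only)."""
    return ry(x) @ np.array([1.0, 0.0])

def quantum_kernel(x, x_prime):
    """Eq. (10) for pure states: k(x, x') = |<0| U(x')^dagger U(x) |0>|^2."""
    return abs(np.vdot(feature_state(x_prime), feature_state(x))) ** 2

print(quantum_kernel(0.3, 0.3))        # 1.0 -> identical embeddings
print(quantum_kernel(0.0, np.pi))      # ~0  -> orthogonal embeddings
```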

The Nyström method completes a positive semidefinite matrix from a subset of its columns. Using the column subset determined by the landmarks of a node (omitting node labels for brevity), we define the N × L matrix G := [W; B] with entries G_ij = k(x_i, x_j), where W is the L × L block of kernel values among the landmarks and B contains the remaining rows, and we approximate K ≈ G W^{−1} Gᵀ to complete the N × N kernel matrix. We have the following expansion:

K ≈ K̂ := [[W, Bᵀ], [B, B W^{−1} Bᵀ]].  (11)

Here, W^{−1} is, in general, the Moore–Penrose pseudoinverse of W, which is important when W is singular. Intuitively, the Nyström method uses the correlation between the sampled columns of the kernel matrix to approximate the full matrix at low rank; when the underlying matrix K approaches full rank, the quality of the approximation degrades accordingly. The manifold hypothesis suggests that choosing L ≪ N is not unreasonable, because the data often do not explore all degrees of freedom and frequently reside on sub-manifolds where a low-rank approximation is acceptable.
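A NumPy sketch of the Nyström completion in Eq. (11) is shown below, with a classical RBF kernel standing in for the quantum kernel; the landmark choice and data are illustrative assumptions.

```python
import numpy as np

def nystrom_approximation(X, kernel, landmark_idx):
    """Approximate the full N x N kernel matrix from L landmark columns, Eq. (11):
    K_hat = G W^+ G^T, where W is the landmark-landmark block and G the N x L block."""
    landmarks = X[landmark_idx]
    G = np.array([[kernel(x, z) for z in landmarks] for x in X])   # N x L
    W = G[landmark_idx]                                            # L x L
    return G @ np.linalg.pinv(W) @ G.T                             # Moore-Penrose pseudoinverse

# Toy check with an RBF kernel on random data (any positive semidefinite kernel works the same way).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
K_full = np.array([[rbf(a, b) for b in X] for a in X])
K_hat = nystrom_approximation(X, rbf, landmark_idx=np.arange(10))  # L = 10 << N = 40
print(np.linalg.norm(K_full - K_hat) / np.linalg.norm(K_full))     # small relative error
```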

The kernel matrix’s positive semidefiniteness, K̂ ⪰ 0, is unaltered by the Nyström approximation, so the SVM optimization problem remains well-posed. Moreover, the well-known representer theorem applies in the reproducing kernel Hilbert space (RKHS) induced by the modified kernel, H_Φ = {f_Φ | f_Φ(·) = ∑_{i=1}^{N} α_i k̂_Φ(·, x_i)}. Solving the SVM quadratic program yields a set of coefficients {α_i}_{i=1}^{N} for a given dataset. Hence, a split function is created:

N_{Φ;α}(x) = sign( ∑_{i=1}^{N} α_i ỹ_i k̂_Φ(x, x_i) ).  (12)

Here, α_i = α̃_i ỹ_i with α̃_i ≥ 0 and ỹ_i = F(y_i), where the function F: C → {−1, 1} transforms a problem with many classes, notably |C| > 2, into a binary one by mapping the original class labels y_i ∈ C. We offer a numerical comparison of the even split (ES) and one-against-all (OAA) definitions of the function F.
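The binary relabeling F and the split decision of Eq. (12) can be sketched as follows; the OAA and ES definitions shown here reflect our reading of the text, and the kernel and coefficients are placeholders rather than trained QSVM values.

```python
import numpy as np

def one_against_all(labels, positive_class):
    """OAA relabeling F: y -> +1 if y is the chosen class, else -1."""
    return np.where(np.asarray(labels) == positive_class, 1, -1)

def even_split(labels, classes):
    """ES relabeling F: first half of the class list -> +1, second half -> -1."""
    positive = set(classes[: len(classes) // 2])
    return np.array([1 if y in positive else -1 for y in labels])

def split_function(x, support_x, alphas, y_tilde, kernel):
    """Eq. (12): N(x) = sign( sum_i alpha_i * y_tilde_i * k(x, x_i) )."""
    score = sum(a * yt * kernel(x, xi) for a, yt, xi in zip(alphas, y_tilde, support_x))
    return 1 if score >= 0 else -1

rbf = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
print(split_function([0.1], support_x=[[0.0], [1.0]], alphas=[1.0, 1.0],
                     y_tilde=[1, -1], kernel=rbf))   # +1: closer to the positive support point
```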

The split function is constructed using random node optimization (RNO) to ensure that the correlation between quantum decision trees is minimized. Since the best hyperplane varies depending on which subset is used, the selection of landmark data points adds randomness. In addition, the hyperparameters L and Φ may change up and down the tree. Figure 3 illustrates this for a tree of depth i = 1, …, D − 1, with each split function represented by a distinct tuple (Φ_i, L_i). The embedding Φ defines a particular kernel k_Φ, the reproducing kernel of that function’s unique Hilbert space.

3.4 Bayesian algorithm combined with decision tree algorithm

When dealing with large amounts of data, particularly when working with time series data that are received in fragments, the implementation of quantum classification techniques may face significant limitations. To overcome these restrictions, an incremental learning approach must be employed. The decision tree technique used in this study employs a top–down recursive building strategy and is a type of classification algorithm that relies on data properties. Categorization outcomes are determined through a downward comparison of leaf nodes on the decision tree, while attribute values are compared using branch comparisons. To address issues related to incremental learning, we combine the quantum decision tree method with the Bayesian algorithm.

The incremental decision tree technique first employs our own quantum decision tree algorithm to classify samples, before segmenting multiple small samples into groups based on attribute values in a recursive manner. After creating the decision tree nodes, the Bayesian nodes and regular leaf nodes are generated using real-world scenarios.

Once incremental samples have been routed to the current node, the data may not be further subdividable; in the worst-case scenario, all incremental samples may have the same discrete attribute values. In such cases, the incremental decision tree method creates a Bayesian node and applies the Bayesian posterior probability algorithm to complete the categorization of the most valuable attributes of the incremental data. In this work, the incremental decision tree method is divided into two stages. In the first stage, the initial quantum decision tree is constructed using the sample training data. The second stage involves using Bayesian nodes for incremental learning. Every time a new training sample is obtained, the incremental decision tree approach compares the incremental data with the features in the existing decision tree until it reaches a leaf node of the decision tree.

If the node is not a Bayesian node, the accuracy of the partition should be verified. If the conclusion of the decision tree is correct, we leave it as it is; otherwise, the Bayesian analysis and the decision tree analysis must be compared to categorize the incremental data. If the Bayesian classification method’s accuracy is better than that of the decision tree classification strategy, we convert the node into a Bayesian node. In the case of Bayesian nodes, the Bayesian parameters are simply updated with the incremental data. Incremental learning is thus completed, and incremental data issues are overcome, by recursively building the incremental decision tree, transforming leaf nodes into Bayesian nodes, or updating node parameters. Figure 4 depicts the overall flow of the algorithm. As the flow shows, performing Bayesian classification at the nodes of the decision tree to predict node values makes the classification properties of the decision tree more intelligible and enhances its classification effect.
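The two-stage update described above can be sketched as follows; the leaf representation, the discrete naive Bayes node with add-one smoothing, and the accuracy check over buffered increments are our own simplifications of the procedure, not the authors' implementation.

```python
from collections import Counter, defaultdict

class BayesNode:
    """Naive Bayes over discrete features, updated one sample at a time (incremental learning)."""
    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)   # keyed by (feature index, value) -> class counts

    def update(self, sample, label):
        self.class_counts[label] += 1
        for i, v in enumerate(sample):
            self.feature_counts[(i, v)][label] += 1

    def predict(self, sample):
        total = sum(self.class_counts.values())
        best, best_score = None, -1.0
        for c, n_c in self.class_counts.items():
            score = n_c / total                       # prior P(D)
            for i, v in enumerate(sample):            # times conditional P(A|D), add-one smoothing
                score *= (self.feature_counts[(i, v)][c] + 1) / (n_c + 2)
            if score > best_score:
                best, best_score = c, score
        return best

def incremental_update(leaf, sample, label):
    """If the increments show the Bayesian model outperforming the leaf's fixed label,
    mark the leaf as a Bayesian node; otherwise keep (and simply refresh) what is there."""
    leaf.setdefault("bayes", BayesNode()).update(sample, label)
    leaf.setdefault("seen", []).append((sample, label))
    leaf_acc = sum(y == leaf["label"] for _, y in leaf["seen"]) / len(leaf["seen"])
    bayes_acc = sum(leaf["bayes"].predict(x) == y for x, y in leaf["seen"]) / len(leaf["seen"])
    leaf["is_bayesian"] = bayes_acc > leaf_acc

leaf = {"label": "normal"}
for x, y in [(("low", "fast"), "normal"), (("high", "fast"), "abnormal"), (("high", "slow"), "abnormal")]:
    incremental_update(leaf, x, y)
print(leaf["is_bayesian"])   # True once the Bayesian node explains the increments better
```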

FIGURE 4. Construction process of an incremental decision tree algorithm.

4 Simulation experiment

4.1 Datasets and the experimental environment

In this part, we carry out in-depth tests to assess how well our algorithm works. For the experiment, we used eight UCR time series datasets. We compare the suggested approach with the CART algorithm, a Grover-search-based quantum learning scheme (GBLS) [42], and quantum SVMs. We use principal component analysis (PCA) in the first half of this section to reduce the data dimension so that it fits into a certain number of qubits. Next is a step of re-labeling and normalization of each feature x_i to [0, π]. Accuracy is determined by dividing the number of correctly classified records by the total number of records in the test set. We used the attribute costs in the dataset, which are described in more detail in Table 1. We compare the accuracy of the proposed algorithm in categorizing the datasets against CART, GBLS, and QSVM, and we analyze its performance in terms of classification accuracy, efficiency, and cost.
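A minimal preprocessing sketch consistent with this description (PCA down to the qubit count, rescaling features to [0, π], and accuracy as the fraction of correctly classified test records) is given below; the scikit-learn pipeline and the random stand-in data are illustrative assumptions, since the exact preprocessing parameters are not specified.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

def preprocess(X_train, X_test, n_qubits):
    """Reduce each series to n_qubits components with PCA, then rescale features to [0, pi]
    so they can serve as rotation angles in a quantum feature map."""
    pca = PCA(n_components=n_qubits).fit(X_train)
    scaler = MinMaxScaler(feature_range=(0.0, np.pi)).fit(pca.transform(X_train))
    return scaler.transform(pca.transform(X_train)), scaler.transform(pca.transform(X_test))

def accuracy(y_true, y_pred):
    """Fraction of correctly classified test records."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Toy stand-in for a UCR-style series matrix (the paper uses eight UCR time series datasets).
rng = np.random.default_rng(1)
X_tr, X_te = rng.normal(size=(60, 96)), rng.normal(size=(20, 96))
Z_tr, Z_te = preprocess(X_tr, X_te, n_qubits=4)
print(Z_tr.shape, Z_tr.min() >= 0.0, Z_tr.max() <= np.pi)   # (60, 4) True True
```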

TABLE 1. Dataset information.

4.2 Experimental results and analysis

4.2.1 Accuracy comparison of different algorithms in different datasets

Based on our tests, the results (shown in Figure 5) indicate that our algorithm performs similarly to traditional CART classification algorithms in terms of classification accuracy. Furthermore, when compared with other quantum algorithms, such as quantum support vector machines and GBLS, our algorithm exhibits certain advantages. From these findings, we can conclude that the proposed quantum decision tree classification algorithm is among the most reliable and stable of all existing quantum classification algorithms.

FIGURE 5. Classification accuracy of this algorithm is compared with the QSVM algorithm, GBLS algorithm, and CART algorithm in different datasets.

4.2.2 The advantages of dealing with incremental sequences

In this section, we present eight datasets (Figure 6) and evaluate the performance of the proposed quantum decision tree method with Bayesian nodes against the traditional GBLS, QSVM, and CART decision tree algorithms. Each method was cross-validated on the eight datasets, and the average accuracy was calculated. The accuracy comparison of classifications in incremental learning is shown in Figure 6. The simulation experiment’s data analysis indicates that the suggested algorithm has a high probability of success in categorizing time series data. In the incremental data classification mining of the eight data samples, the algorithm demonstrated an average improvement of 0.8% and 1.3% over the GBLS algorithm and QSVM, respectively. Additionally, the proposed method offers several advantages over the conventional CART algorithm as the incremental data accumulate, and its performance improves to some extent. In practical application, each node can additionally be evaluated by the Bayesian node’s machine learning model, and the clear improvement in classification outcomes yields more trustworthy and dependable classification results.

FIGURE 6. Comparison of incremental learning classification accuracy scores from several approaches.

4.2.3 Advantage of cost

This section compares the performance of the proposed method with that of GBLS and CART algorithms in a cost-effective environment. The experiment’s results are presented in Figure 7.

FIGURE 7. Considering the cost, two datasets, ECG200 and Strawberry, are selected to compare the performance of the CART algorithm, GBLS, and the algorithm in this paper. (A) ECG200. (B) Strawberry.

According to Figure 7B, the cost of acquiring the attributes used by the CART algorithm is 26.2, and the cost of acquiring the attributes used by the GBLS is 21.8. In contrast, the cost of acquiring the attributes used by our method is less than 15 at most, which significantly lowers the cost of attributes when compared to the CART and GBLS algorithms. The figure shows that when the cost is decreased, our technique’s accuracy improves. When the cost is low, our method outperforms the CART and the GBLS algorithms in terms of both cost and accuracy. The difference in accuracy between our algorithm and the CART algorithm, when the cost is high, is less than 1%, but the cost of the CART and the GBLS algorithms is considerably more than the cost of our algorithm, and occasionally, we can even achieve superior accuracy. Our technique is more accurate than the CART and the GBLS algorithms at a very low cost. Figure 7A shows that as data volume increases, the difference in accuracy between our algorithm, the CART algorithm, and the GBLS algorithm does not exceed 2%. However, the cost of the CART algorithm is 512, while the cost of the GBLS algorithm is 498, which is almost twice the maximum cost of our algorithm. Figure 7 shows how, as the amount of data increases, our method’s accuracy rises while becoming less expensive. The average accuracy difference between our algorithm and the CART algorithm is around 0.25%, while the average accuracy difference between our algorithm and the GBLS method is about 0.46% (our algorithm is better). Nevertheless, the price of the GBLS and CART algorithms is significantly higher.

4.2.4 Efficiency of classification

We compared the efficiency of the methods on six datasets. Table 2 displays the time complexity comparison for classifying the six datasets. The simulation experiment’s data analysis shows that the proposed method has time complexity comparable to that of the other algorithms while achieving similar classification performance for time series data.

TABLE 2. Comparison of the time complexity of three algorithms’ classification.

5 Conclusion

A global decision tree served as the paradigm for this research. The model dynamically constructs an appropriate decision tree depending on input items and cost constraints. In this work, we merge the Bayesian approach and a quantum decision tree classification algorithm to handle incremental sequences and larger quantities of data. By adding kernel functions obtained from quantum kernel estimation (QKE) to a linear quantum support vector machine, we construct a decision tree classifier employing decision-directed acyclic networks of QSVM nodes. The simulation experiments in Section 4 demonstrate that the recommended strategy works very well. For processing incremental data, the proposed technique outperforms the traditional CART decision tree algorithm, the GBLS algorithm, and QSVM in terms of accuracy. The recommended technique’s accuracy in cost-aware classification is scarcely different from that of the standard algorithms, and it performs even better in cases where the cost is minimal.

Therefore, the quantum decision tree classification algorithm based on the Bayesian algorithm proposed in this paper makes up for the disadvantages of existing quantum classification algorithms in dealing with incremental sequences and the need to consider costs. In our experiments in Section 4, the proposed algorithm performs well in different datasets and incremental datasets, and the classification effect is better than the CART algorithm, GBLS algorithm, and QSVM algorithm. Moreover, after adding the cost calculation, the classification effect of the proposed algorithm is better than other algorithms when the cost is lower. The algorithm proposed in this paper can effectively process incremental data and consider the cost while ensuring its classification accuracy is higher than other classification algorithms.

Data availability statement

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

Author contributions

NJ: conceptualization, methodology, and software; RB: data curation and writing the original draft; ZC and XM: visualization, investigation, and supervision; RB: writing, reviewing, and editing; XY: funding acquisition and resources; and SW: conceptualization, funding acquisition, and resources. All authors contributed to the article and approved the submitted version.

Funding

This project was supported by the National Natural Science Foundation of China (Grant No. 42201506), the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049), and the Joint Fund of Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Pałczyński K, Czyżewska M, Talaśka T Fuzzy Gaussian decision tree. J Comput Appl Maths (2023) 425:115038. doi:10.1016/j.cam.2022.115038

2. Liu C, Lin B, Lai J, Miao D An improved decision tree algorithm based on variable precision neighborhood similarity. Inf Sci (2022) 615:152–66. doi:10.1016/j.ins.2022.10.043

3. Jin C, Li F, Ma S, Wang Y Sampling scheme-based classification rule mining method using decision tree in big data environment. Knowledge-Based Syst (2022) 244:108522. doi:10.1016/j.knosys.2022.108522

4. Abd Algani YM, Ritonga M, Kiran Bala B, Al Ansari MS, Badr M, Taloba AI Removed: Machine learning in health condition check-up: An approach using Breiman's random forest algorithm. Meas Sensors (2022) 23:100406. doi:10.1016/j.measen.2022.100406

5. Barros RC, Basgalupp MP, de Carvalho ACPLF, Freitas AA A survey of evolutionary algorithms for decision-tree induction. IEEE Trans Syst Man, Cybernetics, C (Applications Reviews) (2012) 42:291–312. doi:10.1109/tsmcc.2011.2157494

6. Basgalupp MP, Barros RC, de Carvalho AC, Freitas AA Evolving decision trees with beam search-based initialization and lexicographic multi-objective evaluation. Inf Sci (2014) 258:160–81. doi:10.1016/j.ins.2013.07.025

7. Kappelhof N, Ramos L, Kappelhof M, van Os H, Chalos V, van Kranendonk K, et al. Evolutionary algorithms and decision trees for predicting poor outcome after endovascular treatment for acute ischemic stroke. Comput Biol Med (2021) 133:104414. doi:10.1016/j.compbiomed.2021.104414

8. Lien L-C, Dolgorsuren U Rule-based knowledge discovery of satellite imagery using evolutionary classification tree. J Parallel Distributed Comput (2021) 147:132–9. doi:10.1016/j.jpdc.2020.09.003

9. Liu H, Haig E Granular computing based approach of rule learning for binary classification. Granular Computing, 4 (2019). p. 275–83. doi:10.1007/s41066-018-0097-2

10. Cano A, Zafra A, Ventura S An interpretable classification rule mining algorithm. Inf Sci (2013) 240:1–20. doi:10.1016/j.ins.2013.03.038

11. Yang S, Guo J-Z, Jin J-W An improved id3 algorithm for medical data classification. Comput Electr Eng (2018) 65:474–87. doi:10.1016/j.compeleceng.2017.08.005

12. Hu Q, Chakhar S, Siraj S, Labib A Spare parts classification in industrial manufacturing using the dominance-based rough set approach. Eur J Oper Res (2017) 262:1136–63. doi:10.1016/j.ejor.2017.04.040

13. Laber ES, de A. Mello Pereira F Splitting criteria for classification problems with multi-valued attributes and large number of classes. Pattern Recognition Lett (2018) 111:58–63. doi:10.1016/j.patrec.2018.04.013

14. Mahan F, Mohammadzad M, Rozekhani SM, Pedrycz W Chi-mflexdt:chi-square-based multi flexible fuzzy decision tree for data stream classification. Appl Soft Comput (2021) 105:107301. doi:10.1016/j.asoc.2021.107301

15. Alex S, Dhanaraj KJ, Deepthi PP Private and energy-efficient decision tree-based disease detection for resource-constrained medical users in mobile healthcare network. IEEE ACCESS (2022) 10:17098–112. doi:10.1109/access.2022.3149771

16. Li J, Chen A A local discrete text data mining method in high-dimensional data space. INTERNATIONAL JOURNAL COMPUTATIONAL INTELLIGENCE SYSTEMS (2022) 15:53. doi:10.1007/s44196-022-00109-1

17. Kumar S, Mohbey KK A review on big data based parallel and distributed approaches of pattern mining. JOURNAL KING SAUD UNIVERSITY-COMPUTER INFORMATION SCIENCES (2022) 34:1639–62. doi:10.1016/j.jksuci.2019.09.006

18. Xue B, Pechenizkiy M, Koh Y Data mining on extremely long time-series. 21ST IEEE INTERNATIONAL CONFERENCE DATA MINING WORKSHOPS ICDMW 2021 (2021) 1057–66.

19. Jiang W, Zhang D, Ling L, Lin R Time series classification based on image transformation using feature fusion strategy. NEURAL PROCESSING LETTERS (2022) 54:3727–48. doi:10.1007/s11063-022-10783-z

20. Ye T-Y, Xu T-J, Geng M-J, Chen Y (2022). Two-party secure semiquantum summation against the collective-dephasing noise. arXiv:2205.08318. doi:10.48550/arXiv.2205.08318

21. Wang H, Xue Y, Qu Y, Mu X, Ma H Multidimensional bose quantum error correction based on neural network decoder. npj Quan Inf (2022) 8:134. doi:10.1038/s41534-022-00650-z

22. Ding L, Wang H, Wang Y, Wang S (2022). Based on quantum topological stabilizer color code morphism neural network decoder. Quan Eng 2022, 1–8. doi:10.1155/2022/9638108

23. Song J, Ke Z, Zhang W, Ma Y, Ma H Quantum confidentiality query protocol based on bell state identity. Int J Theor Phys (2022) 61, 52. doi:10.1007/s10773-022-05032-x

24. Zhao J, Zhang T, Jiang J, Fang T, Ma H Color image encryption scheme based on alternate quantum walk and controlled rubik’s cube. Scientific Rep (2022) 12:14253. doi:10.1038/s41598-022-18079-x

25. Ke Z-H, Ma Y-L, Ding L, Song J-B, Ma H Controlled dense coding using generalized ghz-type state in a noisy network. Int J Theor Phys (2022) 61:171. doi:10.1007/s10773-022-05069-y

26. Jiang Y, Chu P, Ma Y, Ma H Search algorithm based on permutation group by quantum walk on hypergraphes. Chin J Electron (2022) 31:626–34. doi:10.1049/cje.2021.00.125

27. Zhao W, Wang Y, Qu Y, Ma H, Wang S Binary classification quantum neural network model based on optimized grover algorithm. Entropy (2022) 24:1783. doi:10.3390/e24121783

28. Lamata L Quantum reinforcement learning with quantum photonics. PHOTONICS (2021) 8:33. doi:10.3390/photonics8020033

29. Houssein EH, Abohashima Z, Elhoseny M, Mohamed WM Machine learning in the quantum realm: The state-of-the-art, challenges, and future vision. EXPERT SYSTEMS APPLICATIONS (2022) 194:116512. doi:10.1016/j.eswa.2022.116512

30. Tancara D, Dinani HT, Norambuena A, Fanchini FF, Coto R Kernel-based quantum regressor models learning non-markovianity. PHYSICAL REVIEW A (2023) 107:022402–219. doi:10.1103/physreva.107.022402

31. Neumann NMP, de Heer PBUL, Chiscop I, Phillipson F Multi-agent reinforcement learning using simulated quantum annealing. COMPUTATIONAL SCIENCE - ICCS 2020, PT VI (SPRINGER INTERNATIONAL PUBLISHING AG) (2020) 12142:562–75. doi:10.1007/978-3-030-50433-5_43

32. Zhou N, Liu X, Chen Y, Du N Quantum k-nearest-neighbor image classification algorithm based on k-l transform. Int J Theor Phys (2021) 60:1209–24. doi:10.1007/s10773-021-04747-7

33. Huang S, An W, Zhang D, Zhou N Image classification and adversarial robustness analysis based on hybrid quantum–classical convolutional neural network. Opt Commun (2023) 533:129287. doi:10.1016/j.optcom.2023.129287

34. Wei W, Hui M, Zhang B, Scherer R, Damasevicius R Research on decision tree based on rough set. JOURNAL INTERNET TECHNOLOGY (2021) 22:1385–94. doi:10.53106/160792642021112206015

35. Azad M, Moshkov M A bi-criteria optimization model for adjusting the decision tree parameters. KUWAIT JOURNAL SCIENCE (2022) 49. doi:10.48129/kjs.10725

36. Schmitt I (2022). Qldt: A decision tree based on quantum logic. In NEW Trends DATABASE INFORMATION SYSTEMS ADBIS. 1652. 299–308. doi:10.1007/978-3-031-15743-1_28

37. Azad M, Chikalov I, Hussain S, Moshkov M Entropy-based greedy algorithm for decision trees using hypotheses. ENTROPY (2021) 23:808. doi:10.3390/e23070808

38. Loh W-Y Classification and regression trees. Wiley Interdiscip Rev Data Mining Knowledge Discov (2011) 1:14–23. doi:10.1002/widm.8

39. Zhou N, Zhang T, Xie X, Wu J Hybrid quantum–classical generative adversarial networks for image generation via learning discrete distribution. Signal Processing: Image Commun (2023) 110:116891. doi:10.1016/j.image.2022.116891

40. Rebentrost P, Mohseni M, Lloyd S Quantum support vector machine for big data classification. Phys Rev Lett (2013) 113:130503. doi:10.1103/physrevlett.113.130503

41. Srikumar M, Hill C, Hollenberg L (2022). A kernel-based quantum random forest for improved classification. arXiv:2210.02355. doi:10.48550/arXiv.2210.02355

42. Du Y, Hsieh M-H, Liu T, Tao D A grover-search based quantum learning scheme for classification. New J Phys (2021) 23:023020. doi:10.1088/1367-2630/abdefa

Keywords: decision tree, cost constraint, Bayesian algorithm, quantum computing, quantum kernel, quantum decision tree classification

Citation: Ji N, Bao R, Mu X, Chen Z, Yang X and Wang S (2023) Cost-sensitive classification algorithm combining the Bayesian algorithm and quantum decision tree. Front. Phys. 11:1179868. doi: 10.3389/fphy.2023.1179868

Received: 05 March 2023; Accepted: 19 June 2023;
Published: 03 July 2023.

Edited by:

Nanrun Zhou, Shanghai University of Engineering Sciences, China

Reviewed by:

Meet Kumari, Chandigarh University, India
Lihua Gong, Shanghai University of Engineering Sciences, China

Copyright © 2023 Ji, Bao, Mu, Chen, Yang and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Shumei Wang, wangshumei@qut.edu.cn
