Unveiling machine learning strategies and considerations in intrusion detection systems: a comprehensive survey

Ali, Ali Hussein; Charfeddine, Maha; Ammar, Boudour; Hamed, Bassem Ben; Albalwy, Faisal; Alqarafi, Abdulrahman; Hussain, Amir

doi:10.3389/fcomp.2024.1387354

REVIEW article

Front. Comput. Sci., 10 June 2024

Sec. Computer Security

Volume 6 - 2024 | https://doi.org/10.3389/fcomp.2024.1387354

This article is part of the Research Topic: Digital Transformation and Cybersecurity Challenges

Ali Hussein Ali¹

Maha Charfeddine²

Boudour Ammar²*

Bassem Ben Hamed³

Faisal Albalwy⁴,⁵*

Abdulrahman Alqarafi⁴

Amir Hussain⁶

¹REGIM-Lab: REsearch Groups in Intelligent Machines (REGIM), National School of Electronics and Telecommunications of Sfax, University of Sfax, Sfax, Tunisia
²REsearch Groups in Intelligent Machines (REGIM), National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia
³Laboratory of Signals, systeMs, aRtificial Intelligence and neTworkS (SM@RTS), National School of Electronics and Telecommunications of Sfax, University of Sfax, Sfax, Tunisia
⁴Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah, Saudi Arabia
⁵Division of Informatics, Imaging and Data Sciences, Stopford Building, University of Manchester, Manchester, United Kingdom
⁶Centre of AI and Robotics, Edinburgh Napier University, Edinburgh, United Kingdom

The advancement of communication and internet technology has brought risks to network security. Thus, Intrusion Detection Systems (IDS) were developed to combat malicious network attacks. However, IDSs still struggle with accuracy, false alarms, and detecting new intrusions. Therefore, organizations are using Machine Learning (ML) and Deep Learning (DL) algorithms in IDS for more accurate attack detection. This paper provides an overview of IDS, including its classes and methods, the detected attacks, and the datasets, metrics, and performance indicators used. A thorough examination of recent publications on IDS-based solutions is conducted, evaluating their strengths and weaknesses and discussing their potential implications, research challenges, and new trends. We believe that this comprehensive review paper covers the most recent advances and developments in ML and DL-based IDS, and also facilitates future research into the potential of emerging Artificial Intelligence (AI) to address the growing complexity of cybersecurity challenges.

1 Introduction

Network security has become an unavoidable concern. Many security reports and research papers show an annual increase in hostile actions (Mohammadi et al., 2021; Establishment, 2023). Many attacks attempt to exploit system vulnerabilities to harm the confidentiality, integrity, and availability of data. Typical harmful behaviors include stealing users' accounts, gaining illegal access, capturing critical information, and blocking or rejecting services (Sumaiya Thaseen et al., 2021). Access control, encryption, authentication, and sophisticated firewalls are security procedures and techniques developed to detect and mitigate these threats. An intrusion detection system (IDS) is designed to address the inadequacies of these other security solutions. There is an urgent need for a sophisticated IDS that can automatically detect known and unknown threats. The fundamental role of an IDS is to monitor the exchange of data for suspicious behavior (Jatti and Sontif, 2019). Different approaches to designing IDS based on misuse detection, anomaly detection, or a combination of the two have been presented in recent years. Because it appears to have broader implications, anomaly detection is receiving increasing attention in network intrusion detection research. Anomaly detection relies on statistics, expert knowledge, and machine learning (ML) (Haji and Ameen, 2021; Prasath et al., 2022). The number of IDS deployments involving ML approaches has lately surged. Based on their mode of learning, ML approaches can be classified into two major categories: supervised and unsupervised techniques (Hindy et al., 2020). The supervised learning process consists of a training stage and a testing stage. During training, the model is built with a labeled training set. The resulting model is then tested for its capacity to produce accurate predictions by classifying the instances of the testing set. Unsupervised learning, in contrast, does not require labeled training data; it uses similarity metrics to group similar instances into clusters. ML and deep learning (DL) aim to extract valuable information from massive data repositories. Monitoring network traffic and predicting normal and aberrant behavior from learned patterns are two of the most significant applications of ML (Alzahrani and Alenazi, 2021) and DL (Kim et al., 2021; Agrawal et al., 2022). Researchers have proposed several ML and DL-based IDS detection algorithms during the last decade. Much more research can still be done on IDS to increase its ability to identify network intrusions quickly and accurately.

This paper comprehensively reviews recent advancements and trends in ML and DL-based IDS systems. It is an inventory of the most up-to-date research publications on intrusion detection, with a focus on the latest methodologies. The discussion focuses on the prominent machine learning and deep learning algorithms, as well as the essential factors used for evaluating their outcomes. While previous survey articles have been published on machine learning-based IDS, our research makes several novel contributions. We carefully selected recent research papers in intrusion detection to highlight the most advanced methods. Our analysis goes beyond simply listing papers: we investigate modern and widely used datasets and commonly used metrics and indicators, and we study and categorize the major attacks detected by IDS. Furthermore, we cover important recent ML and DL-based IDS algorithms, as well as key parameters for evaluating their results. Moreover, we thoroughly address the challenges and potential advancements in ML and DL-based IDS systems. In addition, to evaluate our research, we compare it to other studies, identifying both similarities and differences between our methodology and previous surveys. This level of analysis provides an in-depth overview of the current landscape in ML and DL-based IDS research. By compiling the obtained research findings, our survey offers insight and direction for future studies in the field, ultimately contributing to the advancement of intrusion detection technology.

The rest of the paper is organized as follows. Section 2 explains the study's methodology. Section 3 introduces the fundamental IDS concepts and the various categorization approaches. Section 4 reviews the ML and DL techniques in greater depth. Section 5 details the evaluation metrics and the performance indicators. Section 6 outlines the public datasets used as benchmarks. Section 7 discusses the most significant findings in ML and DL-based IDS, addresses the research challenges related to this subject, and highlights novel trends and future directions. Section 8 compares our study to other surveys on ML and DL-based IDS. Section 9 concludes this review article.

2 Methodology

This study covers the most important ML and DL-based IDS studies published in peer-reviewed venues since 2020. We identify relevant articles, assess them, and extract important information. This analysis primarily attempts to answer the following questions:

• How have AI-based intrusion detection systems evolved recently?

• What are the most popular and modern ML and DL approaches employed for IDS?

• What are the strengths, weaknesses and implications of ML and DL-based IDS systems?

• What datasets are commonly and recently utilized in AI-based IDS testing?

• Which metrics and performance indicators are most frequently used to evaluate performance?

• What are the research challenges, emerging trends, and expected future developments in ML and DL-based IDS systems?

• Are there any previous state-of-the-art surveys that have tackled this important research topic?

• What are the differences and similarities between our systematic approach and existing surveys, as well as the specific types of concerns we addressed in our survey?

The paper provides a comprehensive survey of the effectiveness of modern algorithms in intrusion detection, describing the most recent solutions, datasets, metrics, indicators, and approaches used. It is a useful resource for researchers working in these domains, addressing the challenges and emerging trends in ML and DL-based IDS systems, and thereby advancing intrusion detection technology.

3 IDS: concept and classification

This section provides a first exposition of the fundamental principles underlying IDSs, followed by information on how IDS are classified based on deployment and threat identification. Table 1 lists the abbreviations used in this article.

Table 1. Meaning of acronyms.

3.1 IDS concept

Dorothy E. Denning introduced a model for intrusion detection systems (IDS) in 1987 to detect network and computer attacks. An IDS is a collection of approaches designed to detect suspicious, malicious, or unusual behavior that threatens the security of networks and computers (Oprea et al., 2021). Intruders into computer or network systems can jeopardize data security by modifying, destroying, or making information unavailable. A detection system, in turn, is a protective mechanism for identifying this illegal activity. An IDS is software or hardware used to monitor computerized systems to detect intruders. Today, many commercial and open-source IDSs are available, with capabilities that vary depending on their components, such as the type of attack they can detect, their categories or classes, and their strategy (Wester, 2021).

Figure 1 depicts an IDS classification based on the detection approach used and its environments.

Figure 1. Classification of IDSs (Hindy et al., 2020).

3.2 Categorization of IDS based on environment

Depending on what is being tracked, the IDS can be classified principally into two types: those operating on hosts and across networks. Other types of IDS include graph, application, distribution, and hypervisor-based IDS (Borkar et al., 2017).

3.2.1 Host-based IDS

HIDS is deployed on individual hosts or endpoints, such as servers, workstations, or network devices (Gu and Lu, 2021). It is placed directly on the host or endpoint it is intended to protect, monitors the activities and events occurring on that host, and requires the installation of agents or sensors on each host to gather data. HIDS therefore offers detailed insight into activities and events specific to its host, which makes it ideal for protecting critical servers or endpoints where detailed monitoring and analysis are required. Nonetheless, HIDS has some drawbacks: it uses many computer system resources, can interfere with operating systems and firewalls, and is difficult to maintain in large networks (Panagiotou et al., 2021).

3.2.2 Network-based IDS

NIDS is deployed at strategic points within the network infrastructure, such as the network perimeter, internal segments, or critical chokepoints. It does not require agents on individual hosts, making deployment simpler and less intrusive. It monitors and analyzes the network traffic passing through the designated points to detect suspicious or malicious activity (Borkar et al., 2017). Because a NIDS is meant to provide an up-to-date view of the entire network, where it is deployed in the network influences its effectiveness (Sultana et al., 2019). A NIDS can combine the two intrusion detection approaches: misuse and anomaly detection. NIDS has some downsides: it cannot analyze encrypted packets, is susceptible to DoS attacks, and has limited visibility into the host machine (Hindy et al., 2020).

In the following, we will explain the different detection method-based IDS.

3.3 Detection method-based IDS

IDS can be classified into three categories based on the detection method: misuse detection, anomaly detection, or a hybrid of the two (Riyaz and Ganapathy, 2020). The detection methods are described below, along with their benefits and weaknesses.

1. Misuse detection

The signature-based or misuse detection approach is based on known signatures stored in the system as patterns or rules. Each received packet is compared to the stored signatures (Lansky et al., 2021). When there is a match, the system raises an alert. Misuse detection effectively detects common, known cyberattacks but fails to detect new ones. Furthermore, errors in the definition of signatures increase the False Alarm Rate (FAR). Misuse detection is implemented using state-based strategies, rule-based methodologies, pattern matching, and data analysis methods (Hindy et al., 2020).

2. Anomaly detection

The anomaly detection approach is based on creating a profile to distinguish between normal and attack behavior. Each incoming packet is examined using several extracted or generated features to determine whether it is normal or malicious (Kunhare and Tiwari, 2018). When an attack activity is detected, an alarm is issued. In contrast to the misuse detection approach, the anomaly detection method can effectively detect a new attack, but at the expense of a high FAR. In the literature, many methods, such as rule-based models, biological models, models based on signal processing techniques, statistical models, and learning models, are used to implement an anomaly detection strategy (Hindy et al., 2020).

3. Hybrid detection

The hybrid IDS, which combines the anomaly and misuse detection approaches, is more effective than either approach alone. As previously stated, the anomaly and misuse techniques each have advantages and disadvantages (Einy et al., 2021). The disadvantages of the two strategies can be mitigated by combining them. As a result, the capacity of IDS to detect most network threats has improved (Maseno et al., 2022).
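The three detection methods above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the signature list, the single packet-size feature, and the z-score threshold of 3 are hypothetical choices, not taken from any of the cited systems.

```python
from statistics import mean, pstdev

# Hypothetical signature list: byte patterns that mark known attacks.
SIGNATURES = [b"/etc/passwd", b"' OR 1=1", b"\x90\x90\x90\x90"]


def misuse_match(payload: bytes) -> bool:
    """Misuse detection: alert if any known signature occurs in the payload."""
    return any(sig in payload for sig in SIGNATURES)


class AnomalyProfile:
    """Anomaly detection: a profile of one numeric feature of normal traffic."""

    def __init__(self, normal_values, z_threshold=3.0):
        self.mu = mean(normal_values)
        self.sigma = pstdev(normal_values) or 1.0  # avoid division by zero
        self.z_threshold = z_threshold

    def is_anomalous(self, value: float) -> bool:
        # Flag values that lie far from the normal profile (z-score test).
        return abs(value - self.mu) / self.sigma > self.z_threshold


def hybrid_detect(payload: bytes, feature_value: float, profile: AnomalyProfile) -> str:
    """Hybrid detection: check signatures first, then the anomaly profile."""
    if misuse_match(payload):
        return "alert: known signature"
    if profile.is_anomalous(feature_value):
        return "alert: anomaly"
    return "normal"


# Profile built from packet sizes (bytes) observed during normal operation.
profile = AnomalyProfile([500, 520, 480, 510, 490])
```

Calling `hybrid_detect(b"GET /index.html", 5000, profile)` raises an anomaly alert even though no signature matches, which is the case misuse detection alone would miss.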

3.4 Significant cyberattacks detected by intrusion detection systems

Intrusion Detection Systems play an essential role in protecting networks and systems from a wide range of cyberthreats. As a result, it is important to understand how IDS may efficiently detect different types of cyberattacks. Organizations may develop an exhaustive defense plan tailored to their specific security requirements by categorizing attacks into major classes and assigning them to the appropriate IDS type. Table 2 summarizes several classes of cyberattacks, providing a thorough overview of each category as well as particular examples. It also specifies which type of Intrusion Detection System, Host-based (HIDS) or Network-based (NIDS), is best suited to detecting each example inside the appropriate class of attack.

Table 2. Major cyberattacks detected by IDS.

Choosing between HIDS and NIDS depends on the type of attack and its position inside the network. NIDS are ideal for monitoring network-wide traffic patterns for detecting attacks on multiple hosts or network services. On the other hand, HIDS are effective at monitoring particular hosts or systems for signs of unauthorized access or malicious activity.

As illustrated in Table 2, we emphasize the broad nature of cyberattacks and the importance of deploying both NIDS and HIDS to effectively detect and mitigate a wide range of threats to network infrastructure and individual systems.

In Figure 2, we propose an alternative classification of the major attacks detected by IDSs.

• Disruptive attacks. These attacks aim to disrupt or impair the normal functioning of network services or systems. Examples include DoS attacks, DDoS attacks, ICMP floods, Heartbleed, etc.

• Exploratory attacks. This class of attack is intended to probe and gather information about network infrastructure and vulnerabilities. Examples include port scanning, OS fingerprinting, network reconnaissance, etc.

• Privilege escalation attacks. This category attempts to elevate user privileges or gain unauthorized access to privileged accounts. Examples include User-to-Root (U2R) attacks, exploiting Sudo vulnerabilities, buffer overflows, Trojan Horse, Spyware, Ransomware, MITM, etc.

• Unauthorized access attacks. This class involves unauthorized access attempts or the exploitation of system vulnerabilities to gain entry. Examples include remote-to-local (R2L) attacks, brute-force attacks, exploiting vulnerable services, Web attacks, infiltration, Botnet, spoofing, Mirai, etc.

Figure 2. Significant cyberattacks detected by IDS.

This alternative classification depicted in Figure 2 underscores the importance of deploying appropriate IDSs to detect and mitigate threats in all categories, strengthening defenses against evolving cyber threats.

4 Artificial intelligence methods for IDS

The utilization of ML and DL techniques typically entails three primary steps, as seen in Figure 3: (i) data preprocessing, (ii) training, and (iii) testing. In step 1, the dataset is preprocessed to convert it into a usable format. It is first cleaned by deleting duplicate records and entries that have missing data; the remaining data then undergoes conversion to numeric form (encoding), data visualization and analysis, and scaling and normalization. The preprocessed data is subsequently partitioned at random into two subsets: the training and testing datasets. Usually, the training dataset consists of approximately 80% of the original dataset, while the remaining 20% is used for testing. In step 2, the training dataset is used to train the ML or DL algorithm. In step 3, the trained model is tested on the separate testing dataset and assessed by its predictions; for IDS models, each network traffic instance is classified as benign or as an attack.

Figure 3. Generalized ML/DL-based intrusion detection system methodology.
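The preprocessing and splitting steps of this methodology can be sketched in plain Python as follows. The record layout, the feature names `protocol` and `bytes`, and the helper names are hypothetical; real pipelines typically rely on a data-frame library, which is deliberately avoided here to keep the sketch self-contained.

```python
import random


def preprocess(records, categorical_keys, numeric_keys):
    """Clean, encode, and normalize a list of dict records (step 1)."""
    # Drop entries with missing values, then drop exact duplicates.
    complete = [r for r in records if all(v is not None for v in r.values())]
    seen, cleaned = set(), []
    for r in complete:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(dict(r))

    # Encode each categorical value as an integer index.
    for k in categorical_keys:
        codes = {v: i for i, v in enumerate(sorted({r[k] for r in cleaned}))}
        for r in cleaned:
            r[k] = codes[r[k]]

    # Min-max normalize each numeric feature to [0, 1].
    for k in numeric_keys:
        lo = min(r[k] for r in cleaned)
        hi = max(r[k] for r in cleaned)
        span = (hi - lo) or 1.0
        for r in cleaned:
            r[k] = (r[k] - lo) / span
    return cleaned


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Randomly partition rows into training and testing subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


raw = [
    {"protocol": "tcp", "bytes": 100, "label": "benign"},
    {"protocol": "udp", "bytes": 300, "label": "attack"},
    {"protocol": "tcp", "bytes": 100, "label": "benign"},    # duplicate
    {"protocol": "icmp", "bytes": None, "label": "attack"},  # missing value
    {"protocol": "tcp", "bytes": 200, "label": "benign"},
    {"protocol": "udp", "bytes": 150, "label": "attack"},
    {"protocol": "tcp", "bytes": 250, "label": "benign"},
]
rows = preprocess(raw, categorical_keys=["protocol"], numeric_keys=["bytes"])
train, test = train_test_split(rows)
```

The `train` subset would then be passed to the learning algorithm (step 2) and the held-out `test` subset to evaluation (step 3).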

This section overviews the ML and DL methodologies frequently employed in developing an efficient IDS.

4.1 Machine learning algorithms

ML is an AI discipline that allows machines to learn from enormous datasets by automatically building mathematical models (Xin et al., 2018). This subsection describes the most often used ML approaches for IDS.

1. K-Nearest Neighbors (KNN)

KNN is a nonparametric, instance-based classification approach. As a lazy learner, it defers computation from training to classification time (Singhal et al., 2021). The KNN algorithm begins by computing the distances between the unlabeled point and the labeled points in an n-dimensional space. Second, it finds the k points closest to the unlabeled point (Belgrana et al., 2021). Finally, by majority vote, it assigns the unlabeled point to the most common class among its k nearest neighbors. The value of k impacts classification accuracy (Kunhare and Tiwari, 2018).
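The three KNN steps can be sketched as follows. Euclidean distance and the toy two-dimensional features are assumptions made for illustration; other distance metrics are equally valid.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D features (hypothetical): [normalized duration, normalized byte count].
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
y = ["benign", "benign", "benign", "attack", "attack", "attack"]
```

There is no training phase: all of the work happens inside `knn_predict`, which is why the cost of classification grows with the size of the stored dataset.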

2. Support vector machine (SVM)

SVM is a supervised binary classifier. It can, however, also be applied to unsupervised machine learning (Binbusayyis and Vaiyapuri, 2021). The main aim of SVM is to determine the optimal hyperplane that separates a collection of training vectors into two distinct classes within a high-dimensional space (Mohammadi et al., 2021). To reach this high-dimensional space, an SVM raises the dimensionality of the input vectors so that they become separable. The best hyperplane is found by maximizing its distance to the support vectors rather than to the complete set of training vectors, which makes the classifier resistant to outliers (Alsarhan et al., 2021). SVMs have been applied in intrusion detection, producing good results regarding FAR compared to other approaches (Wisanwanichthan and Thammawichai, 2021).

Zou et al. (2023) propose a network intrusion detection approach called HC-DTTWSVM, based on DT twin SVM and hierarchical clustering. HC-DTTWSVM is designed to effectively detect various forms of network intrusion. The hierarchical clustering algorithm is first used to create the DT for network traffic data. A bottom-up merging strategy is employed to optimize the separation of the higher nodes in the DT, minimizing error accumulation throughout the construction process. Subsequently, twin SVMs are integrated into the created DT to build the network intrusion detection model. This model is capable of accurately identifying the network intrusion type in a hierarchical manner. The performance of the HC-DTTWSVM approach is assessed using the NSL-KDD and UNSW-NB15 intrusion detection benchmark datasets. The experimental findings demonstrate that HC-DTTWSVM efficiently detects various types of network intrusion and achieves detection performance similar to that of recently suggested approaches for network intrusion detection.
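The search for the optimal hyperplane mentioned in this subsection is commonly written as the following hard-margin optimization problem. The notation is an assumption introduced here for illustration: training vectors are written as x_i with labels y_i in {-1, +1}, φ denotes the mapping into the high-dimensional space, and (w, b) define the hyperplane.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b}\quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to}\quad & y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1, \qquad i = 1,\dots,N
\end{aligned}
```

Minimizing the norm of w maximizes the margin 2/‖w‖ between the two classes, and the training vectors for which the constraint holds with equality are the support vectors.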

3. Artificial neural networks (ANN)

ANN is a parallel processing model inspired by the brain's neural networks. An ANN comprises multiple nodes, or neurons, connected by a network of synapses, each with a weight, and is trained with a learning process (supervised or unsupervised). An ANN is organized into several distinct layers (Sumaiya Thaseen et al., 2021; Hassija et al., 2024). The input layer receives data from the outside world. The hidden layer consists of nodes whose input and output signals remain within the network, and the output layer processes the data and sends it to the outside world (Kavitha and Manikandan, 2022; Javed et al., 2023). Different ANNs, such as the Multilayer Perceptron (MLP) and the Self-Organizing Map (SOM), can be used in intrusion detection, depending on how many hidden layers they have and the network design (Choraś and Pawlicki, 2021).

Das et al. (2021) present a comprehensive security solution for network intrusion detection utilizing a machine learning approach. The authors utilize an ensemble supervised machine learning framework and ensemble feature selection with the following algorithms: NN, LR, DT, NB, and SVM (Ali Hussein Ali, 2024b). In addition, they offer a comparative examination of multiple machine learning models and feature selection strategies. The objective of this research is to develop a universal detection mechanism that attains superior precision while minimizing the false positive rate (FPR). The experiment utilized the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. The results indicate that the detection model can accurately identify 99.3% of intrusions while maintaining a low false alarm rate of 0.5%, outperforming current solutions on these metrics.
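For readers unfamiliar with how such figures are derived, the sketch below computes a detection rate and a false alarm rate from confusion-matrix counts. The counts are hypothetical and were chosen only so that the two rates match the percentages quoted above, under the assumption that the 99.3% figure is read as a detection rate; they are not the cited study's data.

```python
def detection_rate(tp: int, fn: int) -> float:
    """Fraction of actual attacks that were flagged (also called recall)."""
    return tp / (tp + fn)


def false_alarm_rate(fp: int, tn: int) -> float:
    """Fraction of benign instances that were wrongly flagged as attacks."""
    return fp / (fp + tn)


# Hypothetical confusion counts: 1000 attack and 1000 benign instances.
tp, fn, fp, tn = 993, 7, 5, 995
dr = detection_rate(tp, fn)
far = false_alarm_rate(fp, tn)
```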

4. K-Means clustering

K-means clustering has become one of the most commonly used unsupervised learning methods due to its ease of use and rapid convergence. It partitions a dataset into k separate, non-overlapping clusters (Liu et al., 2021). The number of clusters k must be determined before the algorithm begins. The K-means algorithm starts by randomly selecting k objects, each of which serves as an initial cluster mean. Each remaining object is then assigned to the most similar cluster based on its distance from the cluster means, and the cluster means are recomputed. This procedure is repeated until the cluster assignments no longer change. The results of K-means generally depend on the initial cluster assignment made in the algorithm's first phase (Maseer et al., 2021). As a result, the method must be run numerous times to find the solution with the smallest objective value. Chandra et al. (2019) propose a hybrid model that uses Filter-based Attribute Selection to reduce the dimensionality of the dataset's features. The KDDCUP99 dataset was used for training and testing. The model is evaluated using a variety of performance criteria and significantly improves detection accuracy.

The authors of this study (Alenezi and Aljuhani, 2023) suggest a smart intrusion detection strategy that employs principal components analysis (PCA) as a method for feature engineering. This technique aims to identify the most important characteristics, decrease data complexity, and enhance the accuracy of intrusion detection. During the classification phase, the authors utilize clustering methods like K-means to ascertain whether a specific flow of IIoT communication is normal or under attack for binary classification. To assess the suggested model's efficiency and resilience, it was tested using a novel dataset known as X-IIoTID. The detection method attained a superior accuracy rate of 99.79% and a decreased error rate of 0.21% in the performance results, outperforming current techniques.
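The K-means procedure discussed in this subsection, including the repeated runs from different initializations, can be sketched as follows. This is a generic illustration of the standard algorithm on hypothetical toy data, not the pipeline of either cited study.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Standard K-means loop: returns (centroids, cluster index of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial means
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:  # assignments stable: converged
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


def objective(points, centroids, assignment):
    """Within-cluster sum of squared distances (the K-means objective)."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


def best_of_restarts(points, k, restarts=10):
    """Run K-means from several random initializations; keep the lowest objective."""
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: objective(points, *run))


# Hypothetical normalized flow features forming two well-separated groups.
flows = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2], [0.9, 0.9], [0.8, 0.9], [0.9, 0.8]]
centroids, labels = best_of_restarts(flows, k=2)
```

Note that the cluster indices carry no meaning by themselves; deciding which cluster corresponds to normal traffic and which to attacks requires an additional rule or a few labeled samples.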

5. Tree-based machine learning techniques

A Decision Tree (DT) is a classifier that uses a set of known cases to predict the class of an unknown example by applying a series of decisions that can be easily translated into classification rules (Ogundokun et al., 2021). DTs are divided into two types based on the task to solve: regression and classification. A regression tree is used when the target is quantitative (numerical values), whereas a classification tree is used when the target is qualitative (class labels). A DT is a flowchart-like hierarchy of nodes (Guezzaz et al., 2021). Starting from the root node, each non-leaf node represents a test on an attribute, each branch indicates an outcome of that test, and each leaf node represents a class label. To classify an unknown instance, its attribute values are checked against the DT to trace a path from the root node to a leaf node, which gives the class prediction for that instance (Al-Omari et al., 2021). A DT is built by maximizing the information gained at each attribute split, leading to a natural feature ranking or selection. In general, DTs offer higher accuracy and simpler implementation than more sophisticated algorithms such as SVMs, and they do not require parameter setup or domain knowledge. The ease of extracting rules from DTs depends on the size of the tree (Bhosale et al., 2020). DT-based intrusion detection methods are now in use. Iterative Dichotomiser 3 (ID3), C4.5, and Classification and Regression Trees (CART) are the three most well-known algorithms for implementing DTs. ID3, C4.5, and CART each use a greedy, top-down approach in constructing the tree (Riyaz and Ganapathy, 2020). Another advanced tree-based ML technique, eXtreme Gradient Boosting (XGBoost), has been used to enhance attack detection (Alzahrani and Alenazi, 2021). The suggested approach is trained and tested on the NSL-KDD dataset. In contrast to basic tree-based ML systems, the dataset is subjected to several sophisticated preprocessing steps to extract the best form of the data, yielding exceptional results. In the multiclass classification task, the model identifies attacks and classifies their types with 95.95% accuracy, utilizing only five of NSL-KDD's 41 features. This research enhances NIDS's accuracy and monitoring.
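The greedy split selection used when growing a DT can be illustrated with the information-gain criterion, as used in ID3 (C4.5 and CART rely on related criteria such as gain ratio and the Gini index). The attribute names and the four toy records below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, attributes):
    """Greedy choice made at each node: the attribute with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


rows = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
labels = ["benign", "attack", "benign", "attack"]
```

In this toy data, `flag` separates the two classes perfectly while `protocol` carries no information, so `best_split(rows, labels, ["protocol", "flag"])` selects `flag`. Ranking attributes by gain in this way is what yields the natural feature ranking mentioned above.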

6. Naive Bayes (NB)

NB is a classifier based on Bayes' theorem. It represents probabilistic relationships between relevant variables to model an uncertain domain (Kurniawan et al., 2021). It is represented by a directed acyclic graph in which each node represents a variable with its conditional probability table and each link encodes how one node affects another. Because it is a classifier, NB can be used for intrusion detection. Although the usefulness of NB has only been proved in one situation, its outcomes are equivalent to those of threshold-based systems while requiring less computation (Wester, 2021). Gu and Lu (2021) propose an IDS based on SVM and NB feature embedding. First, the NB feature transformation is applied to the original features to generate new, high-quality data; then, an SVM classifier is trained on the new data to create the intrusion detection model. Experiments on multiple intrusion detection datasets show that the proposed detection method performs well and robustly, with accuracy rates of 93.75% on the UNSW-NB15 dataset, 98.92% on the CICIDS2017 dataset, 99.35% on the NSL-KDD dataset, and 98.58% on the Kyoto 2006+ dataset.
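As a brief reminder of how an NB classifier applies Bayes' theorem, the standard decision rule is given below. The notation is introduced here for illustration: c ranges over the classes (for example, benign and attack) and x_1, ..., x_n are the feature values of one traffic instance, assumed conditionally independent given the class.

```latex
\hat{c} \;=\; \arg\max_{c}\; P(c)\,\prod_{j=1}^{n} P(x_j \mid c)
```

The prior P(c) and the per-feature likelihoods P(x_j | c) are estimated from labeled training data, which is why training and prediction are computationally inexpensive.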

4.2 Deep learning algorithms

This section presents the DL techniques used by the reviewed studies to deliver DL-based IDS solutions. DL is a subclass of ML that uses deep neural networks, in which features are learned by several hidden layers. These approaches are characterized by their complex architecture and intrinsic ability to capture the main aspects of a dataset and produce an output with minimal human assistance.

1. Recurrent neural networks (RNN)

RNNs extend feed-forward neural networks with recurrent connections so that they can model sequential data. An RNN consists of input, hidden, and output units, with the hidden units acting as the model's “memory components.” Each RNN unit makes its decision based on the current input and the outputs of earlier steps (Al-Emadi et al., 2020). RNNs have a wide range of applications. Within an IDS, supervised classification and feature extraction can be performed using an RNN. When sequences are excessively long, RNNs suffer from short-term memory problems. Several RNN variants, including the LSTM and GRU, have been created to overcome these concerns (Tang et al., 2019; Mittal et al., 2021). Naseer et al. (2018) conducted a comparative analysis of IDS on a GPU-based testbed using multiple DL and ML approaches. Experiments utilizing LSTM and Deep CNN outperformed those using other models on the NSL-KDD benchmark dataset. An RNN-based IDS with GRU as the primary memory, a multilayer perceptron, and a softmax classifier was published in Xu et al. (2018). Tests were done on the KDD Cup'99 and NSL-KDD datasets. The experimental results indicated that the detection performance outperformed other approaches. However, the system has a serious limitation: it fails to recognize less prevalent forms of attack, such as U2R and R2L. Kasongo (2023) conducted a comparison study on IDS using the NSL-KDD and UNSW-NB15 benchmark datasets, with XGBoost-LSTM achieving higher accuracy than alternative models. Bakhsh et al. (2023) propose a DL-based IDS, employing Feed Forward Neural Networks (FFNN), Long Short-Term Memory (LSTM), and Random Neural Networks (RandNN) as defense mechanisms against cyberattacks in IoT networks. On the CIC-IoT22 dataset, the suggested technique performs better than the present state-of-the-art DL-based IDS. The FFNN model achieves an accuracy of 99.93%, the LSTM model achieves an accuracy of 99.85%, and the RandNN model achieves an accuracy of 96.42% in detecting intrusions.
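The way a recurrent unit combines the current input with its memory of earlier steps can be sketched with a single-unit Elman-style RNN. The scalar weights below are hypothetical; in practice the weights are matrices learned from data, and LSTM or GRU cells add gating on top of this basic recurrence.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a single-unit RNN over a sequence of scalar inputs.

    Recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)  # new state mixes input and memory
        states.append(h)
    return states


# Hypothetical weights; a single input pulse followed by two zero inputs.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
```

The hidden state stays non-zero after the input has returned to zero but shrinks at every step, which gives a small-scale picture of the short-term memory problem that motivates the LSTM and GRU variants.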

2. AutoEncoder (AE)

AE is a popular DL method that uses unsupervised neural networks. It learns the best features such that the output closely resembles the input. Its input and output layers have the same size, while the hidden layers are usually smaller than the input layer (Khan and Kim, 2020; Saheed et al., 2023). AE is symmetric and operates with an encoder-decoder arrangement. AE variants include Stacked AE, Sparse AE, and Variational AE (Rahman et al., 2021; Hameed et al., 2024). Al-Qatf et al. (2018) suggested a self-learning concept based on sparse AE and SVM. They validated its performance by testing the suggested model on the NSL-KDD dataset. Overall performance improved when the results were compared to different DL and ML methods. Binbusayyis and Vaiyapuri (2021) suggest combining a one-dimensional convolutional autoencoder (1D CAE) with a one-class support vector machine (OCSVM). To test the model, the authors use the NSL-KDD and UNSW-NB15 datasets.

Yang et al. (2020) proposed a DNN-Supervised Adversarial Variational (DNN-SAVER) system based on an AE with regularization. It was tested on the UNSW-NB15 and NSL-KDD datasets. According to the experimental results, the model effectively recognizes infrequent and previously unknown attacks. Andresini et al. (2020) reported a multistage model with a 1D convolution layer and two stacked fully connected layers. To reconstruct the data, two AEs were trained independently, using benign and attack flows. The reconstructed samples are then supplied to the network as input to the 1D-CNN. Finally, a softmax classifier classifies the samples using the convolution layer outputs. The proposed technique outperforms other DL models on the KDD Cup'99, CICIDS2017, and UNSW-NB15 datasets.

This work (Catillo and Villano, 2023) introduces CPS-GUARD, an innovative intrusion detection method that utilizes a single semi-supervised autoencoder and a strategy for determining the threshold that separates regular activities from attacks. The method is designed to be sensitive to outliers, using outlier identification to address intrinsic flaws in the training data. CPS-GUARD undergoes evaluation by direct experimentation, utilizing both regular and intrusive data points from individual sensing devices, an HTTP server, and four comprehensive systems, which include Cyber-Physical Systems. The tests encompass a diverse array of attacks present in six cutting-edge datasets. The intrusion detection findings of CPS-GUARD exhibit recall values ranging from 0.949 to 1.000, precision values ranging from 0.961 to 0.999, and false positive rates ranging from 0.006 to 0.027, depending on the particular system under evaluation. The examination also encompasses a comparative analysis of alternative methodologies for selecting thresholds and identifying outliers.
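The idea of separating regular activity from attacks with a threshold on the autoencoder's reconstruction error can be sketched as follows. This is a simplified illustration, not the threshold strategy of CPS-GUARD: it assumes the reconstruction errors have already been computed by some autoencoder, and it uses a nearest-rank percentile of the errors on normal training data (the parameter `q` is a hypothetical choice) so that a few outliers in the training data do not inflate the threshold.

```python
import math

def percentile_threshold(normal_errors, q=0.9):
    """Return the error value at or below which a fraction q of normal samples fall."""
    if not normal_errors:
        raise ValueError("need at least one reconstruction error")
    ordered = sorted(normal_errors)
    # Nearest-rank percentile: smallest value with at least q of the data at or below it.
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def flag_attacks(errors, threshold):
    """Flag every sample whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Hypothetical reconstruction errors on normal training traffic;
# the last value is an outlier that a max-based threshold would absorb.
normal_errors = [0.01, 0.02, 0.015, 0.03, 0.025, 0.012, 0.018, 0.022, 0.028, 0.5]
threshold = percentile_threshold(normal_errors, q=0.9)
flags = flag_attacks([0.02, 0.2], threshold)
```

With these toy values the threshold sits at 0.03 rather than at the outlier 0.5, so a test sample with error 0.2 is flagged while one with error 0.02 is not.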

Hnamte et al. (2023) propose a novel method that combines AE and LSTM algorithms, and they trained and tested the model on two datasets: CICIDS2017 and CSE-CICIDS2018. The encoder compresses the original data through a bottleneck, while the decoding network reconstructs the data. The key challenges of the proposed model are the merging of two types of architectures and training under smoothing limitations. When trained for up to 30 epochs, the suggested hybrid model achieved a multiclass detection accuracy of 99.99% on the CICIDS2017 dataset and 99.10% on the CSE-CICIDS2018 dataset. These results surpassed the accuracy of other state-of-the-art intrusion detection methods.

3. Deep neural network (DNN)

A DNN is a fundamental DL structure that allows multilayer models to be trained. RM et al. (2020) described the system as having an input layer, an output layer, and several hidden layers. The model's abstraction level increases as the number of hidden layers increases, boosting its effectiveness. In Jia et al. (2019), a DNN-based IDS with four hidden layers was used to classify the KDD Cup'99 and NSL-KDD datasets. The authors employed the Rectified Linear Unit (ReLU) as the activation function for the hidden layers. In Kavitha and Manikandan (2022), the authors apply the bottleneck layer method to the CICIDS-2017 dataset to show how well it can identify cyberattack features. According to the findings, the bottleneck model architecture, which combines ANN and DNN models, is superior to conventional ANN, DNN, and SVM variants. Multiple datasets, such as KDD Cup'99, NSL-KDD, Kyoto, UNSW-NB15, and CICIDS 2017, were used to measure the performance of the proposed IDS model. The experimental findings showed that the suggested model performed better than other ML methods.
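As a minimal sketch of the structure described above, the following pure-Python forward pass stacks dense hidden layers with ReLU activation and ends in a softmax output. The layer sizes and weight values are hypothetical toy numbers, not those of any cited model, and training (backpropagation) is deliberately left out.

```python
import math

def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(values, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]

def softmax(values):
    """Convert output scores into class probabilities (numerically stable form)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden_layers, output_layer):
    """Pass x through every hidden layer (ReLU) and the softmax output layer."""
    h = x
    for weights, biases in hidden_layers:
        h = relu(dense(h, weights, biases))
    weights, biases = output_layer
    return softmax(dense(h, weights, biases))

# Hypothetical network: 2 inputs, two hidden layers of 2 units, 2 classes (normal, attack).
hidden_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
output_layer = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
probs = forward([2.0, 1.0], hidden_layers, output_layer)
```

Adding more entries to `hidden_layers` deepens the network, which corresponds to the higher abstraction level mentioned above at the cost of more computation per sample.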

4. Deep belief network (DBN)

DBN is a deep learning model that stacks Restricted Boltzmann Machines (RBMs) followed by a softmax classification layer. An RBM has two layers, and data flows between them in both directions (Tan et al., 2019). Each node is linked to all nodes in the preceding and subsequent layers, while nodes within the same layer are not linked to one another. Unsupervised layer-wise learning is used to pre-train the DBN before supervised fine-tuning is applied to discover useful features. The IDS uses DBN to extract and classify features (Süzen, 2021).

The paper of He et al. (2023) examines the characteristics of adversarial challenges in Network Intrusion Detection Systems (NIDS). The authors focused on the offensive approach, which involves developing methods to create adversarial examples that bypass various machine-learning models. They specifically investigated the utilization of evolutionary computation techniques such as Particle Swarm Optimization (PSO), Genetic Algorithms (GA), DBN, and deep learning methods like Generative Adversarial Networks (GANs) to generate these examples. To evaluate these techniques, the researchers applied the algorithms to two publicly accessible datasets: NSL-KDD and UNSW-NB15. The findings indicated that their methodologies elevated misclassification rates across eleven ML models, including a voting classifier.

5. Convolutional neural network (CNN)

CNN is best suited to data stored in arrays. Its framework for feature extraction comprises a sequence of convolutional and pooling layers, followed by a fully connected layer and a softmax classifier (Riyaz and Ganapathy, 2020). CNN has a long history of success in computer vision (Fki et al., 2023). In IDS, CNNs are used for supervised feature extraction and classification (Khan et al., 2019; Azizjon et al., 2020). Zhang et al. (2019) proposed an IDS model using CNN and gcForest, together with a new P-Zigzag approach for generating two-dimensional grayscale images from raw data. An advanced CNN model (GoogLeNetNP) was applied in a coarse-grained layer, and the anomalous classes were further sub-classed using gcForest in a fine-grained layer. The UNSW-NB15 and CIC-IDS2017 datasets were combined to create a new dataset. According to the trials, their methodology significantly decreases FAR when compared to single methods. For an effective IDS, Jiang et al. (2020) proposed a deep hierarchical CNN-BiLSTM system. To handle the class imbalance issue, SMOTE is employed to increase minority samples, aiding the model learning process (Ali Hussein Ali, 2024a). CNN and BiLSTM were used to extract spatial and temporal features. The NSL-KDD and UNSW-NB15 datasets were used in the studies. The proposed approach achieves higher accuracy, but because of the structure's complexity, the training duration is longer. Yu and Bian (2020) described an IDS model based on few-shot learning (FSL), in which small amounts of uniformly distributed labeled data from the dataset are employed for training. Two datasets, NSL-KDD and UNSW-NB15, are used to illustrate the model's usefulness. Wang et al. (2023) employ the CSE-CIC-IDS2018 dataset and evaluate performance using standard evaluation metrics. Six models, namely DNN, CNN, RNN, LSTM, CNN + RNN, and CNN + LSTM, were developed to determine whether network traffic contains a malicious attack. The proposed models greatly enhance detection performance. However, the processing time for the CNN + RNN and CNN + LSTM combinations is greater than that of the individual DNN, RNN, and CNN models. Thus, for deployment in an IDS device, it can be inferred that individual DNNs, RNNs, and CNNs are preferable to combinations such as CNN + RNN and CNN + LSTM.

This study (Saba et al., 2022) introduces a CNN technique for anomaly-based intrusion detection systems (IDS) that leverages the capabilities of the IoT to effectively analyze all network traffic in the IoT environment. The suggested approach can identify all potential intrusions and atypical traffic patterns. The model was trained and evaluated using the NID Dataset and BoT-IoT datasets, attaining accuracy rates of 99.51% and 92.85%, respectively.

This research (Madwanna et al., 2023) presents two deep learning-based intrusion detection systems (IDSs). The first IDS is a fusion of LuNet and Bidirectional Long Short-Term Memory (Bi-LSTM), while the second IDS combines Temporal Convolutional Network (TCN), Convolutional Neural Network (CNN), and Bi-LSTM. To keep an IDS up to date and precise, it must be provided with a sufficient quantity of samples. The first model was trained and evaluated on two established benchmark datasets, NSL-KDD and UNSW-NB15, and the second model was trained and tested on the NSL-KDD dataset. To address the issue of limited sample size, the models employ the Synthetic Minority Oversampling Technique (SMOTE). These models yielded better classification accuracy and detection rates than conventional machine learning-based methods and numerous deep learning approaches. The first model achieved a classification accuracy of 82.19% for UNSW-NB15 and 98.87% for NSL-KDD. The second model achieved a classification accuracy of 98.8% for NSL-KDD.

Referring to Figure 4, observations indicate that 45% of the suggested methods exclusively utilize machine learning (ML) approaches and 40% of the solutions apply DL methods. In contrast, only 13% of the recommended solutions are based on a hybrid strategy that mixes ML and DL algorithms.

Figure 4

Figure 4. Methodology distribution.

Table 3 displays a compilation of recent ML and DL algorithms that researchers have developed to detect network attackers.

Table 3

Table 3. An overview of recent research on network intrusions with ML/DL techniques.

Various metrics can be used to evaluate ML and DL algorithms for IDS.

5 Evaluation metrics and performance indicators

This section discusses commonly used metrics and performance indicators. The Confusion Matrix (CM) is a two-dimensional matrix that relates the actual categories to the predicted categories (Deng et al., 2016; Zhu and Liu, 2024).

• True Positive (TP): The classifier successfully recognizes data objects as Attacks.

• False Negative (FN): Attack instances that were incorrectly identified as Normal.

• False Positives (FP): Instances in the data that were wrongly identified as Attacks.

• True Negative (TN): The instances are correctly classified as Normal.
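The four confusion-matrix cells defined above can be counted directly from paired actual and predicted labels. The sketch below is a minimal illustration; the label strings and the function name `confusion_counts` are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Count TP, FN, FP, and TN, treating `positive` as the Attack class."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # attack correctly recognized
        elif actual == positive:
            fn += 1  # attack missed (labeled Normal)
        elif predicted == positive:
            fp += 1  # normal traffic wrongly flagged
        else:
            tn += 1  # normal traffic correctly passed
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

# Hypothetical labels for six traffic records.
y_true = ["attack", "attack", "normal", "normal", "attack", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "attack", "normal"]
counts = confusion_counts(y_true, y_pred)
# counts == {"TP": 2, "FN": 1, "FP": 1, "TN": 2}
```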

The following are the various metrics used in the most recent evaluation research:

• Accuracy: the ratio of correctly identified cases to total cases. Accuracy is an informative measure only when the classes in the dataset are balanced.

\begin{array}{ll} \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} & (1) \end{array}

• Precision: refers to the percentage of accurately predicted attacks relative to the total number of samples labeled as attacks.

\begin{array}{ll} \mathrm{Precision} = \frac{TP}{TP + FP} & (2) \end{array}

• Recall (Sensitivity): the fraction of samples correctly labeled as attacks relative to the total number of attack samples.

\begin{array}{ll} \mathrm{Recall} = \mathrm{TPR} = \mathrm{Sensitivity} = \frac{TP}{TP + FN} & (3) \end{array}

• False alarm rate (FAR), also called false positive rate (FPR): the proportion of normal samples wrongly predicted as attacks among all actual normal samples. It is the complement of specificity.

\begin{array}{ll} \mathrm{FAR} = \mathrm{FPR} = 1 - \mathrm{Specificity} = \frac{FP}{FP + TN} & (4) \end{array}

• True negative rate (TNR), also called specificity: the fraction of samples correctly labeled as normal relative to all normal samples.

\begin{array}{ll} \mathrm{TNR} = \frac{TN}{TN + FP} & (5) \end{array}

• F-Measure: the harmonic mean of Precision and Recall. It summarizes a system's reliability in a single value that balances precision against recall.

\begin{array}{ll} \mathrm{F\text{-}Measure} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} & (6) \end{array}

• False discovery rate (FDR): this metric measures the proportion of incorrectly predicted positive labels among all positive predictions. It is the complement of precision (FDR = 1 − Precision) and should not be confused with the false alarm rate, which is normalized by the actual negatives. It is calculated as:

\begin{array}{ll} \mathrm{FDR} = \frac{FP}{FP + TP} & (7) \end{array}

• Matthews correlation coefficient: this metric measures the correlation between the predicted and actual values of a classification model. It takes into account the true positives, false positives, true negatives, and false negatives generated by the model.

\begin{array}{ll} \mathrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} & (8) \end{array}
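Equations (1) to (8) can be evaluated together from the four confusion-matrix counts. The following is a minimal sketch; the function name `ids_metrics`, the convention of returning 0.0 when a denominator is zero, and the example counts are assumptions made for illustration.

```python
import math

def ids_metrics(tp, tn, fp, fn):
    """Compute the metrics of Equations (1)-(8) from confusion-matrix counts."""
    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),                    # Eq. (1)
        "precision": precision,                                           # Eq. (2)
        "recall": recall,                                                 # Eq. (3)
        "far": ratio(fp, fp + tn),                                        # Eq. (4)
        "tnr": ratio(tn, tn + fp),                                        # Eq. (5)
        "f_measure": ratio(2 * precision * recall, precision + recall),   # Eq. (6)
        "fdr": ratio(fp, fp + tp),                                        # Eq. (7)
        "mcc": ratio(tp * tn - fp * fn, mcc_den),                         # Eq. (8)
    }

# Hypothetical imbalanced test set: 100 attacks among 1,000 records.
m = ids_metrics(tp=90, tn=880, fp=20, fn=10)
```

In this toy example accuracy is 0.97 while precision is only about 0.82, which illustrates why accuracy alone can be misleading on imbalanced data. Two identities are useful as sanity checks: FAR + TNR = 1 and FDR + Precision = 1.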

Besides, a statistics-based IDS constructs a model that represents the distribution of normal behavior profiles. It then identifies low-probability events and flags them as potential intrusions. A statistical anomaly-based IDS (AIDS) essentially considers statistical metrics such as the median, mean, mode, and standard deviation of packets. Instead of inspecting the entire data traffic, each packet is individually monitored, serving as a fingerprint of the flow. Statistical IDSs thus identify deviations of present behavior from the established normal behavior (Li et al., 2019).
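A minimal sketch of this statistical approach is given below. It assumes that a single numeric attribute per packet (here, a hypothetical packet size in bytes) is profiled with its mean and standard deviation, and that a packet is flagged when it lies more than `k` standard deviations from the mean. The three-sigma default is a common rule of thumb chosen for illustration, not a value taken from the cited work.

```python
import statistics

def fit_profile(normal_values):
    """Summarize normal behavior by its mean and sample standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, profile, k=3.0):
    """Flag a value that deviates from the mean by more than k standard deviations."""
    mean, std = profile
    if std == 0:
        return value != mean
    return abs(value - mean) / std > k

# Hypothetical packet sizes (bytes) observed during normal operation.
normal_sizes = [500, 520, 480, 510, 490, 505, 495]
profile = fit_profile(normal_sizes)

typical = is_anomalous(515, profile)    # close to the learned profile
suspect = is_anomalous(1500, profile)   # far outside the learned profile
```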

In addition to the commonly used metrics, some performance indicators can be useful for further comparative analysis. ANNs require careful selection and fitting of activation functions to effectively model complex relationships in data. However, they can be prone to convergence issues, where the training process struggles to reach an optimal solution (Dini et al., 2023). Convergence in neural networks refers to the stage of training at which further adjustments to the model's parameters result in diminishing improvements in performance. At this point, the parameter updates become minimal, and the errors produced by the model on the training data approach a minimum. Another way to identify convergence in a deep learning model is when the loss, which quantifies the disparity between predicted and actual values, settles at its lowest achievable value. Variants of the training procedure aim to improve convergence speed, which can serve as an additional metric, and to overcome issues related to large datasets.
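One simple way to operationalize this notion of convergence is to watch the per-epoch loss and declare convergence once it stops improving by a meaningful amount. The sketch below assumes a list of recorded loss values; the `patience` and `min_delta` parameters are hypothetical tuning knobs, similar in spirit to early-stopping criteria.

```python
def has_converged(losses, patience=3, min_delta=1e-3):
    """Return True when the last `patience` epochs improved the best loss by less than min_delta."""
    if len(losses) <= patience:
        return False  # not enough history to judge
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_before - best_recent < min_delta

def epochs_to_converge(losses, patience=3, min_delta=1e-3):
    """Return the first epoch count at which convergence is detected, or None."""
    for epoch in range(1, len(losses) + 1):
        if has_converged(losses[:epoch], patience, min_delta):
            return epoch
    return None

# Hypothetical training losses per epoch.
plateau = [1.0, 0.6, 0.4, 0.3, 0.2995, 0.2993, 0.2992]
still_improving = [1.0, 0.8, 0.6, 0.4, 0.2]
```

The value returned by `epochs_to_converge` gives a rough measure of convergence speed that can be compared across models trained on the same data.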

The complexity of an ANN algorithm is related to its computation time. Notably, binary classification generally outperformed multi-class classification, achieving precise results with few false negatives and false positives. Conversely, multi-class classification is more computationally intensive and intricate, resulting in less effective outcomes. For example, DT generally demonstrated a quick computation time, likely due to its lower complexity compared to other methods. However, all the methods exhibited relatively high computation times, highlighting the necessity for efficient algorithms in intrusion detection systems (Dini et al., 2023).

6 Benchmark datasets

This section outlines the datasets used by the researchers to test their methodologies. Table 4 includes a detailed overview of the dataset and the corresponding attacks.

• KDD Cup'99: This is a popular IDS dataset that has received much attention. It contains around 5 million training records and 2 million test records. Each record is described by 41 features and is labeled as normal or an attack (Al Tobi and Duncan, 2018).

• NSL-KDD: This dataset is a refined version of KDD Cup'99 in which redundant and duplicate records have been removed so that classifiers are not biased toward the most frequent records. It retains the 41 features of KDD Cup'99 (Salo et al., 2019).

• Kyoto 2006+: Kyoto University used honeypots, darknet sensors, email servers, web crawlers, and other network security measures to gather network traffic records for this dataset. Each record has 24 statistical attributes, 14 of which are drawn from the KDD Cup'99 dataset and 10 of which are additional features (Gu and Lu, 2021).

• UNSW-NB15: The Australian Centre for Cyber Security (ACCS) developed this dataset. Bro-IDS, Argus, and other tools were used to extract around two million records with 49 features (Michelena et al., 2024).

• CIC-IDS2017: This dataset was created in 2017 by the Canadian Institute for Cybersecurity (CIC) and is publicly available. It contains recent real-world attacks together with typical benign traffic flows (Kumar and Pathak, 2022).

• CSE-CIC-IDS2018: The Communications Security Establishment (CSE) and the CIC partnered to create this dataset in 2018. User profiles are built as abstract representations of the events and behaviors observed on the network. These profiles are combined into a single dataset described by a distinct collection of features (Karatas et al., 2020).

• CIC IoT dataset 2023: This dataset contains a real-time benchmark for large-scale attacks in IoT environments. These attacks are classified into seven categories: DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai. These attacks are executed by malicious IoT devices targeting other IoT devices. The dataset includes 46 features (Jony and Arnob, 2024).

• CIC-MalMem-2022: The data was designed to closely replicate a real-life scenario by employing malware that is commonly encountered in the real world. The dataset covers Spyware, Ransomware, and Trojan Horse malware. It can be used to assess obfuscated malware detection systems. The dataset is balanced, with 50% malicious memory dumps and 50% benign memory dumps (Talukder et al., 2023).

• IOTINTRUSION-2020: The IoT network intrusion dataset was developed in 2020. This dataset contains eight IoT cyberattacks, including flooding, brute force, spoofing, and scanning, as well as 79 network traffic features that describe benign and malicious network traffic. Network features were extracted using CICFlowMeter (Ullah and Mahmoud, 2020).

Table 4

Table 4. Overview of the datasets used in ML and DL-based IDS.

Benchmark datasets play a crucial role in evaluating the effectiveness of a suggested methodology. Figure 5 examines the utilization of the public datasets. It shows that NSL-KDD and KDD Cup'99 were used for testing and validation 41% of the time. Both datasets are considered outdated, although they remain highly favored among academics due to the abundance of comprehensive findings documented in the literature. The IoT Intrusion 2020, CIC-MalMem-2022, and CIC-IoT-2023 datasets account for only 11% of utilization in research papers because they are newer than the others.

Figure 5

Figure 5. Datasets distribution.

7 Discussion, challenges and future trends

Following the previous sections, we discuss the most significant findings in ML and DL-based IDS. Next, we address the research challenges related to this subject. Then, we highlight novel trends and future directions for effectively detecting intrusions. Finally, we outline the practical and managerial implications and impacts of using ML, DL, and advanced related technologies within IDS.

7.1 Discussion of the findings

The success of an AI-powered IDS relies mainly on training with a sufficient dataset. Training an ML model can give satisfactory results even with a small dataset. However, ML is suitable for large datasets only if the data is already labeled. Because labeling is time-consuming and expensive, DL techniques are recommended for large datasets. These algorithms can learn and discover interesting patterns from raw datasets.

The learning process demands considerable computational power and processing time, although training the IDS model improves its ability to identify attacks. Table 5 shows the strengths and weaknesses of approaches used in recent articles regarding complexity, execution time, available datasets, and evaluation metrics used. We also found that the performance of some proposed solutions on more recent datasets leaves room for improvement. Another key problem that most approaches share is their failure to successfully detect attacks when given a sparse training dataset. Because of the class imbalance issue, the accuracy for these underrepresented attacks should be given additional thought.

Table 5

Table 5. Strengths and weaknesses of the ML and DL-based IDS approaches.

Conversely, we discovered that certain approaches are more complex, demanding longer model training durations. DL techniques involve a tradeoff: a deeper, more detailed structure comes at the price of higher model complexity. The more complex the model, the longer it takes to execute and the more resources it requires. This disadvantage can be mitigated by carefully selecting important features for the model's training.

Furthermore, Figure 6 and Table 3 illustrate how researchers utilize ML- or DL-based algorithms while developing an effective IDS solution. The three most commonly utilized algorithms are SVM, RNN, and CNN. Next, approaches such as DT, K-Means and ANN are included in the list and are mostly employed in hybrid designs to support and enhance DL algorithms.

Figure 6

Figure 6. Total frequency of usage of ML and DL algorithms.

Figure 7 and Table 6 display the performance metrics utilized by the researchers to evaluate their methodologies. The two most commonly utilized performance indicators are accuracy and precision. To demonstrate the efficacy of an IDS developed using ML or DL techniques, it is essential to consider Accuracy, Recall, Precision, and F-measure as the primary performance metrics, among others, to showcase its capability in detecting intrusions. MCC, FNR, TPR, and FAR can also help to evaluate the performance of the AI algorithm in IDS.

Figure 7

Figure 7. Evaluation metrics.

Table 6

Table 6. Datasets and performance evaluation metrics.

7.2 Research challenges for ML and DL-based intrusion detection systems

This subsection highlights the research obstacles in the field of IDS.

1. This research has revealed the lack of a current dataset that accounts for new attacks on modern networks. Most proposed algorithms failed to detect zero-day attacks because these models were not appropriately trained with multiple attack kinds and patterns (Vangipuram et al., 2020). Testing and validating an IDS model using a dataset that includes historical and recent incidents is critical. If the dataset covers as many attack types as possible, the ML/DL model can better understand patterns and ultimately provide security against a wider range of intrusion scenarios. On the other hand, building a dataset requires a large investment of time, energy, and the specialized knowledge of many experts. As a result, one of the challenges of IDS research is systematically developing a contemporary dataset with enough examples of practically all attack types.

2. Unbalanced data reduces the detection rate: In most proposed IDS systems, the detection accuracy for specific attack types is lower than the model's overall accuracy. This is related to the uneven distribution of data. On average, less common attacks are more difficult to detect than more prevalent ones. There are two approaches to this problem. The first is to assemble a broad and accurate dataset. The second is to find effective strategies to increase the number of minority attack instances and thereby build a more representative dataset. Researchers have recently devised strategies such as SMOTE, RandomOverSampler, and the adaptive synthetic sampling technique to lower the dataset imbalance ratio and increase performance (Wu et al., 2020; Ali Hussein Ali, 2024a). However, there is still room for advancement in this area, necessitating further research.

3. Complex models use many resources: Most IDS strategies reported in the literature are based on complex models that require a large amount of time and computer resources. This may cause the CPU to perform unnecessary work, lowering the IDS's efficacy. Using a powerful computer with advanced capabilities may reduce processing time and effort, but at a large cost. As a result, a viable technique for selecting the most significant attributes while minimizing computational and processing overhead is required. Despite academic efforts to examine alternative optimization algorithms for feature selection, there is undeniably a need for improvement. More research will be necessary to develop a suitable feature selection optimization technique.
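Returning to the class imbalance problem raised in challenge 2, the simplest rebalancing strategy is random oversampling, which duplicates minority-class records until every class is as frequent as the majority class. The sketch below is a plain-Python illustration of that idea, not the implementation of any particular library; SMOTE and adaptive synthetic sampling go further by generating new synthetic points instead of duplicating existing ones. The function name, the seed, and the toy data are assumptions.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match the majority count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))  # resample with replacement
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced training set: 8 normal, 3 DoS, 1 U2R record.
labels = ["normal"] * 8 + ["dos"] * 3 + ["u2r"] * 1
samples = [[float(i)] for i in range(len(labels))]
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

Oversampling should be applied to the training split only, so that duplicated records do not leak into the test data and inflate the reported metrics.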

7.3 New trends in ML and DL-based intrusion detection systems

We present novel trends in ML and DL applied to intrusion detection systems, focusing on some key elements to broaden their scope. Hence, we propose future directions to effectively detect intrusions in real-world environments.

1. The IDS is critical to the security of any network. A recent study has shown that current approaches cannot consistently detect zero-day attacks due to a substantial FAR (Wu et al., 2020). A current, thorough, well-balanced dataset can help address this problem. Work in this direction can assist researchers in developing a comprehensive IDS framework capable of protecting networks from a wide range of potential attacks. It would enhance the IDS model's capacity to identify zero-day threats and reduce false positives.

2. The solution to complex models: The success of DL-based IDSs is primarily due to the effectiveness of deep feature learning in identifying malicious intrusions (Vangipuram et al., 2020). Processing power, storage space, and time are all required to execute DL algorithm-based models. Because of the intricacy of these systems, implementing IDS in a dynamic environment is difficult. One option to address these challenges is to use high-performance GPUs to analyze massive datasets quickly and efficiently. Graphics processing units, on the other hand, are relatively expensive, so efficiency and affordability are in tension. One approach for training models at a lower cost is to look into GPU platforms or cloud-based services (Salvakkam et al., 2023). The problem can also be handled by implementing effective and intelligent feature engineering to reduce the complexity of DL algorithms. A smaller number of well-chosen features can often achieve detection accuracy comparable to that obtained with all available features. As a result, less processing power is required in real time, and the model's complexity is reduced.

3. Use of DL algorithms: Recently, it has been suggested that DL-based algorithms be employed in IDS architecture (Yi et al., 2023). The investigation into DL's potential use in IDS is in its early stages. Certain DL algorithms have been researched, with many of them being put to good use in the formulation of acceptable solutions. Some DL approaches, such as Deep Reinforcement Learning (DRL) (Alavizadeh et al., 2022), require more research before being utilized to propose an adequate IDS solution. Recently, Nguyen and Reddi (2023) investigated DRL approaches developed for cybersecurity. They touch on different aspects, including protecting cyber-physical systems with DRL-based security methods, defense strategies against cyberattacks using multiagent DRL-based game theory simulations, and autonomous intrusion detection techniques.

Alternatively, researchers can combine DL feature extraction and ML classification. As a result, the proposed model will become more straightforward.

4. In the context of intrusion detection systems, adversarial attacks may significantly affect the performance of such systems (Martins et al., 2020; Alhajjar et al., 2021; Khalid Albulayhi, 2023).

An adversarial attack is an attack meant to fool the target ML model regardless of the type of ML model being used. In other words, adversarial attacks try to bypass a system so that the affected system behaves in an unwanted manner (Miller et al., 2020). Adversarial attacks and the resulting degradation in the performance of intrusion detection systems could lead to enormous cybersecurity risks. These take the form of incorrect classifications of network traffic, which result in many genuinely malicious activities going undetected. Researchers have shown that most intrusion detection systems are quite susceptible to adversarial attacks (Frank and Nancy, 2019). However, it has been argued that deep learning systems have an inherent resiliency, in that proper tuning may protect against some adversarial attacks that are based on poisoning the training data (Abou Khamis et al., 2020). The most prominent method by which adversaries disrupt intrusion detection systems is adversarial sampling, in which the adversary crafts specific perturbations so that the intrusion detection system fails to detect malicious instances (Ángel Luis Perales Gómez et al., 2021). This method, together with the dynamic environment in which intrusion detection systems operate and the vast number of possible adversarial attacks, means that continual research will be required to ensure the security of these systems. Indeed, generative artificial intelligence models such as ChatGPT can be used to disrupt the functionality of security tools such as IDS via automated hacking and various attack scenarios (Charfeddine et al., 2024). To counter the threat of these adversarial attacks, some defense strategies have been proposed, including adversarial training, preprocessing techniques, the addition of extra networks, and digital watermarking (Szyller et al., 2021; Charfeddine et al., 2022). Adversarial training aims to strengthen ML/DL models by including adversarial samples during the training process. Preprocessing techniques involve carefully planned data transformations that limit the impact of adversarial perturbations. Adding extra networks uses external models to identify samples that have never been seen before, improving the system's ability to detect adversarial attacks. Embedding digital watermarks during training allows model owners to identify their models in the event of an adversarial attack. Furthermore, additional methods for detecting adversarial samples have been proposed, such as using subnetworks as detectors or using confidence scores to identify out-of-class data. Moreover, defensive techniques based on generative adversarial networks (GANs) have been developed to enhance the robustness of IDSs against certain types of attacks (Alotaibi and Rassam, 2023).

5. IDSs face certain additional challenges, particularly in terms of system reliability. Cybersecurity specialists usually act on IDS recommendations, so the system's predictions should be comprehensible. Despite the high accuracy levels such systems achieve, their increasing sophistication is a significant disadvantage: they provide no information about why they make their decisions. Consequently, some details about the causes that underpin IDS predictions must be provided to cybersecurity professionals, along with some clarification of the intrusions discovered. Few studies have described these new trends and developments in IDSs (Younisse et al., 2022; Pande and Khamparia, 2023).

These research works propose systems based on Shapley additive explanations (SHAP) to overcome these drawbacks and provide a clearer interpretation of IDS decisions. SHAP offers a solid theoretical foundation that applies to both shallow and deep models. Pande and Khamparia (2023) define a system that delivers both local and global interpretations to improve the interpretability of any IDS. Local explanations show how each feature value increases or decreases the predicted probability. Global interpretations examine the relationships between feature importance and specific types of threats by extracting key attributes from each IDS. These systems lead to a better understanding of IDS predictions and ultimately aim to instill confidence in IDSs among their users. They enable cybersecurity professionals to better detect cyberattacks.

6. DL models do not perform well when trained on small datasets, or when the data distribution differs between training and test data. The quantity and quality of features are also important for classification because they help the DL model capture their significance and correlation. If too few features are used, the model cannot capture the underlying patterns and underfits; if too many are used, the model tends to fit noise and generalization suffers, resulting in overfitting. Deep Transfer Learning (DTL) was introduced to address these issues, which are primarily caused by data scarcity and inconsistency (Kheddar et al., 2023; Latif et al., 2024). It is based on the principle of transferring knowledge from a pre-trained source model to the target model, so that the target model begins with patterns learned on a related source task rather than starting from scratch. DTL techniques include multi-task learning, domain adaptation, multiple and/or cross-modalities, and the use of multiple datasets. They can be viewed as a method of fusing information from multiple sources to improve the overall performance of the model. DTL enables more effective information fusion and can produce better results than training models from scratch. These advantages motivate the development of DTL-based IDS models to solve many problems in a wide range of applications and to detect attacks and intrusions that traditional methods may miss.
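The warm-start principle behind transfer learning can be sketched as follows. The example deliberately uses logistic regression in place of a deep network, and the two toy datasets, learning rates, and step counts are assumptions chosen for illustration: a model is first trained on a data-rich source task, and its weights then initialise the model for a related target task that has only two labelled samples.

```python
import math


def predict(w, x):
    """Logistic model; w[0] is the bias, w[1:] are the feature weights."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, data):
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, x), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(w_init, data, lr, steps):
    """Full-batch gradient descent starting from w_init."""
    w = list(w_init)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


# Source task: the data-rich side (toy values, label 1 = attack).
source = [([-2.0, -1.0], 0), ([-1.0, 1.0], 0), ([1.0, -1.0], 1), ([2.0, 1.0], 1)]
# Target task: related to the source, but with only two labelled samples.
target = [([-1.5, 0.5], 0), ([1.5, -0.5], 1)]

zeros = [0.0, 0.0, 0.0]
source_w = train(zeros, source, lr=0.5, steps=200)

# Same small fine-tuning budget, different starting points.
cold = train(zeros, target, lr=0.1, steps=3)
warm = train(source_w, target, lr=0.1, steps=3)
print("cold-start loss:", round(log_loss(cold, target), 3))
print("warm-start loss:", round(log_loss(warm, target), 3))
```

With the same three fine-tuning steps, the warm-started model reaches a lower loss on the target data than the model trained from zero weights, because the source and target tasks share the same decision direction. When the two tasks are unrelated, this advantage is not guaranteed.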

7. Cybersecurity research faces a multitude of infrastructure problems that make it difficult to operate IDSs in a real-time environment, particularly processing overhead. Furthermore, as technology advances, there is a possibility of attacks on polymorphic systems, with new attacks emerging each time. Traditional IDS databases do not include these new attacks. Thus, real-time intrusion detection systems are required to detect and prevent attacks as soon as they occur. This can be accomplished by continuously monitoring system activities and detecting intrusions in real-time.

It is undeniable that Machine Learning has gained popularity in IDS due to its ability to detect unknown threats. However, classical machine learning-based algorithms are too slow to handle many Gbps of traffic and thus cannot be used in high-throughput networks. A possible solution to this problem is to use two levels of classifiers, one for per-packet detection and another for per-flow detection, to balance speed and accuracy. The level 1 classifier first extracts a few selected features from each packet, allowing for faster classification and real-time attack detection. The level 2 classifier only works with flows not classified by the level 1 classifier (Seo and Pak, 2021).
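The control flow of such a two-level design can be sketched as below. This is not the implementation of Seo and Pak (2021): both classifiers are replaced by hand-written rule stubs, and the `Packet` fields, port sets, and thresholds are hypothetical placeholders. The sketch only shows how a fast per-packet stage decides the clear cases and hands undecided flows to a slower per-flow stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    flow_id: str
    dst_port: int
    syn: bool  # TCP SYN flag set


# All ports and thresholds below are hypothetical placeholders.
BLOCKED_PORTS = {23}
TRUSTED_PORTS = {443}


def level1(pkt: Packet) -> Optional[str]:
    """Cheap per-packet check; returns None when it cannot decide."""
    if pkt.dst_port in BLOCKED_PORTS:
        return "attack"
    if pkt.dst_port in TRUSTED_PORTS and not pkt.syn:
        return "benign"
    return None


def level2(flow: List[Packet]) -> str:
    """Slower per-flow check using statistics over all packets of the flow."""
    syn_ratio = sum(p.syn for p in flow) / len(flow)
    return "attack" if len(flow) >= 3 and syn_ratio > 0.8 else "benign"


def classify(packets: List[Packet]) -> Dict[str, str]:
    verdicts: Dict[str, str] = {}
    pending: Dict[str, List[Packet]] = {}
    for pkt in packets:
        if pkt.flow_id in verdicts:
            continue  # flow already decided at level 1
        label = level1(pkt)
        if label is None:
            pending.setdefault(pkt.flow_id, []).append(pkt)
        else:
            verdicts[pkt.flow_id] = label
            pending.pop(pkt.flow_id, None)
    # Only flows that level 1 left undecided reach level 2.
    for flow_id, flow in pending.items():
        verdicts[flow_id] = level2(flow)
    return verdicts


traffic = [
    Packet("a", 23, True),
    Packet("b", 443, False),
    Packet("c", 8080, True), Packet("c", 8080, True), Packet("c", 8080, True),
    Packet("d", 8080, True), Packet("d", 8080, False), Packet("d", 8080, False),
]
print(classify(traffic))
```

Here flows `a` and `b` are settled from a single packet, while `c` and `d` are decided only after their packets have been aggregated. In a real deployment each stub would be a trained model, and level 2 would run when a flow ends or times out.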

7.4 Practical and managerial implications of ML and DL-based intrusion detection systems

Practically, ML and DL-based IDS may help cybersecurity teams focus their attention on genuine threats, reducing the risk of alert fatigue and allowing for more effective incident response strategies. Organizations may mitigate the impact of breaches and potential damages by identifying and containing security incidents as soon as they occur. The scalability of these systems is critical for organizations operating in dynamic and changing cyber threat landscapes. While ML and DL-based IDS may offer advanced capabilities for detecting and mitigating cyber threats, effective implementation necessitates careful consideration of managerial implications.

In reality, integrating ML and DL-based intrusion detection systems necessitates a significant investment of resources, including machine learning experts, computational infrastructure, and ongoing maintenance. Managers must allocate adequate resources to ensure the effectiveness and efficiency of these systems. Furthermore, managers must invest in training programs to improve the skills of cybersecurity personnel in ML and DL techniques. This includes understanding how to effectively interpret and act on the systems' outputs. In addition, executives must carefully evaluate vendors that offer ML and DL-based IDS solutions. The robustness of the algorithms, scalability, interoperability with existing systems, and experience dealing with emerging threats are all important considerations.

In addition, compliance requirements such as GDPR (Mohammad Amini et al., 2023), HIPAA (Humphrey, 2021), or industry-specific regulations may require particular data handling and privacy considerations when implementing ML and DL-based IDS. Managers must ensure that these systems follow applicable regulations and standards. While ML and DL-based IDS provide advanced threat detection capabilities, they may introduce new risks such as model bias, adversarial attacks, and interpretability challenges. Managers must effectively assess and mitigate these risks to ensure that the IDS is reliable and trustworthy. They should also evaluate the performance of various algorithms and feature sets to improve detection capabilities. Moreover, they should ensure that these systems are seamlessly integrated into the network infrastructure to reduce latency and enhance responsiveness. By effectively addressing these factors, organizations may leverage these advancements to improve their cybersecurity posture and mitigate risks.

8 Comparison with related studies

Several scientific studies on IDS have been published in recent years. To position our research, Table 7 compares our survey with other studies. This table delineates the distinctions between our survey and existing ones.

Table 7. Comparison with other similar review articles.

Table 7 compares our study to other surveys on ML and DL-based IDS. All of the surveys investigated IDS classification and intelligent techniques in IDS, although the specific techniques covered vary between surveys. Our study, along with those conducted by Saranya et al. (2020) and Si-Ahmed et al. (2023), discussed detection methods in IDS and reviewed IDS-related datasets. Only Saranya et al. (2020), Maseno et al. (2022), and Yi et al. (2023) conducted surveys on specific IDS applications such as IoT, Smart City, fog, and Big Data. Our study, along with those of Maseno et al. (2022) and Yi et al. (2023), examined the metrics and indicators used in IDS. However, only our paper and that of Haji and Ameen (2021) focused on the various types of attacks in IDS. Nonetheless, we have not addressed the challenges associated with unbalanced data categories or the processing of high-dimensional mass data in IDS, as Si-Ahmed et al. (2023) and Yi et al. (2023) did in their surveys.

Except for the study by Yi et al. (2023), all of the surveys tackled research challenges related to Machine Learning (ML) and Deep Learning (DL)-based IDS. Our research, as well as that of Haji and Ameen (2021) and Maseno et al. (2022), identified and discussed new trends in ML and DL-based IDS. The practical and managerial implications of ML and DL-based IDS are discussed in our work, as well as in those by Saranya et al. (2020), Maseno et al. (2022), and Yi et al. (2023). Only our survey, as well as those of Haji and Ameen (2021) and Si-Ahmed et al. (2023), compared their findings to other related studies or approaches. Overall, the comparison table addresses various aspects of IDS. The specific focus and depth of coverage differ between surveys, highlighting various perspectives and areas of emphasis in the field of IDS research. According to this comparison, while all surveys provide valuable insights, our survey stands out because it covers a wide range of concerns. Furthermore, the surveys by Saranya et al. (2020), Maseno et al. (2022), and Si-Ahmed et al. (2023) are noteworthy studies.

9 Conclusions

This paper presents a comprehensive survey of modern ML and DL-based intrusion detection algorithms, including recent solutions, datasets, metrics, indicators, and detected attacks, to provide valuable insights to researchers in this field. A systematic approach was used to select relevant and recent articles about AI-based IDS. The concept of IDS was extensively discussed, along with its various classification schemes based on the reviewed literature. Furthermore, each article's methodology was examined, and its strengths and weaknesses were highlighted regarding intrusion detection capabilities and model complexity. This analysis revealed that recent developments favor DL-based approaches for improving the performance and effectiveness of IDS by increasing accuracy rates and decreasing false alarm rates. DL schemes have outperformed ML-based methods thanks to their ability to independently learn features and fit complex models. However, the complexity of DL algorithms requires significant computing resources in terms of processing power and storage, posing challenges for real-time implementation of intrusion detection systems. Furthermore, the study found that 41% of the proposed methodologies were tested on outdated datasets such as KDD Cup'99 and NSL-KDD, limiting their effectiveness in detecting modern network attacks in real-time environments. Addressing these challenges is important for meeting real-time requirements and enhancing IDS performance. It is crucial for AI-based IDS methods to be regularly tested with updated datasets to achieve accurate intrusion detection. The paper discussed these challenges and projected future developments in ML and DL-based IDS. In addition, the survey was evaluated by comparing it to other studies, identifying differences and similarities between our survey and existing ones. According to this comparison, while all surveys provided useful information, ours stood out for addressing an extensive variety of concerns.

Through this study, we noticed that research gaps remain, such as improving model performance for specific attacks in real-world environments and finding efficient solutions to reduce complexity. Therefore, future research could focus on developing a lightweight and effective IDS framework that relies on less complex DL algorithms and efficient detection mechanisms.

Author contributions

AA: Formal analysis, Writing – original draft, Writing – review & editing. BA: Formal analysis, Resources, Writing – review & editing. MC: Conceptualization, Data curation, Writing – review & editing. BH: Methodology, Project administration, Writing – review & editing. FA: Methodology, Validation, Writing – review & editing. AA: Investigation, Methodology, Writing – review & editing. AH: Funding acquisition, Investigation, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research has been supported by the Ministry of Higher Education and Scientific Research of Tunisia under grant agreement number LR11ES48 and the UK Engineering and Physical Sciences Research Council (EPSRC) Grants Ref. EP/T021063/1, EP/T024917/1.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abou Khamis, R., Shafiq, M. O., and Matrawy, A. (2020). “Investigating resistance of deep learning-based ids against adversaries using min-max optimization,” in ICC 2020–2020 IEEE International Conference On Communications (ICC) (IEEE), 1–7. doi: 10.1109/ICC40277.2020.9149117