Deep learning in distributed denial-of-service attacks detection method for Internet of Things networks

Abstract: With the rapid growth of informatics systems' technology in this modern age, the Internet of Things (IoT) has become more valuable and vital to everyday life in many ways. IoT applications are now more popular than they used to be due to the availability of many gadgets that work as IoT enablers, including smartwatches, smartphones, security cameras, and smart sensors. However, the insecure nature of IoT devices has led to several difficulties, one of which is distributed denial-of-service (DDoS) attacks. IoT systems have several security limitations due to their disreputability characteristics, like dynamic communication between IoT devices. These dynamic communications result from the limited resources of these devices, such as their data storage and processing units. Recently, many attempts have been made to develop intelligent models to protect IoT networks against DDoS attacks. The main ongoing research issue is developing a model that protects the network from DDoS attacks, is sensitive to various classes of DDoS, and recognizes legitimate traffic to avoid false alarms. Accordingly, this study proposes combining three deep learning algorithms, namely recurrent neural network (RNN), long short-term memory (LSTM)-RNN, and convolutional neural network (CNN), to build a bidirectional CNN-BiLSTM DDoS detection model. The RNN, CNN, LSTM, and CNN-BiLSTM are implemented and tested to determine the most effective model against DDoS attacks that can accurately detect and distinguish DDoS from legitimate traffic. The intrusion detection evaluation dataset (CICIDS2017) is used to provide more realistic detection. The CICIDS2017 dataset includes benign and up-to-date examples of typical attacks, closely matching real-world Packet Capture data. The four models are tested and assessed using a confusion matrix and four commonly used criteria: accuracy, precision, recall, and F-measure. The performance of the models is quite effective as they obtain an accuracy rate of around 99.00%, except for the CNN model, which achieves an accuracy of 98.82%. The CNN-BiLSTM achieves the best accuracy of 99.76% and precision of 98.90%.


Introduction
Distributed denial-of-service (DDoS) attacks have been around for a while and have affected many systems. They began as denial-of-service (DoS) assaults, in which the hacker used a single device to launch the attack. In a DDoS attack, by contrast, the attacker uses multiple devices, which makes the attack more severe and much more complicated to detect. The first major DDoS attack was reported in 2000, when a high school student named Michael Calce took down major websites such as Amazon.com, Yahoo, FIFA.com, eBay.com, CNN, and Dell; Yahoo was the most widely used search engine at the time [1].
A more recent incident struck the company Dyn. Early research indicates that the attack may have reached 1.2 Tbps, almost twice the traffic volume of the Krebs attack, and it appears to have substantially degraded Dyn's domain name system (DNS) service. Major names including Twitter, Amazon, Airbnb, Spotify, GitHub, and others were essentially knocked offline when the DDoS assault disabled Dyn's DNS anycast servers. The attack began on 21 October 2016 and took effect immediately. Dyn regained control for a short period, but a second attack followed within hours. A few days later, in a statement, Dyn verified what had previously been widely rumored: "We can confirm that a substantial number of attacks have originated in Mirai-based botnets" [2]. More such attacks are likely in the future since Mirai's source code has been published. Someone named "Anna-senpai" released the code on the Hack forums website, claiming to have previously been capable of creating botnets as large as 380,000 devices solely through the use of telnet connections, but that this was becoming more difficult due to increased scrutiny by the security community: "ISPs have progressively shut down and cleaned up their actions after Kreb [sic] DDoS. Today, the maximum draw is about 300,000 bots and is decreasing" [3].
The Internet of Things (IoT) refers to a growing network of physical things that are linked to the Internet. It enables the transition of Internet-connected devices into a connected ecosystem with digital data available from any location and at any time. IoT devices are physical items, ranging from small gadgets to colossal equipment, that interact with one another through the Internet without human involvement [4]. Cisco estimates that 50 billion devices are now linked to the Internet [5]. Figure 1 illustrates many instances of IoT devices. These devices are fundamentally resource-restricted: they have a finite amount of memory and limited processing capacity. Numerous supporting technologies, including wireless sensor networks, radio frequency identification, and cloud computing, have evolved as critical components of the IoT paradigm's development.
DDoS attacks come in a variety of shapes and sizes. These attacks are convenient to launch and use the same methods as DoS attacks, but on a much bigger scale and with greater sophistication, making protection extremely difficult and leaving severe consequences for informatics systems and networks [6,7]. Thousands of hacked devices are remotely controlled and used to assault and immobilize the victim. Figure 2 shows the structure of a DDoS attack. The exploited IoT devices are turned into slaves that assist the hacker by forming an army that sends false requests to a targeted victim, resulting in a complete shutdown. As illustrated in the figure, DDoS can be used in various ways. Any computer connected to the Internet presents an attractive opportunity for compromise and can be turned into a zombie without the owner's knowledge. Zombies are recruited into DDoS attacks via backdoors, worms, and Trojan horses. Attackers distribute these by email, host them on vulnerable websites, or deliver them as advertisements; they can even hide a Trojan in the pixels of an image so that it appears to be a typical picture when opened [8]. Once opened, the software, which is primarily written in JavaScript, downloads payloads onto the user's device undetected.
The term "deep learning classification" refers to the predictive modeling problem in which a class label is predicted for a given input data sample. Classification categorizes observed data into a predetermined number of groups [9]. The primary objective of a classification task is to determine which class new data will belong to. Classification algorithms try to determine the mapping function (f) from input variables (x) to discrete or categorical output variables (y) [10]. In this case, (y) denotes the category predicted by the mapping function. For example, when selling a vehicle, a classification algorithm can forecast whether the price will be higher or lower than the item price. Classification models often output a continuous value for each class, which is interpreted as the likelihood of that class. These probabilities may be thought of as the confidence that a particular instance belongs to each class. A predicted probability is converted to a class value by selecting the class label with the highest probability [11].
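The last step above, turning predicted probabilities into a class value, can be illustrated with a minimal Python sketch. The class names are borrowed from the six labels listed in the Dataset section, while the probability values and the helper name `most_likely_class` are made up for illustration and are not taken from the study.

```python
# Class names follow the six labels described in the Dataset section.
CLASS_NAMES = ["Benign", "DDoS", "DoS GoldenEye", "DoS Hulk",
               "DoS slowhttptest", "DoS slowloris"]

def most_likely_class(probabilities):
    """Return the index of the class with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])

# Illustrative (made-up) probability vectors, one per traffic flow.
example_probabilities = [
    [0.90, 0.05, 0.01, 0.02, 0.01, 0.01],
    [0.10, 0.70, 0.05, 0.05, 0.05, 0.05],
]

predicted_indices = [most_likely_class(p) for p in example_probabilities]
predicted_names = [CLASS_NAMES[k] for k in predicted_indices]
```

Here the first flow would be assigned to the benign class and the second to the DDoS class, because those entries carry the largest probability in each vector.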
The implications of DDoS assaults are undoubtedly severe and will only continue to worsen as the number of IoT devices and their software systems grows. Even though many solutions have been developed to identify and prevent this assault, which is mainly targeted at IoT devices, the danger continues to exist and is now more significant than ever [8,9]. An example of such a solution is a machine learning pipeline that collects traffic and determines whether it is benign or malicious [12,13].
The purpose of this study is to use deep learning to manage DDoS assaults with four algorithms: the recurrent neural network (RNN), the long short-term memory-RNN (LSTM-RNN), the convolutional neural network (CNN), and the CNN-bidirectional LSTM (CNN-BiLSTM). Based on their performance, the four algorithms are compared using a confusion matrix.

Related work
Many research efforts have been made in the past to mitigate the threat of DDoS assaults. Some of these studies and attempts remain valid, while others are outdated. This is consistent with the nature of today's DDoS botnets. There are many different types of DDoS malware, some of which evolve regularly, making them almost impossible to detect. Some DDoS malware may hibernate on compromised devices without creating issues, giving the impression that nothing unusual is happening until the hacker launches the attack. Deep learning has produced new methods and strategies for dealing with DDoS and other problems [4,10]. This section presents some of the most recent and effective work on DDoS detection in the IoT environment.
The research of ref. [14] provided a software-defined networking (SDN)-based classification approach that uses a gated RNN as its primary model. On the NSL-KDD dataset, the model is compared with a vanilla RNN, a support vector machine (SVM), and a deep neural network (DNN); on the intrusion detection evaluation dataset (CICIDS2017), it is compared with DNN, SVM, and naïve Bayes (NB) trees. The research introduces a mitigator, or filter, that receives the traffic before it reaches the SDN switch, acting as an SDN controller, and consists of three main parts: a flow collector, an anomaly detector, and an anomaly mitigator. The gated recurrent unit RNN (GRU-RNN) achieved an accuracy of 0.89 on the NSL-KDD dataset and 0.99 on CICIDS2017.
The research of ref. [15] implemented two deep learning models, DNN and LSTM, on the CICDDoS2019 dataset. The data were refined using feature selection, leaving three attack labels: UDP, SYN, and UDP-lag. Each attack type has its own dataset with two labels: standard traffic and the type of the attack. The average accuracy obtained by both models was comparable, amounting to 0.999 with a batch size of 32 and 10 epochs for all experiments.
The research of ref. [16] experimented with LSTM alone on the CICIDS2017 dataset, with the main goal of determining which tuning yields the highest accuracy. They tried different hyperparameters, such as the activation function, loss function, and optimizer, and different numbers of LSTM layers, from one to six, abbreviated as L1-L6. The hyperparameter tuning that achieved the highest accuracy in that study is given in Table 1.
The research of ref. [17] presents a deep learning algorithm that merges RNN with CNN into a hybrid detection system. The study used the realistic cyber defense dataset (CSE-CIC-IDS2018) and compared the newly introduced hybrid convolutional recurrent neural network-based network intrusion detection system with three other algorithms, namely, Logistic Regression, Extreme Gradient Boosting, and Decision Tree (DT), based on their performance. All models were built with the same structure and evaluated with tenfold cross-validation. The hybrid model outperformed the others with an accuracy of 0.97; the best of the other models was DT with an accuracy of 0.88.
In the research of ref. [18], the author used a set of deep and machine learning algorithms to detect anomalies in IoT devices using the CICIDS2017 dataset. The models used are deep convolutional neural networks (DCNN), multi-layer perceptron (MLP), LSTM, CNN + LSTM, SVM, NB, and random forest (RF). Table 2 summarizes the deep learning models and their corresponding results in the related work.
Recent attempts to develop intelligent models to protect IoT networks against DDoS attacks have achieved noticeable success. However, developing a model capable of protecting the network from various types of DDoS attacks without affecting legitimate traffic remains the main research gap. In addition, testing these models in real-world environments is challenging. As presented in Table 2, different datasets such as NSL-KDD, CICIDS2017, CSE-CIC-IDS2018, and CICDDoS2019 are used for testing the DDoS defense models. The proposed defense models use machine learning and deep learning algorithms. The results show that deep learning-based methods mainly achieve better performance than machine learning-based methods, especially for well-known attacks. For example, LSTM achieves an accuracy of 99.96% on the CICDDoS2019 dataset, and CNN + LSTM achieves an accuracy of 97.16% on the CICIDS2017 dataset. Nevertheless, new types of attacks represent a critical challenge for both approaches.

Methodology
We used three deep learning algorithms, namely CNN, RNN, and LSTM, to develop a CNN-BiLSTM model. The description of the RNN, LSTM, and CNN is provided in the following subsections. The flow of the experiment is shown in Figure 3. In the first stage, we prepared the data by applying various data checking and filtering processes. We found that the dataset has many NaN and infinite values, so we replaced them with the mean. Next, the labels are encoded using Label_Encoder. Then, the data is split into training and test sets of 67 and 33%, respectively, with a random state of 44. In the following stage, before the data is fed to the models, it is reshaped to match the input expected by each model. Finally, the model performance is evaluated based on the confusion matrix, accuracy, precision, recall, and F-measure.
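The preparation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used in the study: it assumes that "the mean" is the per-column mean of the finite values, it encodes labels as sorted integer codes, and it uses a seeded permutation as a stand-in for whichever train-test splitting routine was actually used, so the selected indices will generally differ from the original experiment. All function names are hypothetical.

```python
import numpy as np

def replace_non_finite_with_mean(features):
    """Replace NaN and +/-inf entries with the mean of the finite values in the same column.

    Assumption: a column-wise mean; a column with no finite values stays NaN.
    """
    cleaned = np.array(features, dtype=float)      # copy, caller's data is untouched
    bad = ~np.isfinite(cleaned)
    cleaned[bad] = np.nan                          # treat inf like NaN
    column_means = np.nanmean(cleaned, axis=0)     # mean over finite entries only
    rows, cols = np.where(bad)
    cleaned[rows, cols] = column_means[cols]
    return cleaned

def encode_labels(labels):
    """Map string labels to integer codes (sorted order), returning codes and class names."""
    classes, codes = np.unique(np.asarray(labels), return_inverse=True)
    return codes, classes

def split_indices(n_samples, test_fraction=0.33, seed=44):
    """Shuffle sample indices with a fixed seed and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(np.ceil(n_samples * test_fraction))
    return order[n_test:], order[:n_test]

# Tiny made-up example with one NaN and one infinite value.
raw_features = [[1.0, np.nan], [3.0, 4.0], [np.inf, 8.0]]
raw_labels = ["DDoS", "Benign", "DDoS"]

X = replace_non_finite_with_mean(raw_features)
y, class_names = encode_labels(raw_labels)
train_idx, test_idx = split_indices(len(X))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```

For a recurrent or convolutional model, the resulting feature matrix would then be reshaped, for example to (samples, features, 1); the exact target shape per model is not stated in the text and is left open here.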

Dataset
The CICIDS2017 dataset includes benign traffic and up-to-date examples of typical attacks, closely matching real-world data (Packet Capture, PCAPs). In addition, it contains the results of a network traffic analysis performed with CICFlowMeter, featuring flows labeled according to the time stamp, source and destination IP addresses, source and destination ports, protocols, and attack [13]. The dataset includes several types of DDoS attacks. Different attack datasets were generated on Friday, 7 July 2017: Botnet ARES is included in the Morning dataset, and the Afternoon dataset is made up of independent port scans and DDoS assaults.
The dataset used in this research combines the "Friday-Working Hours-Afternoon" and "Wednesday-Working Hours" datasets into one dataset with a shape of (478406, 79), that is, 78 features plus a class label. There are six class labels: DDoS, Benign, DoS GoldenEye, DoS Hulk, DoS slowhttptest, and DoS slowloris. Because it provides a more realistic basis for detecting DDoS attacks, the CICIDS2017 dataset is extensively utilized to detect network attacks, including DDoS attacks [19].

Deep learning algorithms
The structure of each model is largely the same, with minor changes where necessary. Table 3 shows the shared parameters.
RNN: An RNN is a neural network in which the output of previous steps is used as an input to the current step. In a typical neural network, all inputs and outputs are independent. However, tasks such as predicting the next word of a phrase require the preceding words, so earlier inputs must be remembered. RNNs solve this problem through a hidden layer [20]. The basic structure of the RNN used in this research is shown in Figure 4. The hidden state, which retains specific sequence data, is the RNN's primary and essential component. In this research, we used a simple RNN. Even this very basic design is powerful from a representational standpoint. However, representational capacity is not the only issue in practice. It is critical to emphasize that these impressive findings on representation in no way imply that such representations can be learned from data in an acceptable period [21]. The results obtained with the model support that claim. An RNN utilizes hidden layers and is represented by the following equation:

$$h_t = \sigma\left(W_x x_t + W_h h_{t-1} + b\right), \tag{1}$$

where $h_t$ is the hidden state, $t$ is the time step, $\sigma$ is the activation function of the hidden layer, $x$ is the input, $W_x$ and $W_h$ are the weights, and $b$ is the bias.
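To make the role of the hidden state concrete, the following NumPy sketch runs a simple recurrence over a short sequence. It assumes the commonly used form h_t = σ(W_x x_t + W_h h_{t-1} + b) with tanh as the activation; the weight names, the layer sizes, and the random inputs are illustrative assumptions rather than the configuration used in this research.

```python
import numpy as np

def simple_rnn_forward(x_seq, W_x, W_h, b, activation=np.tanh):
    """Apply h_t = activation(W_x x_t + W_h h_{t-1} + b) over a sequence.

    x_seq has shape (time_steps, input_size); the result has shape
    (time_steps, hidden_size) and holds the hidden state after every step.
    """
    h = np.zeros(W_h.shape[0])          # initial hidden state h_0 = 0
    states = []
    for x_t in x_seq:
        h = activation(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Illustrative sizes: 5 time steps, 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
x_seq = rng.normal(size=(5, 3))
W_x = rng.normal(scale=0.1, size=(4, 3))
W_h = rng.normal(scale=0.1, size=(4, 4))
b = np.zeros(4)

hidden_states = simple_rnn_forward(x_seq, W_x, W_h, b)
```

Each row of `hidden_states` depends on all earlier inputs through the previous hidden state, which is how the network carries sequence information forward.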
CNN: CNNs are built on the foundation of neural networks. A CNN comprises three types of layers: convolutional, pooling, and fully connected. The fully connected layer is an ordinary neural network as previously covered. The CNN also includes two important components: the activation function and the dropout layer. The CNN algorithm performs two main processes in the convolutional and max-pooling layers: convolution and sampling [22,23]. Figure 5 shows the basic CNN architecture.
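Each neuron receives information from an $n \times n$ kernel region of the previous layer, known as the local receptive field [22]:

$$x_{i,j} = \sigma\left(\sum_{m=1}^{n}\sum_{k=1}^{n} w_{m,k}\, c_{i+m-1,\, j+k-1} + b\right), \tag{2}$$

where $x_{i,j}$ is the current output, $\sigma$ is the activation function of the hidden layer, $c$ is the input, $w$ is the weight, and $b$ is the bias.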
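The two CNN processes mentioned above, convolution over a local receptive field followed by sampling, can be sketched in NumPy as follows. This is an illustrative example that assumes a single n × n kernel, no padding, a stride of one, ReLU as the activation, and non-overlapping max pooling; none of these choices are confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(c, w, b=0.0):
    """Slide an n x n kernel w over input c (no padding, stride 1) and apply ReLU."""
    n = w.shape[0]
    out_h = c.shape[0] - n + 1
    out_w = c.shape[1] - n + 1
    x = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over the local receptive field plus bias.
            x[i, j] = np.sum(w * c[i:i + n, j:j + n]) + b
    return np.maximum(x, 0.0)           # ReLU activation

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Small made-up input: a 4 x 4 grid holding the values 0..15 and a 3 x 3 kernel of ones.
c = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))

feature_map = conv2d_valid(c, w)        # shape (2, 2)
pooled = max_pool2d(feature_map)        # shape (1, 1)
```

With this input, each output of the convolution is simply the sum of the nine values under the kernel, and pooling keeps the largest of the four sums.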
LSTM: Hochreiter and Schmidhuber were the first to propose LSTM as a special type of RNN [23]. The standard LSTM architecture has an input layer, a recurrent LSTM layer, and an output layer. The input layer is linked to the LSTM layer. Formula (3) represents the decision-making function $f_t$ of the first LSTM-RNN layer. The decision concerns whether to use or discard the provided data, $h_{t-1}$, from the previous cell's processed output:

$$f_t = \sigma\left(w_f \cdot [h_{t-1}, x_t] + b_f\right), \tag{3}$$

where $h_{t-1}$ is the data provided by the previous process, $t$ is the time step, $\sigma$ is the sigmoid function of the hidden layer, $x$ is the input, $w$ is the weight, and $b$ is the bias.
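As a small illustration of this decision function, the sketch below computes a forget gate in NumPy, assuming the widely used form f_t = sigmoid(w_f · [h_{t-1}, x_t] + b_f), in which the previous output and the current input are concatenated. The sizes and values are made up, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, w_f, b_f):
    """Return f_t: values near 0 discard, values near 1 keep the previous cell content."""
    concatenated = np.concatenate([h_prev, x_t])
    return sigmoid(w_f @ concatenated + b_f)

# Made-up example: 2 hidden units, 3 input features.
h_prev = np.array([0.5, -0.5])
x_t = np.array([1.0, 2.0, 3.0])
w_f = np.zeros((2, 5))                  # zero weights isolate the effect of the bias
b_f = np.array([0.0, 10.0])

f_t = forget_gate(h_prev, x_t, w_f, b_f)
```

With zero weights, the first unit is undecided (a gate value of one half), while the large bias of the second unit pushes its gate close to one, meaning the previous information is kept.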
Recurrent LSTM connections run directly from the cell output units to the cell input units, input gates, output gates, and forget gates. The cell output units are also linked to the network output layer [24]. The total number of parameters $N$ in a typical LSTM network with one cell in each memory block (without taking the biases into account) can be calculated as follows [25]:

$$N = n_c \times n_c \times 4 + n_i \times n_c \times 4 + n_c \times n_o + n_c \times 3, \tag{4}$$

where $n_c$ is the number of memory cells, $n_i$ is the number of input units, and $n_o$ is the number of output units. Figure 6 shows the executed layers of the LSTM model.
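The parameter count can be checked with a few lines of Python. The sketch assumes the commonly cited bias-free count for a standard LSTM with one cell per memory block, N = 4·n_c² + 4·n_i·n_c + n_c·n_o + 3·n_c, where n_c, n_i, and n_o are the numbers of memory cells, input units, and output units; the function name and the example sizes are illustrative and are not taken from the models in this study.

```python
def lstm_parameter_count(n_c, n_i, n_o):
    """Bias-free parameter count of a standard LSTM with one cell per memory block."""
    recurrent = n_c * n_c * 4      # cell outputs to cell inputs and the three gates
    inputs = n_i * n_c * 4         # input units to cell inputs and the three gates
    outputs = n_c * n_o            # cell outputs to the network output layer
    peepholes = n_c * 3            # one peephole connection per gate and cell
    return recurrent + inputs + outputs + peepholes

# Hypothetical sizes: 78 input features, 64 memory cells, 6 output classes.
example_count = lstm_parameter_count(n_c=64, n_i=78, n_o=6)
```

The recurrent and input terms dominate, so the count grows roughly quadratically with the number of memory cells.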

CNN-BiLSTM:
In this model, we merged the two powerful RNN and CNN algorithms into CNN-BiLSTM. The bidirectional structure was initially suggested to remove the restrictions of the standard RNN. The idea is to split the state neurons of a regular RNN into two parts: one responsible for the positive time direction (forward states) and one for the negative time direction (backward states) [23]. Forward state outputs are not linked to backward state inputs, and vice versa. This results in the general structure of the BiLSTM shown in Figure 7, which has three stages, as suggested by ref. [26].
The introduced CNN-BiLSTM model is presented in Figure 8. The convolutional layer creates features from the input data. The dropout layer randomly drops out nodes during training, which helps to regularize the training process, improve generalization, and reduce overfitting. The CNN then applies the max-pooling layers to reduce the spatial resolution of the features [22]. The CNN is thus added to extract critical patterns and produce high-quality features. The BiLSTM then processes the sequence in both the forward and backward directions and can therefore detect backward patterns [23].
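The bidirectional idea, two independent sets of states that read the sequence in opposite directions, can be sketched in NumPy. To keep the example short, a plain tanh recurrence is swapped in for the LSTM cells, so this illustrates only the forward/backward wiring and not the gating; the sizes, weights, and function names are assumptions made for illustration.

```python
import numpy as np

def recurrent_pass(x_seq, W_x, W_h, b):
    """Simple tanh recurrence (a stand-in for an LSTM layer) over a sequence."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

def bidirectional_pass(x_seq, forward_params, backward_params):
    """Run separate forward and backward passes and concatenate them per time step."""
    forward = recurrent_pass(x_seq, *forward_params)
    # Process the reversed sequence, then flip back so rows align with time steps.
    backward = recurrent_pass(x_seq[::-1], *backward_params)[::-1]
    return np.concatenate([forward, backward], axis=1)

# Illustrative sizes: 6 time steps, 3 features (e.g. pooled CNN outputs), 4 hidden units.
rng = np.random.default_rng(1)
x_seq = rng.normal(size=(6, 3))

def make_params():
    return (rng.normal(scale=0.1, size=(4, 3)),
            rng.normal(scale=0.1, size=(4, 4)),
            np.zeros(4))

forward_params = make_params()
backward_params = make_params()

bi_states = bidirectional_pass(x_seq, forward_params, backward_params)
```

The forward and backward parameters are never mixed, which mirrors the statement that forward state outputs are not connected to backward state inputs; the two halves meet only in the concatenated output.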

Evaluation
The performance of the classifiers was evaluated using a confusion matrix and the four most commonly used evaluation metrics, namely, accuracy, F-measure, recall, and precision [27]. The evaluation metrics are calculated from the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts [28]. They are further explained as follows:
Accuracy: The proportion of correct predictions out of the total number of predictions.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

Recall: Also known as sensitivity, it is the proportion of actual positive instances that are correctly identified.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{6}$$

Precision: It measures how precise the model was in predicting and classifying the data, that is, the proportion of instances predicted as positive that are actually positive, and is represented as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

F1-score: It combines precision and recall into a single value, indicating both the classifier's precision regarding the number of successfully classified examples and its robustness, as follows:

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$

Results and discussion
In this work, the proposed CNN-BiLSTM model is used to detect DDoS attacks in IoT networks. Three additional models, LSTM, RNN, and CNN, are tested to verify the performance of the proposed CNN-BiLSTM model. These models have been implemented in the Python programming language on the Kaggle online data science platform. The CICIDS2017 dataset is used for training and testing the four models. The confusion matrix, accuracy, precision, recall, and F-measure have been implemented to assess the performance of the models. Table 4 and Figure 9 give an overview of how each classification model performed.
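As an illustration of how such metrics can be derived from a multi-class confusion matrix, the following NumPy sketch computes accuracy together with per-class precision, recall, and F1-score and then averages them over the classes. The row/column convention (rows are true classes, columns are predicted classes) and the unweighted macro average are assumptions of this sketch; the text does not state which averaging scheme was used, and the example matrix is made up.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Accuracy and macro-averaged precision, recall, and F1 from a confusion matrix.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp            # predicted as the class, but actually another
    fn = cm.sum(axis=1) - tp            # belongs to the class, but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
    }

# Made-up two-class example: 50 benign flows all correct, 10 of 50 attack flows missed.
example_cm = [[50, 0],
              [10, 40]]
example_metrics = metrics_from_confusion(example_cm)
```

In this example the accuracy is 0.9, while the macro-averaged precision is slightly higher than the recall because the only errors are attack flows predicted as benign.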
Figure 9 shows that the CNN-BiLSTM outperformed all other classifiers. Both CNN and RNN are strong individually, as demonstrated by the results in Table 4, and combining them made a noticeable difference. The CNN on its own showed the lowest accuracy, but it obtains high precision and maintains a low error rate. The confusion matrix of each classifier obtained from the algorithm outputs is presented in Figure 10.
The main aim of this study is to combine three deep learning algorithms, RNN, CNN, and LSTM, in a CNN-BiLSTM hybrid model, and then to test the four deep learning algorithms and compare their performance in detecting DDoS attacks in an IoT environment. With an accuracy of 99.76%, the hybrid CNN-BiLSTM model outperforms the rest of the tested deep learning models. The CNN deep learning model is the least performant in terms of accuracy on the dataset utilized. Except for CNN, the accuracy of the other three deep learning techniques, LSTM, RNN, and CNN-BiLSTM, is more than 99.50%, which outperforms most of the reviewed methods in Table 2, including deep learning and machine learning algorithms.

Conclusion and future work
DDoS assaults constitute a danger to everyone on the Internet and are challenging to identify because of the attackers' spoofing technology. Based on DDoS history, the recognized Mirai and Bashlite botnets will be neither the last nor the most potent botnets, since botnets are able to overcome traditional DDoS detection and mitigation techniques, such as adding more bandwidth or traffic extraction. Cyberattacks endanger IoT networks, which are still inherently vulnerable and need substantial resistance abilities.
This article presents a CNN-BiLSTM hybrid model that takes the advantages of CNN, RNN, LSTM, and BiLSTM and joins them into one model. The CNN-BiLSTM model provides an average accuracy of 99.76%, which shows that DDoS assaults can be handled efficiently by the CNN-BiLSTM deep learning model. Moreover, the CNN-BiLSTM model has shown the most satisfactory results in all metrics as compared with CNN, LSTM, and RNN. Nevertheless, the findings acquired from the other three classifiers must not be overlooked since they achieve an average accuracy of 99.16%.
This study examined IoT network construction and provided several potential causes for its insecure nature. It also reviewed prior research, the gaps in it, and the nature of DDoS assaults. The study proposed the CNN-BiLSTM model for DDoS detection, and the results show that the CNN-BiLSTM model surpasses the other tested models. This result suggests that the model could perform well in real-world IoT network systems. The main limitation of this work is the testing reliability due to the unavailability of a realistic testing platform. Future work will focus on investigating the bottlenecks of IoT network systems related to their vulnerability to DDoS attacks.