1. Introduction
The transformation of cities into smart cities is on the rise, with technologies such as the Internet of Things (IoT) and cyber–physical systems (CPS) connected through networks to provide better-quality services to citizens [1]. The smart city concept refers to urban systems that are integrated with information and communication technologies (ICTs) to make city services more efficient and effective in terms of monitoring, management, and control [2]. A smart city contains a huge number of sensors that continuously generate a tremendous amount of sensitive data, such as location coordinates, credit card numbers, and medical records [3]. These data are transmitted through a network to data centers for processing and analysis so that appropriate decisions, such as managing traffic and energy, can be made [4]. The resource limitations of the technological infrastructure expose smart cities to cyber attacks [5]. For instance, the sensors that generate data and the devices that handle these data have vulnerabilities that cybercriminals can exploit. Consequently, citizens' privacy and lives can be at risk when the data collected for analysis and decision making are manipulated, which undermines public trust in smart cities [1].
A smart city environment collects a tremendous amount of private and sensitive data and depends on ICT, which makes smart cities a target for different cyber attacks, such as distributed denial of service (DDoS), in which attackers infect IoT devices with bots and launch attacks against a target [6,7,8,9]. Cyber threat intelligence (CTI) can provide a secure environment for smart cities by relying on cloud services to monitor possible threats in real time and take appropriate prevention measures without human intervention [10,11,12,13,14,15]. Moreover, CTI can provide a lightweight security mechanism: it is not implemented on smart city devices; rather, it monitors attacks through the cloud to obtain information about recent threat behavior and indicators of compromise (IoCs), and it reports this information to connected smart city systems. Different techniques and machine learning (ML) models have been proposed to analyze cyber threats for CTI, such as deep learning (DL) models [16,17], random forest (RF) [18], and K-NN [19]. Nevertheless, artificial intelligence (AI)-based models can have high false-positive rates (FPRs) and low true-positive rates (TPRs) if the attack traffic is not profiled and modeled well enough [20]. This limits real-time classification efficiency and degrades smart city network security. To address this issue, improve threat analysis, and lower FPRs, we propose a hybrid DL model based on a convolutional neural network (CNN) and a quasi-recurrent neural network (QRNN). The proposed model can automatically learn spatial features using the CNN and temporal features using the QRNN without human intervention. The CNN can automatically select the relevant features from the dataset and discard the irrelevant ones to improve classification performance [21]. For cyber threat analysis, several works have shown the efficiency of CNNs for feature selection, such as [20,22]. The QRNN performs computation in parallel, which improves computation time while maintaining sequence modeling [23]. Thus, this hybrid model (CNN–QRNN) can help improve real-time analysis in CTI while providing high accuracy and a low FPR. Therefore, the proposed model can improve CTI performance for smart cities. We evaluated our proposed model on two IoT network traffic datasets, and the evaluation results demonstrate its effectiveness. The main contributions of this study are summarized as follows:
We propose a hybrid DL model that consists of QRNN and CNN to improve cyber threat analysis accuracy, lower FPR, and provide real-time analysis.
We evaluated our proposed model on two datasets that were simulated to represent a realistic IoT environment.
The rest of this paper is structured as follows. In Section 2, we discuss related work by comparing and analyzing different threat classification schemes that have been proposed in the literature. The proposed model is presented in Section 3. The implementation of the proposed model is discussed in Section 4, the experiment results and analysis are presented in Section 5, and conclusions are presented in Section 6.
2. Related Work
In recent years, different studies have proposed mechanisms to predict and analyze cyber attacks in smart city environments. The authors of [24] proposed an ML-based detection mechanism that focused on classifying DDoS patterns to protect a smart city from them. In [25], the authors studied how IoT devices can affect smart city cyber security and proposed a detection mechanism that depends on selected features to improve threat detection for IoT. The proposed system showed high accuracy, but the dataset used, KDD CUP 99, does not represent the behavior of IoT network attacks. Soe et al. [21] proposed an algorithm to improve prediction accuracy by selecting the optimal features for each type of attack in an IoT environment. The authors used ML models to evaluate the proposed feature selection algorithm, which was able to accurately predict the threats. However, the algorithm selects a static set of features for each type of attack, which attackers could easily bypass in a real threat environment. In [26], the authors used a DL model to select the best features for threat prediction to improve detection time in an IoT environment. The proposed model selects a set of features that are fed into feed-forward neural networks (FFNNs) to detect cyber threats and classify threat types. However, the model showed limited accuracy in predicting information theft data.
In [19], the authors discussed how ML models can rapidly and efficiently detect and classify IoT network attacks; they performed an experimental study by implementing various ML models and evaluating their performance. In [27], the authors proposed a hybrid ML model to detect IoT network attacks, including zero-day attacks. The proposed model consists of two stages: the first classifies the traffic into two categories (normal or attack), and the second classifies the type of attack using a support vector machine (SVM). Similarly, in [28], the authors proposed a hybrid ML model to detect and classify IoT network attacks in real time. The first layer of the model uses a decision tree classifier to detect malicious behavior, and the second layer classifies the type of attack using random forest (RF). In [29], the authors investigated the remote-control threat of connected cars and used an ML model to predict threats. They proposed a proactive anomaly detection mechanism that profiles the behavior of autonomous connected cars using a recursive Bayesian estimator. To evaluate the effectiveness of the proposed method, the authors designed a dataset for connected cars using hypothetical event routes and global positioning system coordinates, and they then modeled the data to predict anomalous behavior. Lee et al. [30] proposed a technique, based on DL models, that transforms a multitude of security events into individual event profiles. The authors discussed how anomaly-based detection can be costly since it can trigger many false alerts; therefore, they focused on improving security information and event management systems by using DL to reduce the cost of differentiating between true and false alerts. In [31], the authors proposed a hybrid ML method to detect cyber threats, focusing on improving detection accuracy against attackers' methods of evading detection tools. To evaluate the proposed method, the authors used different datasets, including KDD Cup and UNSW-NB15. In [32], the authors discussed how to improve threat analysis and classification, including for novel attacks, and proposed a model based on a stacked autoencoder to enhance and automate feature selection for threat classification.
Various studies have proposed hybrid DL models to improve threat analysis and classification. In [33], the authors combined an improved version of grey wolf optimization (GWO) with a CNN: the GWO model selects the features, and the CNN model classifies the threats. Other studies have used hybrid DL models based on CNNs and recurrent neural networks (RNNs) for spatial and temporal feature extraction to improve attack classification. In [34], the authors used a CNN for feature selection since it provides fast feature selection to support real-time analysis. For threat classification, the authors used a variant of the long short-term memory (LSTM) model: weight-dropped LSTM (WDLSTM). The proposed hybrid model showed good performance in terms of execution time. Vinayakumar et al. [35] studied the effect of CNNs on threat classification and intrusion detection systems (IDSs). The authors investigated different CNN-based hybrid DL models, including CNN-LSTM, CNN-GRU, and CNN-RNN, and the CNN-LSTM model outperformed the others. Moreover, the authors highlighted that selecting a minimal set of features for threat classification degraded classification performance; therefore, DL models can perform well in terms of feature selection. In [36], the authors proposed a hierarchical model based on CNN-LSTM, using stacked CNN layers for spatial feature learning via image classification and then stacked LSTM layers for temporal feature learning. Similarly, in [20], the authors proposed the LuNet model based on CNN-LSTM. They discussed how stacking LSTM layers after CNN layers can drop some of the temporal features; thus, they proposed the LuNet block, which consists of an LSTM layer stacked after a CNN layer, and then stacked LuNet blocks in multiple layers to improve classification performance and lower the FPR.
As shown in Table 1, different network traffic benchmark datasets, such as UNSW-NB15, NSL-KDD, and KDD CUP 99, have been used to analyze low-level IoCs. For IoT attack classification, the BoT-IoT dataset has been used in multiple studies to evaluate the performance of proposed models. Different ML and DL models, such as the SVM, CNN, and LSTM, have been used to analyze threats and provide accurate results, and the CNN-LSTM hybrid model has been used in multiple studies to improve threat classification performance.
In terms of CTI for smart cities, multiple papers, including [24,25], have analyzed threat patterns based on network traffic. Additionally, in [37], the authors proposed a trustworthy privacy-preserving secured framework (TP2SF) for smart cities based on an optimized gradient tree boosting system (XGBoost) and blockchain, and they evaluated the framework on two datasets: BoT-IoT and TON_IoT. DDoS is one of the most challenging threats in a smart city; different researchers have proposed methods that analyze IP addresses and track attack sources to prevent it, or that identify the behavior of the network under traffic overload. Data theft, which covers privacy and identity theft, is another threat that has been studied by various researchers. Data theft threats include reconnaissance, information theft, probe, R2L, and U2R attacks, which may expose vulnerabilities that help in launching data theft attacks such as password sniffing and unauthorized access. Some of the proposed models for smart cities set a fixed threshold to detect attacks, which is not effective and can raise many false alarms that affect the power consumption of the connected systems. In smart cities, the normal behavior of a system can change as the number of connected devices increases, so some researchers have achieved high accuracy but poor performance in terms of FPR.
Even though different researchers have proposed models to enhance threat classification for IoT environments, many aspects still require improvement. One limitation common to different methods is performance time. Low-level IoCs collected from network traffic have been used in various papers to analyze threats, provide timely information to the CTI knowledge base, and update the detection and prevention information for all systems connected to the CTI. However, to enhance classification performance, various models stack multiple ML model layers; therefore, training a model and classifying threats may take so long that these IoCs cannot be exploited in time. Second, when some models are not provided with enough data for each type of threat, the threat traffic cannot be profiled and modeled well enough; consequently, such ML models can have high FPRs. Furthermore, some models only provide accurate results when the system has precise details of the threats, so the system cannot recognize threats for which there are not enough training data, which affects classification accuracy.
Moreover, we observed that few papers have addressed diverse patterns for threat analysis while considering time, accuracy, and FPR together. Several works have proposed hybrid models based on the CNN and LSTM to learn spatial and temporal features. However, the LSTM is computationally complex and requires a long analysis time [38]. The QRNN is a type of RNN that allows for sequence modeling by performing computation in parallel while maintaining the data's long- and short-term sequence dependencies [23]. To the best of our knowledge, no prior work has used the QRNN model to improve cyber threat classification time while maintaining high accuracy. Thus, in this work, we propose a hybrid DL model for CTI in smart cities that addresses the abovementioned challenges using the QRNN. The proposed hybrid model can improve threat classification accuracy and lower the FPR in a reasonable time; therefore, it can predict different attacks to protect citizens' data and enhance the security of smart cities.
3. Proposed Model
In this section, we discuss the proposed hybrid DL model in terms of its structure, the selected DL algorithms, and relevant theoretical concepts. The selected DL models (CNN and QRNN) can be used to classify a threat type in real time while providing a low FPR. The architecture of the proposed model is presented in Figure 1.
A CNN is an extension of a neural network [39] that is effective at extracting low-level features from source data, especially spatial features [40]. CNNs are widely used in image processing due to their ability to automate feature extraction [41]. Additionally, CNNs have demonstrated their effectiveness in many fields, such as biomedical text analysis and malware classification [30]. Based on the shape of the input data, CNNs can be classified into different types, including the two-dimensional (2D) CNN, which processes data such as images, and the one-dimensional (1D) CNN, which processes data such as text. A CNN consists of a convolution layer, a pooling layer, a fully connected (FC) layer, and an activation function [42]. The convolution layer is the fundamental building block of a CNN; it takes two sets of information as inputs and performs a mathematical operation on them. The two sets of information are the data and a filter, which can also be referred to as a kernel. The filter is applied across the entire input to produce a feature map [41]. Each CNN filter extracts a set of features that are aggregated into a new feature map as output [30]. The pooling layer reduces the feature map dimensions and removes irrelevant data to improve learning [20]. The output of the pooling layer is fed into the FC layer to classify the data [43].
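To make the 1D case concrete, the sketch below implements a valid 1D convolution with ReLU followed by non-overlapping max pooling in plain NumPy. The sequence length, kernel width, and filter count are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution + ReLU: x is (T, C_in), kernels is (K, C_in, C_out)."""
    k, c_in, c_out = kernels.shape
    t_out = x.shape[0] - k + 1
    out = np.zeros((t_out, c_out))
    for t in range(t_out):
        # Each output step is a dot product of a K-wide window with every filter.
        window = x[t:t + k]                                    # (K, C_in)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)                                # ReLU activation

def max_pool1d(x, size=2):
    """Non-overlapping max pooling along the time axis."""
    t = (x.shape[0] // size) * size
    return x[:t].reshape(-1, size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 8))                # 20 timesteps, 8 input features
kernels = rng.normal(size=(3, 8, 16)) * 0.1 # kernel width 3, 16 filters
fmap = conv1d_relu(x, kernels, np.zeros(16))  # (18, 16) feature map
pooled = max_pool1d(fmap)                     # (9, 16) reduced feature map
```

Note how pooling halves the temporal dimension, which is the mechanism the text describes for discarding irrelevant detail before the next layer.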
The LSTM-RNN is one of the most powerful neural network models used in cyber security due to its ability to accurately model temporal sequences and their long-term dependencies [44]. However, LSTM training usually takes a long time and has a high computation cost [45]. The QRNN model [23] was designed to overcome the RNN's limitation that each timestep's computation depends on the previous timestep, which limits parallelism. The QRNN combines the benefits of the CNN and RNN by applying convolutional filters to the input data while allowing long-term sequence dependencies to carry information from previous timesteps [23]. The computation structure of the QRNN is presented in Figure 2. The QRNN consists of convolutional layers and a recurrent pooling function, which allow it to run up to 16 times faster than the LSTM while achieving the same accuracy [46]. The convolutional and pooling layers allow for parallel computation across the batch and feature dimensions [23]. The QRNN has been used in different applications such as video classification [45], speech synthesis [46], and natural language processing [47].
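As an illustration of why the QRNN parallelizes well, the following minimal NumPy sketch computes all gate pre-activations for every timestep at once (here with kernel width 1, i.e., a per-timestep projection, for brevity) and leaves only the lightweight fo-pooling recurrence sequential. The layer sizes are illustrative assumptions on our part.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def qrnn_layer(x, wz, wf, wo):
    """Minimal QRNN with kernel width 1 and fo-pooling.
    x: (T, C_in); wz, wf, wo: (C_in, H) weight matrices."""
    # Gate values for ALL timesteps at once -- this is the parallel part:
    z = np.tanh(x @ wz)   # candidate values, (T, H)
    f = sigmoid(x @ wf)   # forget gates,     (T, H)
    o = sigmoid(x @ wo)   # output gates,     (T, H)
    # fo-pooling is the only sequential step, and it is elementwise (cheap):
    c = np.zeros(z.shape[1])
    h = np.zeros_like(z)
    for t in range(z.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * z[t]  # blend old state with new candidate
        h[t] = o[t] * c
    return h

rng = np.random.default_rng(1)
x = rng.normal(size=(9, 16))  # 9 timesteps, 16 features
h = qrnn_layer(x, *(rng.normal(size=(16, 32)) * 0.1 for _ in range(3)))
```

Unlike an LSTM cell, no matrix multiplication sits inside the timestep loop, which is the source of the speedup the text cites.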
Our hybrid DL model consists of a 1D convolutional layer, a 1D max-pooling layer, QRNN layers, and FC layers. The first 1D convolutional layer selects the spatial features and produces a feature map that is processed by the activation function. The rectified linear unit (ReLU) activation function is used in the convolutional layers because of its rapid convergence under gradient descent, which makes it a good choice for our proposed model [41]. The feature map is then processed by the second layer, which applies the max-pooling operation, selecting the maximum value within each pooling window [41]. The pooling layer reduces dimensionality and removes irrelevant features. The output of the CNN layers retains the temporal features, which are then extracted by the QRNN model.
Figure 3 provides details of our proposed model and shows that we used two QRNN layers to extract the temporal features. In the two QRNN layers, the hidden size represents both the number of hidden units and the output dimension; it can be selected based on the number of features [45]. One of the problems of neural networks is overfitting, which means that a model learns the training data too well and consequently cannot recognize variants in new data [22]. We added a dropout layer to prevent overfitting.
Then, a 1D convolutional layer and a max-pooling layer are used to extract further spatial-temporal features. The output of the CNN layers is passed to the Flatten layer, which transforms the output of the pooling layer into one vector that serves as input to the next layer [48]. Finally, a dense (fully connected) layer with the SoftMax activation function classifies the threats by calculating the probability of each class [34].
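Putting the pieces together, the following NumPy sketch traces a single sample through a simplified version of the pipeline described above: Conv1D with ReLU, max pooling, two stacked QRNN layers, a second Conv1D/pooling stage, flattening, and a SoftMax output. All sizes (timesteps, kernel widths, hidden units, five output classes) are hypothetical choices of ours, and the randomly initialized weights only demonstrate shapes and data flow, not trained behavior; dropout is omitted because it is active only during training.

```python
import numpy as np

rng = np.random.default_rng(2)

def conv1d_relu(x, k, c_out):
    # Valid 1D convolution with randomly initialized filters, then ReLU.
    w = rng.normal(size=(k, x.shape[1], c_out)) * 0.1
    t_out = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1]))
                    for i in range(t_out)])
    return np.maximum(out, 0.0)

def max_pool(x, s=2):
    t = (x.shape[0] // s) * s
    return x[:t].reshape(-1, s, x.shape[1]).max(axis=1)

def qrnn(x, h_size):
    # Kernel-width-1 QRNN with fo-pooling (see the sketch in the QRNN section).
    wz, wf, wo = (rng.normal(size=(x.shape[1], h_size)) * 0.1 for _ in range(3))
    z = np.tanh(x @ wz)
    f = 1.0 / (1.0 + np.exp(-x @ wf))
    o = 1.0 / (1.0 + np.exp(-x @ wo))
    c, h = np.zeros(h_size), np.zeros((x.shape[0], h_size))
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# One sample: 40 timesteps x 10 features (sizes are illustrative).
x = rng.normal(size=(40, 10))
x = max_pool(conv1d_relu(x, k=3, c_out=32))   # spatial features
x = qrnn(qrnn(x, 64), 64)                     # stacked QRNNs: temporal features
x = max_pool(conv1d_relu(x, k=3, c_out=32))   # further spatial-temporal features
logits = x.flatten() @ (rng.normal(size=(x.size, 5)) * 0.1)  # dense layer
probs = softmax(logits)                       # probability per threat class
```

The final vector `probs` sums to one, and the predicted threat class is simply its argmax.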