1. Introduction
The exponential global surge in the use of conventional vehicles has undoubtedly enhanced individual convenience but has also escalated the risk of accidents [
1]. According to World Health Organization (WHO) statistics from 2023, fatalities from road accidents account for 29% of all reported injuries [
2]. To address these challenges, the deployment of a vehicular ad hoc network (VANET) [
3] has emerged as a promising solution. Its primary objective is to reduce road accidents and optimize traffic flow through efficient vehicle-to-everything (V2X) communication [
4]. Vehicles equipped with an onboard unit (OBU) possess robust communication, storage, and processing capabilities, effectively handling data transmission, storage, and computational tasks. In addition to safety applications (e.g., collision warning messages, emergency information dissemination, traffic conditions, speed limit warnings, and lane change assistance), vehicles can provide infotainment services [
5].
Despite these features, VANET faces various challenges due to its reliance on the traditional transmission control protocol/internet protocol (TCP/IP). In the context of VANETs, TCP/IP encounters issues such as intermittent connectivity, limited scalability, and security and privacy concerns. Specifically, a vehicular environment requires efficient dissemination of critical information and the ability to serve a large number of users under intermittent connectivity. Moreover, TCP/IP is host-centric, which adds overhead and increases the overall network latency [
6]. Consequently, existing IP-based network architecture is inefficient for VANET. Alternatively, named data networking (NDN) [
7] has emerged as a promising network architecture of the information-centric network (ICN) [
8] for VANET. Instead of relying on IP addresses for establishing connections and data transmission, NDN employs a unique hierarchical naming approach (e.g., /VNDN/infotainment/music/album/video.mp4) that prioritizes content over its host. A notable advantage of this content-centric model is in-network content caching, which enables vehicles to retrieve content from nearby nodes rather than relying solely on the original host.
Table 1 compares the limitations of IP-based communication with the NDN-based solutions that address them.
NDN is one of the five research endeavors supported by the National Science Foundation (NSF) within its future internet architecture program, and it embodies the fundamental principles of information-centric networking (ICN). In 2009, Van Jacobson initially proposed the concept of a content-centric network [
9], which evolved into NDN under the NSF-funded future internet architecture project [
10] as a future internet architecture [
11]. NDN uses two types of packets: interest and data packets. The content consumer always initiates communication with an interest packet that requests specific data, and the data packet carries the content in response to that interest. Depending on their role in an exchange, NDN nodes fall into three categories. (1) Content consumer node: A content-intensive entity that initiates communication by broadcasting an interest request for specific content. (2) Content producer node: The node that matches the requested content and provides it to the content consumer. (3) Intermediate node: A node that serves one of two roles, depending on the received packet. First, when the requested content name matches a content name available in the node’s storage, it acts as a content producer and provides the requested data directly to the content consumer node. Second, if the intermediate node does not have the requested content, it acts as a relay node and forwards the interest packet to the next hop in the network. This dynamic behavior of intermediate nodes facilitates efficient content retrieval and distribution, contributing to the overall robustness of the NDN network.
In addition, every NDN node contains three data structures:
Content store (CS): The CS allows the NDN node to cache data packets and serve the content consumers without forwarding interest requests to the content producer every time. The CS reduces network congestion and improves content retrieval.
Pending interest table (PIT): The PIT records each unsatisfied interest request together with the interface on which it arrived and keeps the entry until the request is satisfied.
Forwarding information base (FIB): The FIB is responsible for forwarding unsatisfied interest packets to the next hops. Unlike traditional IP-based routing, NDN’s FIB entries are indexed by name prefixes rather than IP addresses, as described in [
12]. Each FIB entry contains next-hop information, which allows routers to direct interest packets to one or multiple next hops, depending on the forwarding strategy, enabling efficient multi-path forwarding in the network.
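The interplay of the three data structures can be illustrated with a minimal sketch. The class name NdnNode, its method names, the face labels, and the rule that only solicited data are cached are assumptions made for illustration; they are not taken from a specific NDN implementation.

```python
from dataclasses import dataclass, field


@dataclass
class NdnNode:
    """Toy NDN forwarder holding the three tables described above (illustrative only)."""
    cs: dict = field(default_factory=dict)    # content name -> cached data
    pit: dict = field(default_factory=dict)   # content name -> set of incoming faces
    fib: dict = field(default_factory=dict)   # name prefix -> list of next-hop faces

    def longest_prefix_match(self, name):
        """Return the next hops of the longest FIB prefix that matches `name`."""
        parts = name.strip("/").split("/")
        for end in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:end])
            if prefix in self.fib:
                return self.fib[prefix]
        return []

    def on_interest(self, name, in_face):
        """Process an interest packet and return (action, payload)."""
        if name in self.cs:                   # CS hit: answer from the cache
            return ("data", self.cs[name])
        if name in self.pit:                  # already pending: aggregate the face
            self.pit[name].add(in_face)
            return ("aggregated", None)
        next_hops = self.longest_prefix_match(name)
        if not next_hops:                     # no matching FIB prefix: drop
            return ("dropped", None)
        self.pit[name] = {in_face}            # remember where to send the data
        return ("forwarded", next_hops)

    def on_data(self, name, content):
        """Process a data packet and return the faces it is sent back to."""
        faces = self.pit.pop(name, set())
        if faces:                             # assumption: cache only solicited data
            self.cs[name] = content
        return sorted(faces)
```

In this sketch, a second interest for a name that is already pending only adds its face to the existing PIT entry, and a later interest for the same name is answered from the CS without reaching the producer.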
Figure 1 shows the NDN communication architecture in vehicular NDN (VNDN).
Although NDN offers numerous features, it is highly vulnerable to various attacks, such as interest flooding attacks (IFAs) [
13], content poisoning attacks (CPAs) [
14], man-in-the-middle attacks [
15], and illusion attacks [
16]. Among these attacks, the IFA stands out as one of the most prevalent in VNDN. The IFA is a variant of the distributed denial of service (DDoS) attack in which a malicious content consumer floods the VNDN with a storm of interest requests for non-existing content. By inundating the network with excessive interest packets, the attacker deliberately depletes resources, including the PIT, CS, network bandwidth, and producer resources, rendering them inaccessible to legitimate consumers and disrupting the network’s operation [
17]. IFA attackers can consume network resources by employing two distinct techniques. (1) Non-existing interest packet: In this approach, the attacker generates random interest packets that contain invalid requests, such as /VNDN/infotainment/music/5453.txt, where the attacker requests a text file under the music prefix. These packets refer to content that does not exist in the network. Consequently, intermediate nodes cannot resolve such requests and retain them in the PIT, which results in unnecessary resource consumption and potential network congestion. Thus, the PIT can be choked with forged interest packets. (2) Valid data request: The attackers target content producers with an enormous number of legitimate-looking interest packets that carry a forged nonce [
13]. For example, an attacker issues an interest packet for /VNDN/infotainment/music/nonce, where nonce is a random value. Such forged interest packets traverse the network and significantly burden both the producers and the network routers.
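The effect of the first technique on the PIT can be illustrated with a small simulation sketch. The function simulate_pit, its parameters, the fixed entry lifetime, and the assumption that legitimate interests are satisfied after one time step are all hypothetical simplifications and do not reproduce the simulation setup of this paper.

```python
import random


def simulate_pit(capacity, attack_rate, legit_rate, lifetime, steps, seed=0):
    """Return the fraction of legitimate interests dropped because the PIT is full.

    Each step, `attack_rate` forged and `legit_rate` legitimate interests arrive.
    Legitimate entries are satisfied one step later; forged entries are never
    satisfied and stay in the PIT until `lifetime` steps have passed.
    """
    rng = random.Random(seed)
    pit = {}                      # name -> step at which the entry is removed
    dropped = sent = 0
    for step in range(steps):
        # Remove entries whose data arrived or whose lifetime expired.
        pit = {n: t for n, t in pit.items() if t > step}
        for _ in range(attack_rate):
            # Forged name under a real prefix; the suffix matches no content.
            name = f"/VNDN/infotainment/music/{rng.getrandbits(32)}.txt"
            if len(pit) < capacity:
                pit[name] = step + lifetime
        for i in range(legit_rate):
            sent += 1
            name = f"/VNDN/infotainment/music/album/{step}-{i}.mp4"
            if len(pit) < capacity:
                pit[name] = step + 1
            else:
                dropped += 1      # PIT is full: the legitimate interest is lost
    return dropped / sent if sent else 0.0
```

Under these assumptions, a call such as simulate_pit(50, 0, 5, 10, 100) drops no legitimate interests, whereas raising attack_rate to 20 keeps the PIT saturated with forged entries so that almost all legitimate interests are dropped.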
Figure 2 illustrates the IFA in VNDN, where the attacker threatens the communication infrastructure of connected intermediate vehicles using both non-existing and valid interest packets.
To address the challenge of IFA in NDN, researchers have explored various approaches, including threshold-based IFA detection [
18], statistical-based countermeasures [
19], reputation-based IFA detection [
20], rating-based approaches [
21], and charging/rewarding mechanisms [
22]. Although these approaches have contributed significantly to detecting IFA in NDN, they do not provide an efficient solution for accurately detecting and preventing such attacks in VNDN. Meanwhile, machine learning (ML) is gaining momentum in anomaly detection [
23] in various fields, including the Internet of Things (IoT) [
24], healthcare [
25], image processing [
26], spam detection [
27], unmanned aerial vehicles (UAVs) [
28], VANET [
29,
30], NDN [
31] and so on. Specifically, ML has yielded substantial advancements by significantly enhancing capabilities in diverse aspects, including intrusion detection [
32], optimal resource allocation [
33], offloading strategies, and precise mobility pattern forecasting. Despite its widespread application in various domains, none of the previously mentioned research has explored the use of ML in VNDN. In contrast to traditional approaches for detecting IFA in NDN, we are the first to propose an efficient ML-based solution to detect and prevent IFA in VNDN. Considering the challenges and limitations highlighted in the existing literature, the main focus of this research is to propose a resilient network framework that effectively tackles IFA through ML classifiers. To achieve this, we evaluate several ML classifiers and propose the most accurate one for IFA detection. The significant contributions of this research are as follows:
We propose an ML-based classification technique to identify attackers and legitimate vehicles.
We evaluate the accuracy of five ML classifiers and propose the most accurate algorithm for IFA detection.
Based on our ML-based detection results, we propose a simulation-based IFA prevention system in intermediate nodes.
By focusing on the detection and prevention of IFA in VNDN, this research aims to fortify the resilience of vehicular communication systems. Our ML-based approach mitigates the immediate threats posed by IFAs and establishes a foundation for secure VNDN ecosystems. The subsequent sections of this paper are structured as follows:
Section 2 reviews existing work and its limitations in detecting and preventing IFA.
Section 3 delves into a comprehensive analysis of the system model, network elements, and proposed ML-based IFA detection and prevention system. We provide IFA detection and prevention results in
Section 4 and conclude the paper in
Section 5. Finally,
Section 6 presents future work.
2. Related Work
The scientific community has seen a marked surge in interest and contributions toward combatting cyber-attacks [
34], specifically in the VNDN realm [
35]. However, developing and implementing efficient security measures for safeguarding VNDN is in its infancy. Specifically, ML techniques have been explored in detecting attacks [
36,
37]. Reference [
38] identified IFA in NDN by showing how a huge number of interest packets can overwhelm the network. Subsequently, numerous research papers have investigated the IFA using several techniques; for example, the authors in [
31] proposed an ML-based classification technique for detecting IFA on tree topology (small-scale topology) and Rocketfuel ISP topology (large-scale topology). Another solution for detecting IFA presented a centralized controller-based approach [
39]. In this mechanism, a router tracks unsatisfied interest requests and identifies IFA nodes based on a predetermined threshold value. However, this approach has certain limitations. The metrics used to identify IFA nodes may lead to false detection, and the system might fail to detect IFA, especially in scenarios with significant legitimate traffic or when content producers are unavailable. These shortcomings highlight the need for more advanced and robust detection mechanisms to combat IFA in VNDN effectively. Similarly, in reference [
40], the authors presented a threshold-based system for identifying IFA within a local PIT. Instead of relying on a centralized router, the PIT manages a predetermined threshold system in this approach. The threshold-based approach allows the local PIT to assess incoming interest requests autonomously and identify potential IFA scenarios based on the predefined threshold criteria. This solution aims to enhance the efficiency and accuracy of IFA detection within the network by decentralizing the detection process and employing local PIT mechanisms. However, it is essential to consider the trade-offs and limitations of this approach, particularly in terms of scalability and adaptability to various network conditions.
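A threshold-based detector of the kind discussed above can be sketched as follows. The class ThresholdDetector, the per-face counters, and the default values of ratio_threshold and min_interests are illustrative assumptions and are not the parameters used in the cited works.

```python
from collections import defaultdict


class ThresholdDetector:
    """Flags a face when too many of its interests expire unsatisfied (sketch)."""

    def __init__(self, ratio_threshold=0.5, min_interests=20):
        self.ratio_threshold = ratio_threshold  # hypothetical cut-off ratio
        self.min_interests = min_interests      # ignore faces with too few samples
        self.received = defaultdict(int)        # face -> interests received
        self.expired = defaultdict(int)         # face -> interests that timed out

    def on_interest(self, face):
        self.received[face] += 1

    def on_timeout(self, face):
        self.expired[face] += 1

    def is_suspicious(self, face):
        """True if the unsatisfied-interest ratio of `face` exceeds the threshold."""
        total = self.received[face]
        if total < self.min_interests:
            return False
        return self.expired[face] / total > self.ratio_threshold
```

The sketch also makes the stated limitation visible: if a content producer is unavailable, the interests of legitimate vehicles expire as well, so their faces cross the same fixed threshold and are falsely flagged.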
To prevent PIT exhaustion, Wang et al. [
41] proposed decoupling legitimate interest requests from timed-out ones. In this architecture, each router records timed-out interest requests in an m-list. If the prefix of an incoming interest is already in the m-list, the router forwards the interest packet outright, without storing it in the PIT. While this approach mitigates some of the impact of an IFA, it does not thwart such attacks entirely. Legitimate requests are also adversely affected, and attackers can exhaust the router’s resources by forging names to flood the m-list, rendering the solution inefficient. In contrast, a few authors have proposed approaches based on hypothesis-testing theory; for example, the authors in [
42] used hypothesis-testing theory to formulate a likelihood-based static hypothesis test theory (SHTT) tailored to evolving attacks, particularly those arising when NDN is combined with TCP/IP, which are difficult to address using conventional approaches. Similarly, the authors in [
43] tackled IFA by employing a detection approach based on SHTT. The test does not rely on router characteristics or measured values. The framework comprises two main scenarios: (i) When all traffic parameters are known, an optimal test is formulated, and its statistical performance is thoroughly evaluated. (ii) When parameters are unknown, the framework introduces a linear parametric model that estimates them and enables the development of a practical test. However, the scheme was evaluated only on a basic binary tree topology with eight clients and one adversary. Consequently, its efficacy in larger networks or under distributed attacks remains difficult to assess.
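The m-list mechanism of Wang et al. discussed earlier in this section can be sketched as follows. The class MListRouter, its method names, and the choice of treating everything before the last name component as the prefix are assumptions for illustration; the original scheme may define the prefix and the return path of data differently.

```python
class MListRouter:
    """Sketch of a router that bypasses the PIT for prefixes seen to time out."""

    def __init__(self):
        self.pit = {}        # content name -> incoming face
        self.m_list = set()  # prefixes whose interests have timed out

    @staticmethod
    def prefix_of(name):
        # Assumption: the prefix is the name without its last component.
        return name.rsplit("/", 1)[0]

    def on_interest(self, name, face):
        """Forward the interest, storing PIT state only for unlisted prefixes."""
        if self.prefix_of(name) in self.m_list:
            return "forwarded-without-pit"
        self.pit[name] = face
        return "forwarded-with-pit"

    def on_timeout(self, name):
        """Move the prefix of an expired PIT entry to the m-list."""
        if self.pit.pop(name, None) is not None:
            self.m_list.add(self.prefix_of(name))
```

In this sketch, once a forged interest under /VNDN/infotainment/music expires, every later interest under that prefix, including legitimate ones, bypasses the PIT, and the m-list itself grows with each new forged prefix. Both effects correspond to the drawbacks noted above.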
Meanwhile, the authors in [
44] presented a Markov-based IFA detection system. This approach involves creating a space vector determined by the fluctuations in the PIT occupancy rate, and the network’s state is evaluated using a quantized value. By calculating the Euclidean distance, the system distinguishes between legitimate and malicious interest packets, achieving a high detection rate. However, a notable drawback of this approach is its significant consumption of network resources, particularly when identifying interest packets within a large volume of NDN network traffic.
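The distance-based part of such a detector can be sketched as follows. The sketch covers only the comparison of PIT occupancy fluctuations by Euclidean distance and omits the Markov model and the quantization step; the function names, the baseline window, and the threshold are hypothetical.

```python
import math


def fluctuation_vector(occupancy):
    """Differences between consecutive PIT occupancy samples (values in [0, 1])."""
    return [b - a for a, b in zip(occupancy, occupancy[1:])]


def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_anomalous(window, baseline, threshold):
    """Flag `window` if its fluctuation vector lies far from the baseline's."""
    distance = euclidean_distance(fluctuation_vector(window),
                                  fluctuation_vector(baseline))
    return distance > threshold
```

With a baseline window of slowly varying occupancy, a window in which the occupancy climbs steeply from one sample to the next yields a large distance and is flagged, while ordinary fluctuations stay below the threshold.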
Moreover, ** and aggregation”, where the majority vote of the base models on the test data determines the final result. In the RF approach, the data are fed to the base models using row sampling with replacement, a method known as bagging.
GNB: GNB [
55] is a simple yet effective classification method that employs Bayes’ theorem for predicting the class of unlabeled data points. We selected GNB for its efficiency in high-dimensional data handling and probabilistic nature, allowing it to capture the likelihood of feature co-occurrences relevant to IFA scenarios. It calculates the prior probabilities of different classes and utilizes this information to make predictions on new, unseen data. One of the key assumptions of GNB is the independence of features, which means that it assumes each feature contributes to the classification independently of other features. This independence assumption simplifies the computation and makes GNB computationally efficient. Due to its simplicity and efficiency, GNB is particularly well-suited for applications with many features and is commonly used in various ML tasks.
LR: LR [
56] is a statistical method for predicting the probability of a categorical outcome, especially in two-class classification problems. It is a well-established binary classification technique for IFA detection. It uses a logistic function to calculate an event’s likelihood.
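The core operations of the classifiers described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation evaluated in this paper: the function names are hypothetical, a small constant is added to each variance for numerical stability, and the features in the usage note are invented for the example.

```python
import math
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Row sampling with replacement: the bootstrapping half of bagging."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """The aggregation half of bagging: the most common base-model label wins."""
    return Counter(predictions).most_common(1)[0][0]


def fit_gnb(rows, labels):
    """For each class, store its prior and the mean and variance of every feature."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant (an assumption) avoids division by zero.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model


def predict_gnb(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods.

    Summing per-feature terms is where the feature-independence assumption enters.
    """
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def logistic(z):
    """Logistic function used by LR to map a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))
```

As a usage example with two invented features per vehicle (an unsatisfied-interest ratio and an interest rate), fitting fit_gnb on the rows [[0.9, 200.0], [0.8, 180.0], [0.1, 10.0], [0.2, 15.0]] with labels attacker, attacker, legitimate, legitimate lets predict_gnb assign a new row such as [0.85, 190.0] to the attacker class.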