A Water Shortage Risk Assessment Model Based on Kernel Density Estimation and Copulas

Qian, Tanghui; Shi, Zhengtao; Gu, Shixiang; **, Wenfei; Chen, **g; Chen, **ming; Bai, Shihan; Wu, Lei

doi:10.3390/w16111465


by Tanghui Qian ¹, Zhengtao Shi ¹,*, Shixiang Gu ²,*, Wenfei ** ¹, **g Chen ², **ming Chen ², Shihan Bai ² and Lei Wu ²

¹ Faculty of Geography, Yunnan Normal University, Kunming 650500, China

² Yunnan Provincial Institute of Water Resources and Hydroelectric Survey & Design & Research, Kunming 650500, China

* Authors to whom correspondence should be addressed.

Water 2024, 16(11), 1465; https://doi.org/10.3390/w16111465

Submission received: 21 April 2024 / Revised: 13 May 2024 / Accepted: 17 May 2024 / Published: 21 May 2024

(This article belongs to the Special Issue Water Scarcity: From Ancient to Modern Times and the Future, Volume II)


Abstract:

Accurate assessment and prediction of water shortage risk are essential prerequisites for the rational allocation and risk management of water resources. However, previous water shortage risk assessment models based on copulas place strict requirements on the data distribution, making them unsuitable for extreme conditions such as insufficient data volume and indeterminate distribution shapes. These limitations restrict the applicability of the models and result in lower evaluation accuracy. To address these issues, this paper proposes a water shortage risk assessment model based on kernel density estimation (KDE) and copula functions. This approach not only enhances the robustness and stability of the model but also improves its prediction accuracy. The methodology first uses kernel density estimation to quantify the random uncertainties in water supply and demand based on historical statistical data, thereby calculating their respective marginal probability distributions. Subsequently, copula functions are employed to quantify the coupled interdependence between water supply and demand based on these marginal probability distributions, thereby computing the joint probability distribution. Finally, the water shortage risk is evaluated based on potential loss rates and occurrence probabilities. The proposed model is applied to assess the water shortage risk of the Yuxi water receiving area in the Central Yunnan Water Diversion Project and is compared with existing models in comparative experiments. The experimental results demonstrate that the model exhibits evident advantages in terms of robustness, stability, and evaluation accuracy, with a rejection rate of 0 for the null hypothesis in marginal probability fitting and a smaller deviation in joint probability fitting compared to the best-performing existing model in the field. 
These findings indicate that the model presented in this paper is capable of adapting to non-ideal scenarios and extreme climatic conditions for water shortage risk assessment, providing reliable prediction outcomes even under extreme circumstances. Therefore, it can serve as a valuable reference and source of inspiration for related engineering applications and technical research.

Keywords:

kernel density estimation; copula; water shortage risk assessment model; robustness; accuracy; Central Yunnan Water Diversion Project receiving area

1. Introduction

Global warming and intensified human activities have exacerbated the spatiotemporal variations in global precipitation, evapotranspiration, runoff, and their associated water cycle [1,2], increasing uncertainty and risk in water supply and posing greater challenges to water resource management [3]. The “Global Drought Snapshot 2023: The Need for Proactive Action” [4] points out that if the global average temperature exceeds pre-industrial levels by 3 °C, an estimated 170 million people will experience extreme drought; limiting the temperature rise to 1.5 °C would result in an expected 50 million people experiencing extreme drought. Furthermore, 15–20% of China’s population could face more frequent moderate-to-severe droughts within this century; and by 2100, the intensity of drought in China is expected to increase by 80%. Therefore, the water crisis is a global issue [5], with China being particularly affected. Additionally, over the past century, the world’s freshwater usage has increased sixfold [6], and it is forecast that the global water demand will continue to rise, with approximately 25% of major cities facing heightened water stress [7]. Therefore, under the conditions of ongoing climate change and increasing water resource demand, accurate assessment and prediction of water scarcity risks are crucial for regional water resource planning, allocation, and risk management [8,9]. Since the 1980s, scholars have conducted extensive research on water scarcity and risk assessment [10,11,12,13,14], making significant contributions to mitigating global and regional water scarcity risks. However, there has been little research on the robustness of models and the accuracy of assessments.

Assessing and predicting water shortage risks is a complex process, owing to the uncertainty and extremity of hydrological variables in both space and time. Hashimoto et al. [15] were the first to propose quantitative assessment indicators for water scarcity risks, including reliability, vulnerability, and resilience, from the perspective of the probability of water resource system failures. Subsequently, indicators such as the Falkenmark Water Stress Index [16], per capita water scarcity standards [17], Social Water Stress Index (SWSI) [18], Water Poverty Index (WPI) [19,20,21], and Water Scarcity Risk Index (WSRI) [22] have been proposed for assessing the degree and risk of water scarcity at both global and regional scales [23,24,25,26]. The aforementioned methods extract key elements from events and factors causing water scarcity to construct an evaluation indicator system for quantifying water shortage risk. Commonly used indicator weighting methods include the analytic hierarchy process (AHP) [27], entropy weight method [28], maximum entropy principle [29], principal component analysis (PCA) [30,31], projection pursuit [32], G1 method, and entropy weight–G1 method [33]. Approaches for risk quantification based on such indicators and weights include fuzzy comprehensive evaluation [27,30,34], fuzzy cluster analysis [35], variable fuzzy set evaluation model [36], grey relational and information diffusion theory [37,38,39], normal cloud model [40], matter element model [33,41], dynamic modeling of water resource shadow price calculation [42], and multi-objective risk decision models [43]. These methods are simple in principle and easy to use, but indicator selection, weight determination, and risk level mapping are highly subjective. The evaluation results are relatively coarse and the evaluation accuracy is low, so they are only suitable for large-scale applications.

Another type of data-driven method [44,45,46] involves capturing data patterns through mathematical statistics, simulating the probability distribution of hydrological variables based on historical statistical data, thereby quantifying the risks associated with the random uncertainty of hydrological variables. The commonly used univariate probability distribution simulation methods include Pearson-III [47], normal [48], logistic [49], log-logistic [50], generalized extreme value (GEV) [51], gamma and Weibull [52,53], etc. These probability distribution functions are commonly used to fit the marginal probability distributions of hydro-meteorological time series such as precipitation [47], runoff [54,55], water supply and demand [50], drought duration and severity [51,53], potential evapotranspiration, and average temperature [52]. In practical situations, water shortage risk is a multi-variable stochastic coupled risk, requiring resolution of joint probabilities across multiple variables. Copula function families are widely used for computing joint probability distributions of multiple variables [50,53,56,57,58], due to their excellent ability to characterize the degree and structure of multivariate correlations [59]. It is important to emphasize that this requires building upon previous univariate marginal distributions. Ultimately, water scarcity risk is defined as a function of the joint probability of occurrence of water shortage events and their potential losses [48,60,61].

Previous studies have commonly selected a function from known distribution families that fits the univariate probability distribution as the marginal distribution, presupposing knowledge of the specific distribution to which the data adhere, verified through significance tests. However, the distribution forms of hydrological, meteorological, and socioeconomic variables are complex and diverse, potentially approximating normal or skewed distributions, complicating selection of distribution functions. This challenge may result in suboptimal function choices or even an inability to identify a suitable distribution function, severely restricting model applicability. To address this, this paper introduces kernel density estimation (KDE) [62,63,64] as a fitting method for univariate marginal probability distributions. Leveraging its non-parametric nature and capability to capture local detail features, KDE enhances model robustness and fitting accuracy, enabling adaptation to complex real-world scenarios. Multivariate joint probability distribution modeling utilizes copula functions. To accommodate different correlation structures among multivariate combinations, the copula function is not fixed. By employing Spearman and Kendall rank correlation coefficients [65,66], the optimal copula function is dynamically selected for each set of multivariate data as the final linking function, enhancing the predictive accuracy of the model. Finally, water shortage risk is quantified from the perspectives of joint probability and loss rate.

The Central Yunnan Water Diversion Project is a major water diversion project aimed at addressing severe water shortage issues in the Central Yunnan region, with significant economic, social, and ecological benefits [67]. Studying the characteristics and variations of water shortage risks in the water-receiving area based on historical supply and demand data is of great value, as it can provide a basis for the precise allocation of water resources after project completion. Applying the method proposed in this paper to evaluate the historical water shortage risk in the Yuxi water-receiving area of the Central Yunnan Water Diversion Project and conducting comparative experiments with existing methods will validate the superiority of the proposed method in terms of robustness, resilience, and predictive accuracy.

2. Principles and Methods

The water shortage risk refers to the probability and severity of threat posed to the normal operation of social, economic, and ecological systems in a specific spatiotemporal context due to the disruption of the water supply–demand balance caused by the randomness and uncertainty of water supply and demand [27]. Water scarcity is primarily evaluated and analyzed from the perspective of the water supply–demand balance to assess the degree and duration of water shortage and its potential impacts on life, production, and ecology. This evaluation considers how imbalances in water availability can disrupt normal societal functions, economic activities, and environmental health. This paper draws on Qian, Zhang, Wang and Hong [48] for the definition of water shortage risk, defining it as the product of water shortage probability level and potential loss rate level. Firstly, KDE is utilized to simulate the marginal probability distributions of long-term time series of both water supply and demand, quantifying the stochastic uncertainty of supply and demand. Secondly, based on the marginal probability distributions of both supply and demand, copula functions are employed to simulate the joint probability distribution, quantifying the probability of water shortage occurrence. Subsequently, the potential loss rate due to water shortage for each water user is calculated to quantify the severity of water shortage. Finally, both the probability of water shortage and the potential loss rate are divided into five warning levels, and criteria for level division are determined. The calculated water shortage probability and potential loss rate from the previous steps are mapped to the warning levels and multiplied to obtain the water shortage risk warning level. The constructed model not only characterizes the interdependence of supply and demand but also quantifies the coupling risk under extreme events. The algorithm flowchart of the model is shown in Figure 1.

2.1. Simulating the Marginal Probability

The monthly probability distributions of water supply and demand vary widely and are difficult to determine. This paper introduces kernel density estimation (KDE) to simulate the marginal probability distributions of water supply and demand sequences. KDE is a non-parametric probability density estimation method that does not require assuming that the data follow a specific distribution. Instead, it fits the probability distribution of discrete data based on the characteristics and properties of the data themselves. It can flexibly adapt to various complex and unknown data distribution forms and is commonly used in statistics for inferring the distribution of population data based on finite samples. Assuming that the discrete data points $\{x_{1}, x_{2}, \dots, x_{n}\}$ are drawn from an unknown distribution, the probability density $f(x)$ at any point $x$ is estimated as

$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_{i}}{h}\right) \tag{1}$$

where $n$ is the sample size, $h$ is the bandwidth parameter determining the smoothness of the estimated curve, and $K(\cdot)$ is the kernel function. Because the Gaussian kernel is smooth and has good mathematical properties, this paper selects it as the kernel function for kernel density estimation. Thus, the Gaussian kernel density estimation function is

$$f(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} e^{-\frac{(x - x_{i})^{2}}{2h^{2}}} \tag{2}$$

Therefore, the cumulative probability distribution at any point $x$ is obtained by integrating the density up to $x$:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} e^{-\frac{(t - x_{i})^{2}}{2h^{2}}}\,dt \tag{3}$$
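As an illustration of Equations (2) and (3), the following is a minimal Python sketch rather than the Matlab implementation used in this paper. The function names, the sample values in `demand`, and the bandwidth `h` are hypothetical and chosen only for demonstration. The sketch relies on the fact that integrating each Gaussian kernel yields a normal cumulative distribution, which can be written with the error function.

```python
import math

def gaussian_kde_pdf(x, samples, h):
    """Gaussian kernel density estimate at x, following Equation (2)."""
    n = len(samples)
    total = sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in samples)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def gaussian_kde_cdf(x, samples, h):
    """Cumulative probability at x, following Equation (3).

    Each Gaussian kernel integrates to a normal CDF, expressed here via erf.
    """
    n = len(samples)
    scale = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in samples) / n

# Hypothetical annual demand values (illustrative only, not data from the paper)
demand = [5200.0, 4800.0, 6100.0, 5500.0, 4300.0, 5900.0]
h = 450.0  # illustrative bandwidth, not a fitted value

density_at_5000 = gaussian_kde_pdf(5000.0, demand, h)
prob_below_5000 = gaussian_kde_cdf(5000.0, demand, h)
```

Evaluating `gaussian_kde_cdf` at a supply or demand value gives the marginal probability that is later passed to the copula.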

The bandwidth $h$ directly determines the quality of the fit and can be neither too large nor too small. If it is too large, the condition $h \to 0$ is not satisfied and the estimate is oversmoothed; if it is too small, too few points are involved in the fitting at each location, leading to large errors. To determine the value of $h$, this paper borrows an idea from machine learning: a risk function is constructed and the error is minimized through optimization iterations. When the error converges to a certain set value, the optimal bandwidth $h$ is obtained. Here, the risk function is the mean integrated squared error (MISE), which is minimized over $h$:

$$\min_{h} \mathrm{MISE}(h) = \min_{h} E\left[\int \left(\hat{f}(x) - f(x)\right)^{2} dx\right] \tag{4}$$

where $f(x)$ is the true probability density function, and $\hat{f}(x)$ is the probability density function obtained by using a given value of $h$. The $h$ that minimizes $\mathrm{MISE}(h)$ is the optimal bandwidth. Based on Silverman's rule of thumb [68], the optimal bandwidth $h$ is calculated as

$$h_{\mathrm{best}} = 1.06 \times \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) \times n^{-\frac{1}{5}} \tag{5}$$

where $n$ is the sample size, $\hat{\sigma}$ is the sample standard deviation, and IQR is the interquartile range, i.e., the difference between the upper quartile and the lower quartile, which serves as another measure of data dispersion. When the distribution contains outliers, using IQR is more robust.
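Equation (5) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's Matlab code. The function name `rule_of_thumb_bandwidth` is hypothetical, and the quartile convention (`method="inclusive"` in the standard library) is an assumption, since different software packages compute quartiles slightly differently and may therefore return slightly different bandwidths.

```python
import statistics

def rule_of_thumb_bandwidth(samples):
    """Bandwidth from Equation (5): 1.06 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    # Quartile convention is an assumption; other tools may differ slightly.
    q1, _median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    iqr = q3 - q1
    return 1.06 * min(sigma, iqr / 1.34) * n ** (-1.0 / 5.0)

# Hypothetical series (illustrative only); the second contains an outlier
regular = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
with_outlier = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 1000.0]

h_regular = rule_of_thumb_bandwidth(regular)
h_outlier = rule_of_thumb_bandwidth(with_outlier)
```

In the second series the outlier inflates the standard deviation, so the minimum in Equation (5) falls back to the IQR term, which is the robustness property described above.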

2.2. Simulating the Joint Probability

The water shortage risk arises from the randomness and uncertainty of water supply and demand. Copula functions can connect the joint distribution of multiple variables with their respective marginal distributions, capturing the stochastic dependence between variables. They are widely used in multivariate hydrological risk analysis and calculation [55,58]. In this study, copula function families are utilized to simulate the joint probability distribution of water supply and demand, aiming to quantify the coupling probability of water shortage. Due to the different properties of various copula functions and their ability to describe different correlation structures, as well as the varying correlation structures and probability distributions of water supply–demand sequences in different regions and months, selecting appropriate copula functions for different water supply–demand sequences can improve the fitting accuracy. Specifically, this study chooses a total of five copula functions from two families: the Archimedean function family (Frank copula, Clayton copula, Gumbel copula) [69] and the elliptical function family (Gaussian copula, t copula) [52] to simulate the joint probability distribution of water supply and demand. The best-fitting copula function is then selected as the final linking function. Copulas construct multidimensional joint distributions based on marginal distributions and correlation structures, with their fundamental theory based on Sklar’s theorem.

Sklar’s theorem [70]: Let $x$ and $y$ be continuous random variables with marginal distribution functions denoted by $u = F_{x}(x)$ and $v = F_{y}(y)$, respectively. Let $F(x, y)$ be the joint distribution function of variables $x$ and $y$. If $F_{x}$ and $F_{y}$ are continuous, then there exists a unique function $C(u, v)$ such that

$$F(x, y) = C(F_{x}(x), F_{y}(y)), \quad \forall x, y \tag{6}$$

$C(u, v)$ is the copula connection function, where $u$ and $v$ are the cumulative distribution function values of variables $x$ and $y$, respectively. Different copula connection functions have different mathematical forms and properties. Table 1 below provides a brief introduction to the properties of the five bivariate copula connection functions and their parameters.
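To make Equation (6) concrete, the sketch below evaluates the three Archimedean copulas in their standard textbook forms. These forms are assumed here and are not copied from Table 1. It is written in Python for illustration, whereas the paper's implementation uses Matlab. The function names and the values of `u`, `v`, and `theta` are hypothetical; in practice `u` and `v` would be the KDE marginal probabilities and `theta` would be estimated from the data.

```python
import math

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 and u, v in (0, 1]."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel copula C(u, v) for theta >= 1 and u, v in (0, 1]."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

def frank_cdf(u, v, theta):
    """Frank copula C(u, v) for theta != 0 and u, v in [0, 1]."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

# Hypothetical marginal probabilities and parameter (illustrative only)
u, v, theta = 0.3, 0.6, 2.0
joint_clayton = clayton_cdf(u, v, theta)
joint_gumbel = gumbel_cdf(u, v, theta)
joint_frank = frank_cdf(u, v, theta)
```

Each function returns the joint probability $F(x, y)$ for a given pair of marginal probabilities. Setting one argument to 1 recovers the other marginal, which is a quick sanity check on any copula implementation.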

2.3. Estimation of Potential Loss Rate

The potential loss rate is the ratio of the potential loss incurred due to water shortage to the normal production value under conditions of full water supply for the water user. This paper categorizes water users into three groups: production, domestic, and ecological.

Production users mainly comprise industry and agriculture. The potential loss rate for production, denoted as $L_{P}$, is defined as the ratio of the reduction in output value due to water shortage in agriculture and industry to the normal production value. It is calculated as follows:

$$L_{P}(x, y, p) = \frac{\sum_{i=1}^{N} (x_{i} - y_{i}) \times p_{i}}{\sum_{i=1}^{N} x_{i} \times p_{i}}, \quad i = 1, 2 \tag{7}$$

where $x_{i}$ and $y_{i}$ represent the water demand and supply for industrial or agricultural production, respectively, and $p_{i}$ denotes the economic value that ten thousand cubic meters of water supply can bring to industrial or agricultural production.

The domestic potential loss rate (denoted as $L_{L}$) and the ecological potential loss rate (denoted as $L_{E}$) cannot be quantified in the same manner as for production. In this study, they are represented by the water shortage rate, defined as

$$L_{L}(x, y) = \frac{x_{L} - y_{L}}{x_{L}} \tag{8}$$

$$L_{E}(x, y) = \frac{x_{E} - y_{E}}{x_{E}} \tag{9}$$

where $x_{L}$ and $y_{L}$ represent the water demand and supply for domestic use, and $x_{E}$ and $y_{E}$ represent the water demand and supply for ecological use.

Given the differing levels of importance between production, domestic, and ecological water uses, an empirical weight is assigned to each type’s potential loss rate to distinguish their significance. Considering that domestic water use is the most critical, followed by production water use, and then, ecological water use, we assign weights of 0.5, 0.3, and 0.2, respectively, in that order. Consequently, the overall water shortage potential loss rate for a given assessment unit is finally defined as

$$L(x, y, p) = 0.5 \times L_{L} + 0.3 \times L_{P} + 0.2 \times L_{E} \tag{10}$$
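The calculation in Equations (7) to (10) can be traced with a short Python sketch. All numerical values below are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative and do not come from the paper's Matlab code.

```python
def production_loss_rate(demand, supply, unit_value):
    """Equation (7): value-weighted shortage ratio over production sectors."""
    lost = sum((x - y) * p for x, y, p in zip(demand, supply, unit_value))
    full = sum(x * p for x, p in zip(demand, unit_value))
    return lost / full

def shortage_rate(demand, supply):
    """Equations (8) and (9): water shortage rate for one user type."""
    return (demand - supply) / demand

def overall_loss_rate(l_domestic, l_production, l_ecological):
    """Equation (10): weights 0.5, 0.3, 0.2 as assigned in the text."""
    return 0.5 * l_domestic + 0.3 * l_production + 0.2 * l_ecological

# Hypothetical inputs: [industry, agriculture] demand, supply, value per unit of water
l_p = production_loss_rate([100.0, 400.0], [90.0, 300.0], [50.0, 10.0])
l_l = shortage_rate(80.0, 76.0)   # domestic
l_e = shortage_rate(20.0, 15.0)   # ecological
loss = overall_loss_rate(l_l, l_p, l_e)
```

With these illustrative inputs the production term is 1500/9000, the domestic shortage rate is 0.05, and the ecological shortage rate is 0.25, so the three weighted terms contribute 0.05, 0.025, and 0.05 to the overall loss rate.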

2.4. Estimation of Water Shortage Risk

In this study, the water shortage risk warning level R is defined as the product of the water shortage probability warning level and the potential loss rate level:

$$R = M(F(x, y)) \cdot M(L(x, y, p)) \tag{11}$$

Here, $M$ represents the level mapping function, which maps water shortage probabilities to their corresponding warning levels according to the probability threshold intervals outlined in Table 2. Similarly, it maps potential loss rates to their respective warning levels based on the loss rate threshold intervals provided in Table 3. The final level mapping matrix for water shortage risk and the corresponding warning level and safety level classification standards can be found in Table 4. Equation (11) indicates that the overall risk assessment takes into account both the likelihood of a water shortage event occurring and the severity of the potential losses associated with such an event. Multiplying these two factors gives a more comprehensive picture of the risk posed by water shortage.
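The level mapping in Equation (11) amounts to a threshold lookup followed by a multiplication, as the following Python sketch shows. The threshold values in `PROB_BOUNDS` and `LOSS_BOUNDS` are placeholders and are not the intervals of Tables 2 and 3. The function names, and the choice to assign a value lying exactly on a boundary to the higher level, are also assumptions made for illustration.

```python
from bisect import bisect_right

# Placeholder thresholds separating five levels.
# These are NOT the values of Tables 2 and 3; substitute the real intervals.
PROB_BOUNDS = [0.2, 0.4, 0.6, 0.8]
LOSS_BOUNDS = [0.2, 0.4, 0.6, 0.8]

def level(value, bounds):
    """Map a value to a warning level 1..5 (the function M in Equation (11))."""
    return bisect_right(bounds, value) + 1

def risk_score(shortage_probability, loss_rate):
    """Product of the two levels, as in Equation (11); ranges from 1 to 25."""
    return level(shortage_probability, PROB_BOUNDS) * level(loss_rate, LOSS_BOUNDS)

# Hypothetical inputs (illustrative only)
score = risk_score(0.5, 0.125)
```

The resulting product would then be looked up in the risk matrix of Table 4 to obtain the final warning and safety levels; that last step is omitted here because it depends on the table itself.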

3. Instance Application and Comparative Experiment

3.1. Overview of the Study Area

The Yuxi receiving water area of the Central Yunnan Water Diversion Project selected as the case study region (refer to Figure 2) is located in the central part of Yunnan Province, China. For convenient management and allocation of water resources, the research area is divided into 13 receiving water sub-regions and three ecological lakes, as depicted in Figure 2a. The hydrological and topographical features of the study area are observed in Figure 2b. Major river systems in the area include the Qujiang, Nanpanjiang, Dajie, and Hongqi rivers. The region encompasses five meteorological stations, Hongta, Jiangchuan, Tonghai, Huaning, and Eshan (Figure 2b), which serve as sources of precipitation data. The average annual precipitation in the study area ranges from 800 to 900 mm. From May to October, influenced by the southeast monsoon from the Bay of Bengal and the southwest monsoon from the Indian Ocean, the region experiences abundant rainfall, with approximately 83% to 87% of the annual precipitation occurring during this period. The months from June to September receive the highest rainfall, accounting for 65% to 69% of the annual total. From November to April of the following year, the region is influenced by dry and clear weather conditions due to the dry and warm air currents from the northern Indian continent, resulting in minimal precipitation, accounting for only 13% to 17% of the annual total. The driest months, such as December and February, receive only about 2% of the annual precipitation. The average annual temperature ranges from 15 to 16 °C, with little variation in other climatic features such as average evaporation and relative humidity. Runoff is primarily derived from precipitation, with distinct seasonal variations. Approximately 85% of the runoff occurs during the flood season from June to November, with the highest runoff volumes observed in July, August, and September, accounting for 58% of the annual total. 
In contrast, during the dry season from December to May of the following year, runoff decreases significantly, accounting for only about 15% of the annual total, with the lowest runoff volumes observed in March and April, representing only 2.9% of the annual total.

3.2. Data Sources

The precipitation data are monthly records from five meteorological stations (Hongta, Jiangchuan, Tonghai, Huaning, and Eshan) spanning from 1961 to 2011, sourced from the monthly surface meteorological observation historical dataset provided by the National Meteorological Science Data Center, available at https://data.cma.cn (accessed on 20 January 2024). Historical supply and demand data from 1960 to 2011 for 13 sub-receiving water areas, as well as population and economic data, were provided by the Yunnan Provincial Institute of Water Resources and Hydroelectric Survey, Design, and Research. The supply and demand data are monthly and include categories such as domestic, industrial, agricultural, and ecological; annual supply and demand series in the water-receiving area are shown in Figure 3. It can be observed that agricultural water demand is the highest, with significant annual fluctuations, followed by industrial, domestic, and ecological water demand, in that order. Other hydrological and water resource data are sourced from the “Yuxi City Water Resources Bulletin”, and the socioeconomic data are sourced from the “Statistical Bulletin on National Economic and Social Development of Yuxi City”.

3.3. Building the Water Shortage Risk Assessment Model

We use the annual supply–demand water series of **ushan as sample data to demonstrate the construction process of the water shortage risk assessment model, implemented through Matlab programming. The key to model construction lies in determining the optimal bandwidth for Gaussian kernel density estimation and solving the parameters of the bivariate copula model.

①: Computing the optimal bandwidth

The optimal bandwidths for Gaussian kernel density estimation for the demand and supply water samples are calculated as $h_{\mathrm{demand}} = 444.4479$ and $h_{\mathrm{supply}} = 457.8039$, respectively. To validate the fitting effect of the optimal bandwidth, we scale the optimal bandwidths down (by factors of 0.2 and 0.5) and up (by factors of 1.5 and 2) and use them as parameters in Equation (2) to simulate the probability density of the demand and supply water sequences. The results are shown in Figure 4, from which it can be observed that the optimal bandwidth gives the best fit. The fit with a reduced bandwidth is not smooth enough and tends to overfit, while the fit with an enlarged bandwidth is too smooth, leading to larger errors. Then, by using the optimal bandwidths in Equation (3), we can construct the simulation functions for the marginal probability distributions of the supply and demand water sequences.

②: Estimation of Bivariate Copula Parameters

Based on the marginal probability distributions of water supply and demand, the model parameters of the Gaussian copula, t copula, Gumbel copula, Clayton copula, and Frank copula are estimated using Matlab’s copulafit function. The parameter estimation results are shown in Table 5, and the fitted bivariate copula density functions and distribution functions are illustrated in Figure 5. Thus, by incorporating the model parameters into Equation (6), the bivariate copula joint probability fitting model can be obtained.

3.4. Model Performance Comparative Experiment

To verify the superiority of the proposed model in robustness and assessment accuracy, comparative experiments are designed from two aspects: simulating the robustness and fitting accuracy of univariate marginal distributions, and simulating the stability and fitting accuracy of bivariate joint probabilities. Comparative experiments are conducted using Matlab programming. Finally, the experimental results are compared and analyzed to provide evidence and insights.

3.4.1. Comparison Experiments on the Accuracy and Robustness of Marginal Distribution Simulations

To validate the superiority of KDE in simulating the univariate probability distribution of monthly supply–demand sequences, four commonly used distribution functions in the hydrological field (gamma [52], normal [48], logistic [49], Pearson3 [57]) were selected for comparison experiments with the introduced KDE in this paper. The Kolmogorov–Smirnov test (KS test) and root mean square error (RMSE) were utilized to evaluate the robustness and assessment accuracy of the comparison methods from both confidence probability and fitting accuracy perspectives.

The Kolmogorov–Smirnov test [71,72] is a non-parametric goodness-of-fit test. It determines whether to reject the null hypothesis by comparing the maximum absolute deviation (D-statistic) between the empirical cumulative distribution function (ECDF) of the sample and the theoretical cumulative distribution function (CDF) with the critical value at a specified significance level (0.05). Suppose the sample set $X = \{x_{1}, x_{2}, x_{3}, \dots, x_{n}\}$ has an empirical cumulative distribution function $F_{n}(x)$, and the null hypothesis is that $X$ follows a theoretical distribution $F(x)$. The maximum absolute deviation $D_{n}$ between the two is computed as follows:

$$D_{n} = \sup_{x} |F_{n}(x) - F(x)| \tag{12}$$

where $\sup_{x}$ represents the supremum, i.e., the least upper bound of the absolute differences over all $x$, which here is the largest absolute difference between the two functions. If $X$ follows the theoretical distribution $F(x)$, then as $n$ tends to infinity, $D_{n}$ almost surely converges to 0.
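For a finite sample, the supremum in Equation (12) is attained at one of the sample points, so $D_{n}$ can be computed by checking the ECDF just before and just after each sorted observation. The Python sketch below illustrates this; the function names, the uniform reference CDF, and the sample values are hypothetical and are used only because the result is easy to verify by hand.

```python
def ks_statistic(samples, cdf):
    """D_n from Equation (12): largest gap between the ECDF and a fitted CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1) / n to i / n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used here as a simple reference."""
    return min(max(x, 0.0), 1.0)

# Hypothetical sample (illustrative only)
d_n = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
```

Any fitted CDF, including the KDE distribution of Equation (3), can be passed in place of `uniform_cdf`. Converting $D_{n}$ to a p-value requires the Kolmogorov distribution and is left to a statistics library.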

The root mean square error (RMSE) [73] is commonly used to assess the performance of prediction models or the goodness-of-fit of data. In this paper, RMSE is utilized to quantify the difference between the empirical distribution and the fitted theoretical distribution as a measure of goodness-of-fit evaluation [74]. RMSE can be expressed as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[F_{n}(x_{i}) - F(x_{i})\right]^{2}} \tag{13}$$

In the above equation, $F_{n}(x_{i})$ represents the empirical cumulative probability distribution of the observed data, $F(x_{i})$ represents the fitted theoretical probability distribution, and $n$ is the number of samples in the dataset.
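Equation (13) can be sketched as follows in Python. The empirical probability is taken as $i/n$ for the $i$-th smallest observation, which is an assumption, since the text does not state which empirical plotting position is used. The function name and sample values are hypothetical.

```python
import math

def cdf_rmse(samples, cdf):
    """RMSE between the ECDF and a fitted CDF, following Equation (13).

    Assumes the empirical probability of the i-th smallest value is i / n.
    """
    xs = sorted(samples)
    n = len(xs)
    squared = sum((i / n - cdf(x)) ** 2 for i, x in enumerate(xs, start=1))
    return math.sqrt(squared / n)

def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a simple reference."""
    return min(max(x, 0.0), 1.0)

# Hypothetical samples (illustrative only)
rmse_exact = cdf_rmse([0.25, 0.5, 0.75, 1.0], uniform_cdf)  # ECDF matches the CDF
rmse_offset = cdf_rmse([0.1, 0.4, 0.7], uniform_cdf)
```

Unlike $D_{n}$, which records only the single largest gap, RMSE averages the squared gaps over all observations, so the two measures can rank candidate distributions differently.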

We simulated the probability distributions of monthly and annual water supply and demand sequences for 13 assessment units (sub-regions) in the study area using gamma, normal, logistic, Pearson3, and KDE, resulting in a total of 169 sequences for both supply and demand. Then, based on the fitted models and empirical cumulative probabilities, we used Equations (12) and (13) to calculate the KS p-value and RMSE.

①: Comparative analysis of the Kolmogorov–Smirnov test results

The results of the KS p-value calculations for the demand and supply sequences are plotted as curves in Figure 6. The p-value represents the probability of observing the current sample data or more extreme data under the assumption that the null hypothesis is true. A higher p-value means that the data are more compatible with the hypothesized distribution, which is taken here as indicating a better fit. From Figure 6a, it can be observed that out of the 169 demand sequences, only the 164th sequence has a KS test p-value of 0.038 (normal distribution fit), which is below the significance level of 0.05. The p-values for the other sequences are above the significance level, indicating that extreme situations in water demand are relatively rare and that most sequences tend towards a normal distribution. From the KS p-value calculation results for the supply sequences (Figure 6b), it can be seen that, except for KDE, the KS tests for the other four distributions show instances where the p-value is below the 0.05 significance level. This indicates that the distribution of water supply is complex and diverse, and some sequences do not belong to any of the four known distributions; only KDE is not rejected for all of them. Therefore, KDE can perform well when the fitting results of other methods are poor. It adapted to all of the extreme distribution shapes encountered, demonstrating robustness and resilience unmatched by the other methods.

The statistical results of the KS p-values, including the mean, variance, and rejection rate of the null hypothesis, are summarized in Table 6. The optimal values are highlighted in bold in the table. The mean is used to measure the overall fitting accuracy of the method. A higher mean of the KS p-value indicates higher overall fitting accuracy. From the mean statistics in Table 6, it can be observed that KDE has the highest overall fitting accuracy for both the demand and supply sequences. The variance is used to evaluate the volatility of the KS p-value results, which reflects the stability of the method. A smaller variance indicates less volatility and greater stability of the method. From the variance statistics in Table 6, it can be seen that KDE exhibits the most stable fitting of the distribution shapes for both the demand and supply sequences. The rejection rate of the null hypothesis indicates the proportion of p-values lower than the significance level of 0.05, serving to characterize the fitting capability of the method when facing various distribution shapes. As shown in Table 6, the rejection rate of the null hypothesis for KDE is 0, indicating that KDE can adapt to the distribution shapes of all water supply and demand sequences, demonstrating the strongest modeling capability.

②: Comparative analysis of the RMSE evaluation results

The RMSE calculation results for the empirical and theoretical probability distributions of the water demand and supply sequences are shown in Figure 7. RMSE is a measure of the deviation between empirical and theoretical distributions, where smaller RMSE values indicate better fitting accuracy. The mean, variance, and range of RMSE values are then calculated in Table 7, with the optimal values highlighted in bold. The mean RMSE represents the average deviation of the fitting, with smaller values indicating higher overall fitting accuracy. The variance and range of RMSE values assess the stability and robustness of the fitting, where smaller values indicate lower fluctuation and a narrower range. From Figure 7a and Table 7, it can be observed that all five methods exhibit relatively high fitting accuracy for the water demand sequences, with fitting deviations below 0.1. Pearson3 and KDE have the smallest mean deviations, but KDE also demonstrates the lowest variance and range, indicating that among the five methods, KDE not only offers high fitting accuracy but also superior stability and robustness. Regarding the RMSE evaluation results for the water supply sequences in Figure 7b and Table 7, it is evident that the fitting deviations vary significantly among the five methods. Except for KDE and logistic, the other three methods show considerable fluctuation, especially gamma, which has the largest mean deviation, variance, and range. This suggests that the gamma distribution cannot adequately accommodate all distribution patterns of the water supply sequences. In contrast, KDE exhibits the smallest mean deviation, variance, and range, indicating that KDE not only offers the highest fitting accuracy but also the best stability and robustness, capable of accommodating all extreme distribution shapes of the water supply sequences.
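As a rough illustration of the RMSE measure, the sketch below compares an empirical CDF with a fitted CDF at the sorted sample points. The plotting position i/n, the synthetic series, and the helper names are assumptions; the source does not state how the empirical probabilities were computed.

```python
import numpy as np
from scipy import stats


def empirical_cdf(sample):
    """Sorted sample and empirical probabilities i/n (the plotting position is an assumption)."""
    x = np.sort(np.asarray(sample, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size


def cdf_rmse(sample, fitted_cdf):
    """Root-mean-square deviation between the empirical and fitted CDFs."""
    x, p_emp = empirical_cdf(sample)
    p_fit = np.asarray(fitted_cdf(x), dtype=float)
    return float(np.sqrt(np.mean((p_emp - p_fit) ** 2)))


def rmse_summary(rmse_values):
    """Mean, variance, and range of RMSE values across sequences (as in Table 7)."""
    r = np.asarray(rmse_values, dtype=float)
    return {
        "mean": float(r.mean()),
        "variance": float(r.var()),
        "range": float(r.max() - r.min()),
    }


# Hypothetical skewed series, fitted here with a normal distribution.
rng = np.random.default_rng(1)
series = rng.gamma(shape=4.0, scale=10.0, size=51)
loc, scale = stats.norm.fit(series)
rmse_normal = cdf_rmse(series, lambda x: stats.norm.cdf(x, loc, scale))
print(f"RMSE of the normal fit: {rmse_normal:.4f}")
```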

③: Comparison and analysis of the fitting results

To compare the fitting effects of the five methods more intuitively, the empirical probability distributions were plotted together with the fitted theoretical distributions. Five representative sets (Figure 8) were selected from all the results for comparative analysis. The fitted curves show that, for sequences tending towards a normal distribution, such as the water demand and supply sequences of **ushan, all methods fit relatively well. For sequences with significant fluctuations in demand and supply, such as Luohe and Yanhe, the gamma distribution fits poorly, but the other four methods still adapt. However, for supply sequences with special distribution shapes, such as Shifu and Lishan-2-month, only KDE accurately fits the probability distribution.

Overall, it is evident that KDE demonstrates the optimal fitting capability, showing robustness and resilience for random variables of water supply and demand with diverse and unknown distribution shapes. It can adapt to extreme distribution shapes and is highly suitable as a marginal distribution simulation function for water supply and demand sequences.

3.4.2. Joint Probability Simulation Accuracy and Robustness Comparative Experiment

The accuracy of the joint probability simulation is determined jointly by the accuracy of the marginal probability fitting for the supply and demand variables and by the suitability of the copula function. The higher the accuracy of the joint probability simulation, the more accurate the assessment of the water shortage risk. To further verify the superiority of KDE in joint probability simulation, the marginal probability distributions of water supply and demand obtained by the five methods above are each input into the five copula functions in Equation (6) to calculate the corresponding joint probability distributions. That is, the joint probabilities of water supply and demand are calculated for each combination of the five univariate probability simulation methods and the five copula functions. Then, the mean of the Spearman and Kendall rank correlation coefficients of the bivariate copula under the estimated parameters (Table 5) is used to select the optimal copula for each method as the final linkage function. Finally, the accuracy of the model is evaluated by calculating the squared Euclidean distance (SED) between the empirical joint probability distribution and the theoretical joint probability distribution.

The Spearman and Kendall rank correlation coefficients are non-parametric measures of the strength and direction of the relationship between two variables, based on the ranks of the data [65]. They are particularly suitable when the data do not follow a bivariate normal distribution or the measurement scale is not continuous and quantitative, and they are largely insensitive to outliers. They differ in that the Kendall rank correlation coefficient ($τ$) is based on the concordant and discordant pairs of the two samples, as shown in Equation (14), while the Spearman rank correlation coefficient ($ρ$) is based on rank differences, as shown in Equation (15).

τ_{a} = \frac{c - d}{\frac{1}{2} n (n - 1)}, \quad τ_{b} = \frac{c - d}{\sqrt{(c + d + t_{x}) (c + d + t_{y})}}

(14)

In the equations, $n$ represents the number of samples, $\frac{1}{2} n (n - 1)$ represents the total number of pairwise combinations of samples, and $c$ and $d$ represent the numbers of concordant and discordant pairs, respectively. $τ_{b}$ is used to handle tied ranks, where $t_{x}$ and $t_{y}$ represent the numbers of tied ranks in datasets $X$ and $Y$, respectively. It is important to note that tied ranks occurring simultaneously in both $X$ and $Y$ are not counted in $t_{x}$ and $t_{y}$.

ρ = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)}

(15)

In the equation, $n$ represents the number of samples, and $d_{i}$ is the absolute difference between the ranks of the original observed data $x_{i}$ and $y_{i}$.

The mean of the Spearman coefficient ($ρ$) and the Kendall rank coefficient ($τ$), abbreviated as MSK, is calculated as follows:

MSK = \frac{τ + ρ}{2}

(16)
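For readers who want to check Equations (14) to (16) on a small sample, the following Python sketch computes the sample coefficients with SciPy. Note the assumption: the paper evaluates $τ$ and $ρ$ of the fitted copula, whereas this sketch uses sample rank correlations purely to illustrate the formulas. The five-point series and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def spearman_from_ranks(x, y):
    """Spearman's rho via Equation (15); exact only when there are no tied ranks."""
    rank_x, rank_y = stats.rankdata(x), stats.rankdata(y)
    d = rank_x - rank_y
    n = len(rank_x)
    return 1.0 - 6.0 * float(np.sum(d ** 2)) / (n * (n ** 2 - 1))


def msk(x, y):
    """Mean of Kendall's tau-b and Spearman's rho, as in Equation (16)."""
    tau, _ = stats.kendalltau(x, y)  # SciPy computes the tau-b variant by default
    rho, _ = stats.spearmanr(x, y)
    return (tau + rho) / 2.0, tau, rho


# Hypothetical paired observations: 8 concordant and 2 discordant pairs.
demand = [1, 2, 3, 4, 5]
supply = [2, 1, 4, 3, 5]

value, tau, rho = msk(demand, supply)
print(f"tau = {tau:.2f}, rho = {rho:.2f}, MSK = {value:.2f}")
```

With 8 concordant and 2 discordant pairs out of 10, Equation (14) gives τ = 0.6, and Equation (15) gives ρ = 1 − 6·4/(5·24) = 0.8, so MSK = 0.7.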

The formula for calculating the squared Euclidean distance between the empirical joint probability distribution and the theoretical joint probability distribution is as follows:

SED = \sum_{i = 1}^{n} {[E (i) - C (i)]}^{2}

(17)

Here, $E (i)$ represents the empirical joint probability distribution, $C (i)$ represents the theoretical joint probability distribution fitted by the different copula functions, and $n$ is the sample size.
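Equation (17) can be sketched in Python as follows. The empirical joint probability is taken here as the fraction of observations that are less than or equal to the i-th observation in both coordinates, which is an assumption about how $E (i)$ is computed. The Clayton copula (Table 1, with $θ > 0$) is used only as an example family, with the $θ$ value reported in Table 5; the marginal probabilities `u` and `v` are hypothetical.

```python
import numpy as np


def empirical_joint_cdf(u, v):
    """E(i): share of points j with u_j <= u_i and v_j <= v_i."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    below = (u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None])
    return below.mean(axis=1)


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v; theta) for theta > 0 and u, v in (0, 1]."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sed(empirical, theoretical):
    """Squared Euclidean distance of Equation (17)."""
    diff = np.asarray(empirical, dtype=float) - np.asarray(theoretical, dtype=float)
    return float(np.sum(diff ** 2))


# Hypothetical marginal probabilities of demand (u) and supply (v).
u = np.array([0.2, 0.5, 0.9])
v = np.array([0.3, 0.4, 0.8])
theta = 10.0259  # Clayton parameter reported in Table 5

e = empirical_joint_cdf(u, v)
c = clayton_cdf(u, v, theta)
print(f"SED = {sed(e, c):.4f}")
```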

①: Comparative analysis of MSK

We use MATLAB's copulastat function to calculate the $τ$ and $ρ$ of the bivariate copula under the given parameters, in order to measure how well the copula function suits the input data. A higher mean of $τ$ and $ρ$ (MSK) indicates that the copula function is better able to describe the dependence between the input data, making it more suitable for simulating their joint probability distribution. Therefore, the copula function corresponding to the maximum MSK is chosen as the final connection function to compute the joint probability. The calculated results of MSK for the optimal bivariate copula, using gamma, normal, logistic, Pearson3, and KDE as marginal distribution functions, are shown in Figure 9. The mean and variance of MSK are further calculated and presented in Table 8, with the optimal values highlighted in bold. Figure 9 shows that the MSK of the bivariate copula with KDE as the marginal distribution function is higher than that of the other four methods, and that the advantage is most pronounced for some extreme supply and demand distribution shapes. Table 8 confirms this: the bivariate copula with KDE as the marginal distribution has the highest mean MSK (0.80) and the lowest variance (0.022), indicating the best ability to fit joint probability distributions together with strong stability and robustness.
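The copula-level $τ$ and $ρ$ can also be obtained from well-known closed forms for some families. The sketch below is not a reimplementation of copulastat. It uses the standard relations for the Gaussian copula, τ = (2/π) arcsin ρ and ρ_S = (6/π) arcsin(ρ/2), together with the Kendall's tau formulas for the Clayton copula, τ = θ/(θ + 2), and the Gumbel copula, τ = 1 − 1/θ, evaluated at the parameter values from Table 5. The function names are illustrative.

```python
import math


def gaussian_copula_kendall(r):
    """Kendall's tau of a Gaussian copula with linear correlation r."""
    return 2.0 / math.pi * math.asin(r)


def gaussian_copula_spearman(r):
    """Spearman's rho of a Gaussian copula with linear correlation r."""
    return 6.0 / math.pi * math.asin(r / 2.0)


def clayton_kendall(theta):
    """Kendall's tau of a Clayton copula (theta > 0)."""
    return theta / (theta + 2.0)


def gumbel_kendall(theta):
    """Kendall's tau of a Gumbel copula (theta >= 1)."""
    return 1.0 - 1.0 / theta


def msk(tau, rho):
    """Mean of Kendall's tau and Spearman's rho, Equation (16)."""
    return (tau + rho) / 2.0


# Parameter values taken from Table 5.
r = 0.9859
tau_gauss = gaussian_copula_kendall(r)
rho_gauss = gaussian_copula_spearman(r)
print(f"Gaussian: tau = {tau_gauss:.4f}, rho = {rho_gauss:.4f}, MSK = {msk(tau_gauss, rho_gauss):.4f}")
print(f"Clayton tau = {clayton_kendall(10.0259):.4f}, Gumbel tau = {gumbel_kendall(12.8667):.4f}")
```

Spearman's rho has no simple closed form for the Clayton and Gumbel families, which is one reason a numerical routine is convenient for the full comparison.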

②: Comparative analysis of SED

First, the empirical joint probability distribution values of the supply and demand marginal distributions were calculated, and then, the optimal bivariate copula joint probability distribution values were calculated using five methods, gamma, normal, logistic, Pearson3, and KDE, as the marginal distribution functions. Subsequently, these values were input into Equation (17) to compute the squared Euclidean distance (SED) between the empirical joint probability (ECDF) and theoretical joint probability (CDF). The calculation results are shown in Figure 10, while the mean and variance statistics of SED are presented in Table 9, with the optimal values highlighted in bold. From Figure 10 and Table 9, it can be observed that the optimal bivariate copula model based on KDE exhibits minimal fitting bias and low fluctuation for all supply and demand sequences. The SED’s mean and variance are both minimal, indicating high simulation accuracy, robustness, and stability of the optimal bivariate copula model based on KDE.

3.5. Water Shortage Risk Assessment and Results Analysis

The superiority of the water shortage risk assessment model based on KDE and the optimal copula function has been validated. By incorporating monthly and yearly water supply and demand data, as well as the potential losses of water users, into the model, and combining the risk classification levels and threshold mapping standards (Table 4), the monthly and yearly water shortage risks for all water receiving sub-regions in the study area were calculated. The findings of this study are consistent with those of **, et al. [75]: the frequency, cumulative intensity, and cumulative impact station of regional droughts in Yunnan all show an increasing trend; droughts in Yunnan occur most frequently in December, January, and March, and least frequently in July and August.

①: Seasonal Variation Characteristics Analysis of Water Shortage Risk

The average water shortage risk was calculated for the four seasons (spring, summer, autumn, and winter) and the seasonal variation characteristic curve of the water shortage risk is plotted in Figure 11. From Figure 11, it can be observed that the water shortage risk is generally higher in spring and winter compared to summer and autumn, with the highest overall water shortage risk occurring in spring and the lowest in autumn. In wet years, the water shortage risk is low throughout the spring, summer, autumn, and winter seasons, while in dry years, the water shortage risk in spring is significantly higher than in other seasons. Overall, there is a trend of increasing water shortage risk.

②: Analysis of water shortage risk in typical wet, normal, and dry years

Based on the precipitation observation sequences from five precipitation observation stations in the water receiving area over 51 years (1961–2011), the cumulative frequency of the sorted average precipitation was fitted using the Pearson3 curve. Five typical years were selected for scenario analysis of monthly water shortage risk, corresponding to precipitation frequencies of 5% (extremely abundant year: 1971), 25% (abundant year: 1981), 50% (normal year: 1996), 75% (drought year: 1982), and 95% (extreme drought year: 2011). The water shortage risk was categorized into high, medium, and low risk using the risk-level thresholds of 6 and 12 (refer to Table 4), as shown in Figure 12. Figure 12 shows that the water shortage risks in extreme drought years and drought years are significantly higher than in other years, with the first half of the year exhibiting higher risk than the second half. Water shortage risk in abundant years and extremely abundant years is relatively low and remains consistent throughout the year, which is related to the typical monsoon climate in the study area. In extreme drought years, except for the rainy season in August, the region experiences high risk almost throughout the year. The first half of drought years is almost consistently at a high-risk level. Normal years and abundant years are mostly in a medium-risk state, with only extremely abundant years approaching a low-risk level.
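The level mapping behind these categories can be sketched in a few lines of Python. The sketch reads the risk value R in Table 4 as the product of the probability level (Table 2) and the potential loss rate level (Table 3), which is how the matrix entries are built. Two assumptions are made: the probability ranges in Table 2 are treated as fractions between 0 and 1, and every interval is treated as closed on the left, since the tables do not say which level a boundary value such as 40% belongs to. All names are illustrative.

```python
from bisect import bisect_right

# Upper edges of levels 1-4; values at or above the last edge map to level 5.
PROBABILITY_EDGES = [0.1, 0.2, 0.3, 0.5]      # Table 2, read as fractions (assumption)
LOSS_RATE_EDGES = [5.0, 10.0, 20.0, 40.0]     # Table 3, in percent


def probability_level(probability):
    """Map a water shortage probability to its level 1-5 (Table 2)."""
    return bisect_right(PROBABILITY_EDGES, probability) + 1


def loss_level(loss_rate_pct):
    """Map a potential loss rate in percent to its level 1-5 (Table 3)."""
    return bisect_right(LOSS_RATE_EDGES, loss_rate_pct) + 1


def water_shortage_risk(probability, loss_rate_pct):
    """Risk value R = probability level x loss level, with its Table 4 category."""
    score = probability_level(probability) * loss_level(loss_rate_pct)
    if score <= 6:
        category = "low"
    elif score <= 12:
        category = "medium"
    else:
        category = "high"
    return score, category


# Hypothetical month: shortage probability 0.35, potential loss rate 25%.
print(water_shortage_risk(0.35, 25.0))
```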

③: Spatial Distribution Characteristics of Water Shortage Risk

The mean and variance of water shortage risks in each water-receiving sub-region from 1961 to 2011 were calculated. The statistical values were then used to create thematic maps of the spatial distribution of the mean and variance of water shortage risks on the ArcGIS platform, based on the inverse-distance-weighted interpolation method (Figure 13). This was performed to evaluate the long-term water shortage risk status and its variation in the water-receiving sub-regions. Combining Figure 13a,b, it can be observed that Shifu, Yanhe, Dajie, and Ningzhou have been in a severe water shortage state for a long time, indicating that they are high-risk areas; this is related to the population concentration and rapid economic development in these areas. Dajie and Ningzhou exhibit significant fluctuations in water shortage risk, indicating a stronger dependence on precipitation. Qianwei, Jiangcheng, **ushan, and Lishan have been in a moderate water shortage state for an extended period. The water resources in Luohe, Jiuxi, **ongguan, Panxi, and Gaoda are abundant, and the water shortage risk has remained relatively low over the long term. However, the variance in water shortage risk in Luohe, Jiuxi, and **ongguan is significant, indicating weak resilience in these areas, which are susceptible to the impacts of climate change and insufficient water infrastructure. Only Gaoda exhibits strong resilience to water shortage risk, ensuring long-term water supply security without the need for external water supplementation.
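The interpolation step can be illustrated with a basic inverse-distance-weighted estimator in Python. This is a generic sketch, not the ArcGIS implementation: the power of 2, the use of all known points rather than a search neighbourhood, and the coordinates and values in the example are assumptions.

```python
import numpy as np


def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Inverse-distance-weighted estimate at each query point.

    Weights are 1 / distance**power over all known points. A query point that
    coincides with a known point returns that point's value exactly.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.atleast_2d(np.asarray(xy_query, dtype=float))

    # Pairwise distances, shape (n_query, n_known).
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    estimate = (weights @ values) / weights.sum(axis=1)

    # Overwrite estimates at query points that sit exactly on a known point.
    coincident = dist == 0.0
    hit = coincident.any(axis=1)
    estimate[hit] = values[coincident.argmax(axis=1)[hit]]
    return estimate


# Hypothetical sub-region centroids (arbitrary units) and their mean risk values.
centroids = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
mean_risk = [12.0, 6.0, 9.0]
print(idw_interpolate(centroids, mean_risk, [[1.0, 1.0], [0.0, 0.0]]))
```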

4. Discussion

The accuracy of water scarcity risk assessment hinges on the time scale, spatial scale, water supply and demand entities, and assessment methods. To enhance overall quantification accuracy, Mekonnen and Hoekstra [76] utilized a higher spatial resolution (30 × 30 arc minutes), evaluated on a monthly time scale, and incorporated environmental flow requirements, yielding a more precise depiction of water scarcity scenarios. Salmivaara, et al. [77], addressing the modifiable areal unit problem (MAUP) in spatial water resource assessments, which arises from significant differences in results due to variations in analysis unit selection, proposed a multi-zone, multi-scale approach to enhance evaluation robustness. Similarly, Veldkamp, et al. [78], recognizing the inadequacy of a single scale for comprehensively revealing water scarcity issue diversity and complexity, employed probability simulation and multi-scale analysis methods to capture variations and extremes across scales, considering climate change and population growth contributions to water scarcity risks across regions. Some scholars have also made advancements in remote sensing, GIS technology, and the integration of multi-source data [9,79].

To enhance assessment accuracy, our study simultaneously considers temporal scales, spatial scales, and water users. Temporally, we examine the seasonal characteristics and historical trends of water scarcity risk on monthly and annual scales. Spatially, we use the minimum water-receiving area as the assessment unit to study differences in water scarcity risk and spatial distribution patterns under varying water endowments and socioeconomic structures across regions. Accounting for variations in domestic, industrial, and agricultural water demand and potential economic losses, the evaluation results better reflect reality. Finally, in terms of probability distribution simulation methods, our study pioneers the use of kernel density estimation to model the marginal probability distributions of supply and demand variables. The advantage of KDE lies in it not requiring a pre-assumption that the data follows a specific distributional form, nor does it need any prior knowledge. It can fit the probability distribution of discrete data based on the characteristics and properties of the data themselves. In particular, when the data distribution is complex and does not closely resemble any known distribution, KDE can capture the local features and details of the data, thereby providing more accurate and robust probability density estimates. However, a drawback of KDE is that it may not perform as well as methods specifically designed for discrete data when dealing with highly discrete data.

5. Conclusions

There are numerous studies on water shortage risk assessment models, but few studies exploring the robustness and evaluation accuracy of these models. This study leverages the non-parametric properties of KDE and the robustness in probability simulation to construct a new robust model for water shortage risk assessment based on copula functions. The model quantifies water shortage risk from the perspectives of both water shortage probability and loss rate. Through application in the Yuxi receiving area of the Central Yunnan Water Diversion Project, the robustness and accuracy of the model under various scenarios and data shapes are verified and analyzed. Additionally, seasonal characteristic analysis of historical water shortage risk, precipitation variability analysis under different scenarios, and assessment of the spatial distribution status and resilience of water receiving sub-regions are conducted. The conclusions drawn are consistent with the planning and design of the phase II supporting project of the Central Yunnan Water Diversion Project. The results of this research can provide data support and evaluation tools for the rational allocation of externally transferred water resources to Yuxi secondary receiving areas after the completion of the Central Yunnan Water Diversion Project. The constructed water shortage risk assessment model can be widely applied to the evaluation of water resource scarcity and rational water allocation in the entire receiving area of the Central Yunnan Water Diversion Project. It can also be applied to the quantitative research of other multivariate stochastic coupling risks, providing reference and guidance for other related research and applications.

The limitation of this study lies in its sole focus on enhancing the stability and accuracy of water scarcity risk assessment methods, without addressing the issue of water scarcity risk assessment in cross-basin water diversion projects under various coupled scenarios in the context of future climate change [80,81,82]. For instance, it did not explore the water scarcity risks in receiving areas under different water availability scenarios between the source and receiving areas, as well as the water scarcity risks under different proportions of external water diversion and local water sources. Furthermore, how to achieve optimal allocation of water resources under the premise of risk minimization based on water scarcity risk assessment results [57] is also an interesting topic worthy of further investigation. These will be key directions of future research.

Author Contributions

Conceptualization, T.Q. and Z.S.; methodology, T.Q.; software, T.Q.; validation, T.Q., J.C. (**g Chen) and J.C. (**ming Chen); formal analysis, T.Q. and W.X.; resources, S.G., L.W. and S.B.; data curation, S.B.; writing—original draft preparation, T.Q. and S.G.; project administration, Z.S.; funding acquisition, S.G.; writing—review and editing, T.Q., Z.S., S.G., W.X., J.C. (**g Chen), J.C. (**ming Chen), L.W. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the financial support from the Demonstration project of comprehensive government management and large-scale industrial application of the major special project of CHEOS (No. 89-Y50G31-9001-22/23-05).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank the National Meteorological Science Data Center for providing the precipitation data free of charge.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gudmundsson, L.; Boulange, J.; Do, H.X.; Gosling, S.N. Globally observed trends in mean and extreme river flow attributed to climate change. Science 2021, 371, 1159–1162.
Min, S.-K.; Zhang, X.; Zwiers, F.W.; Hegerl, G.C. Human contribution to more-intense precipitation extremes. Nature 2011, 470, 378–381.
Sara, S.A.; Abel, S.; Jaime, M.; Joaquín, A.; Javier, P.A. Risk assessment in water resources planning under climate change at the Júcar River basin. Hydrol. Earth Syst. Sci. 2020, 24, 5297–5315.
UNCCD. Global Drought Snapshot 2023: The Need for Proactive Action; United Nations: New York, NY, USA, 2023.
Maryam, S. Global water shortage and potable water safety; Today’s concern and tomorrow’s crisis. Environ. Int. 2021, 158, 106936.
UNESCO. The United Nations World Water Development Report 2021: Valuing Water; United Nations: New York, NY, USA, 2021.
Josefine, L.S.; Per, B. Differentiated vulnerabilities and capacities for adaptation to water shortage in Gaborone, Botswana. Int. J. Water Resour. Dev. 2021, 37, 278–299.
Wang, H.; Qian, L.; Zhao, Z.; Wang, Y. Theory and assessment method of water resources risk. J. Hydraul. Eng. 2019, 50, 980–989.
Yang, P.; Zhang, S.; **: A GIS-MCDA approach for a medium-sized city in the Brazilian semi-arid region. Urban Water J. 2020, 17, 642–655.
Janssen, J.; Radić, V.; Ameli, A. Assessment of Future Risks of Seasonal Municipal Water Shortages Across North America. Front. Earth Sci. 2021, 9, 730631.
Zha, X.; Sun, H.; Jiang, H.; Cao, L.; Xue, J.; Gui, D.; Yan, D.; Tuo, Y. Coupling Bayesian Network and copula theory for water shortage assessment: A case study in source area of the South-to-North Water Division Project (SNWDP). J. Hydrol. 2023, 620, 129434.
Dehghani, S.; Bavani, A.M.; Roozbahani, A.; Sahin, O. Assessment of Climate Change-Induced Water Scarcity Risk by Using a Coupled System Dynamics and Bayesian Network Modeling Approaches. Water Resour. Manag. 2024.

Figure 1. The algorithm flowchart of water shortage risk assessment model in this study.

Figure 2. Overview of the study area: (a) Location of study area and receiving water sub-regions; (b) river–lake hydrological system and topography.

Figure 3. Annual supply and demand series in the water-receiving area (unit: 10⁴ m³).

Figure 4. Kernel density estimation with different bandwidths.

Figure 5. Bivariate copula density and distribution functions: (a) Gaussian copula; (b) t copula; (c) Gumbel copula; (d) Clayton copula; (e) Frank copula.

Figure 6. KS p-values of water demand and supply sequences.

Figure 7. RMSE of water demand and supply sequences.

Figure 8. The comparative graph of probability distributions fitted by different methods.

Figure 9. MSK of the bivariate copula with different methods as the marginal distribution function.

Figure 10. SED between the empirical joint probability and the theoretical joint probability.

Figure 11. Seasonal characteristics of water shortage risk.

Figure 12. Variation characteristics of water shortage risk for five typical years.

Figure 13. Spatial distribution characteristics of water shortage risk: the mean (a) and variance (b) of water shortage risks.

Table 1. Binary copula connecting functions and parameter properties [52,69].

Name of Copula	$C (u, v)$	Parameters
Gaussian Copula	$C (u, v; ρ) = Φ_{2} (Φ^{- 1} (u), Φ^{- 1} (v); ρ)$	$Φ$ and $Φ^{- 1}$ represent the CDF and the inverse CDF of the standard normal distribution, respectively. $Φ_{2}$ denotes the CDF of the bivariate standard normal distribution. $ρ \in [- 1, 1]$ is the correlation coefficient between variables $x$ and $y$, reflecting the degree of linear association between $x$ and $y$. $ρ = 1$ indicates perfect positive correlation, $ρ = - 1$ indicates perfect negative correlation, and $ρ = 0$ indicates no correlation.
t Copula	$C (u, v; ρ, τ) = t_{τ, ρ} (t_{τ}^{- 1} (u), t_{τ}^{- 1} (v))$	$t_{τ}^{- 1}$ represents the inverse CDF of the univariate t-distribution with $τ$ degrees of freedom, and $t_{τ, ρ}$ denotes the CDF of the bivariate t-distribution with $τ$ degrees of freedom and correlation coefficient $ρ$. Here $τ$ is the degrees-of-freedom parameter, which typically takes the minimum value that satisfies the dependence structure between the two variables.
Gumbel Copula	$C (u, v; θ) = \exp (- {[{(- \ln u)}^{θ} + {(- \ln v)}^{θ}]}^{\frac{1}{θ}})$	$θ \geq 1$ determines the degree of dependence between two variables. $θ = 1$ corresponds to independence. As $θ$ increases, the positive correlation between variables strengthens, especially in the upper tail. The Gumbel copula does not describe negative correlation.
Clayton Copula	$C (u, v; θ) = {(\max \{u^{- θ} + v^{- θ} - 1, 0\})}^{- \frac{1}{θ}}$	$θ \geq - 1$, $θ \neq 0$, determines the degree of dependence between two variables. As $θ \to 0$, the variables become nearly independent. As $θ$ increases, the positive correlation between variables strengthens, especially in the lower tail. Negative values of $θ$ imply negative correlation.
Frank Copula	$C (u, v; θ) = - \frac{1}{θ} \ln [1 + \frac{(e^{- θ u} - 1) (e^{- θ v} - 1)}{e^{- θ} - 1}]$	$θ \in (- \infty, + \infty)$, $θ \neq 0$, determines the degree of dependence between two variables. As $θ \to 0$, the variables become nearly independent. As $| θ |$ increases, the positive or negative correlation between variables strengthens.

Table 2. Classification of water shortage probability levels and mapping standards for threshold intervals.

Probabilistic Qualitative Description	Probability Range	Level Mapping	Warning Signal
Extremely low, almost impossible to occur, no warning	[0.001, 0.1)	1	Green
Low, unlikely to occur, no warning	[0.1, 0.2)	2	Blue
Medium, occasional occurrence, advisory	[0.2, 0.3)	3	Yellow
High, likely to occur, warning	[0.3, 0.5)	4	Orange
Extremely high, frequent occurrence, emergency	[0.5, 1.0]	5	Red

Table 3. Classification of potential loss rate levels and mapping standards for threshold intervals.

Severity Description of Water Shortage	Potential Loss Rate Range (%)	Level Mapping	Warning Signal
Essentially No Scarcity: Indicates abundant water resources with no apparent supply issues.	<5	1	Green
Mild Scarcity: Implies slight water shortages, but not enough to significantly impact normal life and production.	5–10	2	Blue
Moderate Scarcity: Suggests water supply tension, potentially affecting certain areas or industries’ livelihoods and production.	10–20	3	Yellow
Severe Scarcity: Indicates severe water shortages that could impact normal life and production across large areas or multiple industries.	20–40	4	Orange
Critical Scarcity: Represents extremely scarce water resources, possibly resulting in interruptions, severe losses, or even threats to safety and life, production, and ecology.	>40	5	Red

Table 4. Mapping matrix of water shortage risk levels and criteria for warning level classification.

Likelihood (Probability Level)	Potential Loss Rate Level					Warning Signal		Qualitative Description of Safety Level
Likelihood (Probability Level)	1	2	3	4	5	Color ¹	Warning Level	Qualitative Description of Safety Level
1	1	2	3	4	5	Green	No warning	Low risk (R ≤ 6)
2	2	4	6	8	10	Blue	Blue warning	Low risk (R ≤ 6)
3	3	6	9	12	15	Yellow	Yellow warning	Medium risk (6 < R ≤ 12)
4	4	8	12	16	20	Orange	Orange warning	Medium risk (6 < R ≤ 12)
5	5	10	15	20	25	Red	Red warning	High risk (R > 12)

Note: ¹ Different colors represent distinct warning levels, while the same color signifies an equivalent warning level.

Table 5. Copula parameter estimation results.

Copula	Parameter Name	Parameter Values
Gaussian Copula	$ρ$	1	0.9859
Gaussian Copula	$ρ$	0.9859	1
t Copula	$ρ$	1	0.9904
	$ρ$	0.9904	1
	$τ$	2.65
Gumbel Copula	$θ$	12.8667
Clayton Copula	$θ$	10.0259
Frank Copula	$θ$	43.6217

Table 6. Statistical measures of KS p-values.

Sequence	Statistical Measure	Gamma	Normal	Logistic	Pearson3	KDE
Water Demand	Mean	0.476	0.437	0.518	0.56	0.57
	Variance	0.128	0.108	0.092	0.124	0.075
	Rejection Rate of the Null Hypothesis (%)	0	0.59	0	0	0
Water Supply	Mean	0.331	0.336	0.523	0.46	0.666
	Variance	0.12	0.103	0.091	0.119	0.057
	Rejection Rate of the Null Hypothesis (%)	32.54	21.30	3.55	15.97	0

Table 7. Statistical measures of RMSE.

Sequence	Statistical Measure	Gamma	Normal	Logistic	Pearson3	KDE
Water Demand	Mean	0.052	0.056	0.049	0.046	0.046
	Variance	0.00037	0.00037	0.00025	0.00026	0.00017
	Range	0.055	0.056	0.046	0.045	0.038
Water Supply	Mean	0.077	0.065	0.048	0.058	0.039
	Variance	0.00240	0.00094	0.00034	0.00097	0.00011
	Range	0.189	0.134	0.117	0.139	0.037

Table 8. Statistical measures of MSK.

Statistical Measure	Gamma	Normal	Logistic	Pearson3	KDE
Mean	0.69	0.73	0.77	0.78	0.8
Variance	0.063	0.044	0.033	0.026	0.022

Table 9. Statistical measures of SED.

Statistical Measure	Gamma	Normal	Logistic	Pearson3	KDE
Mean	0.098	0.065	0.043	0.040	0.030
Variance	0.0162	0.0060	0.0028	0.0019	0.0009

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
