Seismic Response Prediction of Porcelain Transformer Bushing Using Hybrid Metaheuristic and Machine Learning Techniques: A Comparative Study

Zhou, Quan; Mao, Yongheng; Guo, Fengqi; Liu, Yuxuan

doi:10.3390/math12132084

¹ School of Civil Engineering, Central South University, Changsha 410075, China
² China Construction Fifth Engineering Bureau Co., Ltd., Changsha 410004, China
³ State Grid Hunan Electric Power Co., Ltd., Changsha 410004, China
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(13), 2084; https://doi.org/10.3390/math12132084

Submission received: 22 May 2024 / Revised: 21 June 2024 / Accepted: 26 June 2024 / Published: 3 July 2024


Abstract

Although seismic response prediction is widely used for engineering structures, its application to electrical equipment is rare. Overstressing at the bottom of porcelain insulators during earthquakes makes power transformer bushings in substations prone to failure. This paper therefore proposes and compares six integrated machine learning (ML) models that predict the seismic stress response of porcelain transformer bushings from easily monitored acceleration responses. Metaheuristic algorithms such as particle swarm optimization were employed to tune the model architectures. Prediction accuracy was evaluated both for stress response values and for stress classifications. Finally, shaking table tests and simulation analyses of a 1100 kV bushing were carried out to validate the six ML models. The results indicate that the proposed ML models can quickly forecast the maximum stress experienced by a porcelain bushing during an earthquake, and that swarm intelligence techniques can quickly and automatically tune the architectures of the ML models. Among the six models, the K-nearest neighbor regression model achieved the highest prediction accuracy in both the experimental and simulation validations. Compared with commonly used seismic analysis techniques, the ML prediction models offer clear advantages in speed and accuracy for post-earthquake emergency relief in substations.

Keywords:

porcelain transformer bushing; seismic response prediction; metaheuristic optimization; machine learning; simulation analysis; shaking table tests

MSC:

68W99

1. Introduction

Substations play a crucial role in power transmission because they convert and regulate electricity [1]. Nevertheless, in past earthquakes [2,3,4], including the 2022 Luding earthquake in China [5], electrical equipment in substations suffered significant damage and proved clearly vulnerable. Damage to substation equipment can interrupt power delivery, incur high financial costs, and hinder post-earthquake relief work.

Electrical equipment in substations typically consists of a vertical or inclined cantilevered insulator body mounted on a supporting structure such as a steel frame or turret, as shown in Figure 1. Substations contain a variety of such equipment, including post insulators, transformer bushings, disconnect switches, circuit breakers, and surge arresters. Equipment damaged in the 2008 Wenchuan earthquake exhibited several common failure modes, including failure of the porcelain insulators, cracking of connection flanges [5], and oil leakage from insulators [6]. Such damage typically occurs when the stress at the bottom of the insulator exceeds the strength of the material. Determining the peak stress in porcelain insulators during earthquakes is therefore crucial for assessing their structural integrity.

In recent years, studies focusing on substation equipment, including power transformers [7,8,9,10,11] and other equipment [12,13,14], have been conducted. These studies have covered seismic response analysis [15,16], seismic mitigation [17,18,19], and vulnerability and risk assessment [20,21,22]. More recently, research has addressed the post-earthquake evaluation of substation equipment [23,24]. Among this equipment, power transformers are of particular concern because they are the core equipment in substations. In 1998, Bellorini et al. conducted static calculations and vibration experiments on high-voltage bushings to assess the stipulated amplification factor between the ground and the transformer bushing flange [25]. In 2019, He et al. conducted shaking table tests to assess the seismic performance of five UHV transformer bushings mounted on support frames [26,27]. Their study indicated that the metal flange may be susceptible to collapse during earthquakes, and a revised flange design was subsequently proposed to improve its seismic resistance [28]. The relationship between flanges and their stiffness has also been studied [29].

… between various physical variables. This research aims to estimate the peak stress at the bottom of the transformer bushing. Unlike a categorical index, this quantity is continuous; therefore, regression techniques are used instead of classification algorithms. For a given prediction model, the optimal hyperparameter combination is the one that yields the best performance, that is, the combination that maximizes a performance function or minimizes an error function. Although manually adjusting hyperparameters can improve the predictive performance of ML models, the procedure is often prohibitively time-consuming.
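To make the notion of an optimal hyperparameter combination concrete, the sketch below treats the number of neighbors K of a K-nearest neighbor regressor as the hyperparameter and a hold-out validation mean squared error as the error function to be minimized. It finds the optimum by trying every candidate, which is the slow, manual-style search that metaheuristics aim to replace. The model, the validation-MSE error function, the toy data, and the function names (`knr_predict`, `validation_mse`, `exhaustive_search`) are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def knr_predict(train_x: List[Point], train_y: List[float], query: Point, k: int) -> float:
    """Average the outputs of the k training points closest to `query` (Euclidean)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def validation_mse(k: int, train_x, train_y, val_x, val_y) -> float:
    """Error function for one hyperparameter value: MSE on a held-out validation set."""
    errors = [(knr_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(val_x, val_y)]
    return sum(errors) / len(errors)


def exhaustive_search(candidates: Iterable[int],
                      error: Callable[[int], float]) -> Tuple[int, float]:
    """Evaluate every candidate and return the first one with the smallest error."""
    best = min(candidates, key=error)
    return best, error(best)


# Hypothetical toy data: y = x on four training points, one validation point.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 2.0, 3.0]
val_x, val_y = [(1.1,)], [1.0]

best_k, best_err = exhaustive_search(
    range(1, 5), lambda k: validation_mse(k, train_x, train_y, val_x, val_y)
)
print(best_k, best_err)
```

Exhaustive search needs one model evaluation per candidate, and the number of candidates multiplies with every additional hyperparameter; this is the cost that a guided search such as PSO is intended to reduce.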
Fortunately, bio-inspired metaheuristic algorithms offer an alternative to manual tuning, allowing the best hyperparameter combinations to be found automatically [37]. This work uses the particle swarm optimization (PSO) method as the search strategy for the optimal hyperparameters. PSO is a stochastic search algorithm that mimics the cooperative foraging behavior of bird flocks. Other heuristic optimization algorithms could also be used: because the role of the optimizer is only to find the optimal hyperparameters, any algorithm is acceptable as long as it enables the ML model to achieve the required predictive ability. In this paper, PSO serves as a case study to illustrate the overall process; its basic principle is introduced in Section 2.2.

2.1.1. Multi-Layer Perceptron (MLP)

The multi-layer perceptron (MLP) is a common neural network structure. Neural network models such as artificial, convolutional, and recurrent neural networks are very effective at making accurate predictions, and the MLP, as a basic network, is capable of capturing both linear and non-linear relationships. An MLP learns a mapping from an m-dimensional input X^m = [x₁, x₂, …, x_m] to an n-dimensional output Yⁿ = [y₁, y₂, …, y_n]. Figure 2 illustrates the overall structure of an MLP with two hidden layers and a one-dimensional output. Each hidden-layer neuron transforms the outputs of the preceding layer into its own output by applying a weighted linear summation followed by a non-linear activation function. Once the number of hidden layers and the number of neurons in each layer are fixed, the parameters to be learned are the weights and bias values. After training, the MLP produces an output for any given m-dimensional input vector.
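The forward pass described above (weighted linear sum, bias, then activation, layer by layer) can be sketched in plain Python for an MLP with two hidden layers and a one-dimensional output, as in Figure 2. The ReLU activation, the linear output layer, and the hand-picked weights are assumptions for illustration; the text does not name the activation function, and a real model would learn the weights and biases during training.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, biases); weights[i] feeds neuron i


def relu(z: float) -> float:
    return max(0.0, z)


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float],
          activation: Optional[Callable[[float], float]] = None) -> List[float]:
    """Each neuron: weighted linear sum of the previous layer plus bias, then activation."""
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(activation(z) if activation is not None else z)
    return out


def mlp_forward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Hidden layers apply ReLU; the last (output) layer is linear, as usual for regression."""
    h = list(x)
    for i, (weights, bias) in enumerate(layers):
        is_output = i == len(layers) - 1
        h = dense(h, weights, bias, None if is_output else relu)
    return h


# Hand-picked (untrained) weights: m = 2 inputs, two hidden layers of 2 neurons, n = 1 output.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer 1
    ([[2.0, 1.0], [1.0, 1.0]], [0.5, 0.0]),   # hidden layer 2
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
print(mlp_forward([1.0, -2.0], layers))
```

Here the input [1.0, -2.0] becomes [1.0, 0.0] after the first ReLU layer, [2.5, 1.0] after the second, and 2.5 - 1.0 = 1.5 at the output.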

2.1.2. Support Vector Regression (SVR)

The goal of the support vector machine (SVM) technique is to find a hyperplane in an n-dimensional space that accurately classifies the input points. Support vector regression (SVR) builds on the same ideas as SVM but predicts numerical values instead of class labels. Although SVM is better known, SVR is effective at estimating real-valued functions. SVR is a supervised learning technique whose symmetric loss function penalizes overestimation and underestimation equally. Unlike the ordinary least squares approach, SVR places a tolerance ε around the regression line (or hyperplane), so that data points lying within ε of it incur no penalty. This band of tolerated deviation is referred to as the ε-tube. The corresponding optimization problem minimizes the norm of the weight vector, as expressed in Equation (1), subject to the constraint in Equation (2).

\min_{w, b} \; \frac{1}{2} \lVert w \rVert_{2}^{2}

(1)

\left| y_{i} - (w^{T} x_{i} + b) \right| \leq \varepsilon, \quad i = 1, 2, \ldots, N

(2)

where w is the weight vector, b is the bias, and (x_i, y_i) is the i-th of the N training samples.
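The constraint in Equation (2) and the symmetric penalty described above can be illustrated with a few lines of plain Python. The sketch assumes the standard ε-insensitive loss max(0, |y − f(x)| − ε), which soft-margin SVR applies to points outside the tube; it only evaluates a given linear model and does not solve the optimization problem of Equations (1) and (2). The model, tube width, and function names are hypothetical.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Zero inside the epsilon-tube; grows linearly with the excess deviation outside it.
    Symmetric: overestimation and underestimation of the same size cost the same."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def inside_tube(w: Sequence[float], b: float, x: Sequence[float], y: float,
                epsilon: float) -> bool:
    """Check the constraint of Equation (2) for one sample: |y - (w^T x + b)| <= epsilon."""
    prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(y - prediction) <= epsilon


w, b, eps = [2.0], 1.0, 0.25      # hypothetical linear model y ≈ 2x + 1 and tube width
print(inside_tube(w, b, [1.0], 3.1, eps))          # residual of about 0.1 lies inside the tube
print(epsilon_insensitive_loss(3.0, 4.0, eps))     # 1.0 - 0.25 = 0.75
print(epsilon_insensitive_loss(4.0, 3.0, eps))     # same penalty in the other direction
```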

2.1.3. Kernel Ridge Regression (KRR)

Ridge regression is a modified version of the least squares approach that penalizes the size of the coefficients. Kernel ridge regression (KRR) combines ridge regression with the kernel trick, as described by Murphy [39]: it learns a linear function in the space induced by the kernel and the data, and with a non-linear kernel this corresponds to a non-linear function of the original data. Compared with the least squares approach, KRR sacrifices unbiasedness in exchange for greater numerical stability, which improves computational accuracy. The KRR model provides a mapping between covariates x and output variables y, both of which are continuous. Its objective is to minimize the overall loss function:

\sum_{i} \left( w^{T} \phi(x_{i}) - y_{i} \right)^{2} + \frac{\lambda}{2} \lVert w \rVert^{2}

(3)

where w is the weight vector, λ is the ridge parameter, ϕ(·) is a feature map, and (x_i, y_i) is the i-th sample. In Equation (3), the first term is the cumulative squared error and the second is the regularization term. The goal of KRR is to determine the estimation model f(x) = w^Tϕ(x). Because the kernel satisfies k(x, y) = ϕ(x)^Tϕ(y), the model can be rewritten as a weighted sum of kernel evaluations at the training samples, f(x) = Σ_i α_i k(x_i, x), so the feature map never needs to be computed explicitly.
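A minimal KRR sketch in plain Python, assuming the loss of Equation (3): setting its gradient to zero gives the dual solution α = (K + (λ/2)I)⁻¹y, where K is the kernel matrix of the training inputs, and the prediction f(x) = Σᵢ αᵢ k(xᵢ, x) uses only kernel values, never ϕ(x) itself. The linear and RBF kernels mirror the kernel options listed in the hyperparameter table; the Gaussian-elimination solver, the RBF γ value, the toy data, and all function names are illustrative assumptions, and a practical implementation would rely on a numerical library.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear_kernel(a, b) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def rbf_kernel(a, b, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def krr_fit(X, y, lam: float, kernel: Kernel) -> List[float]:
    """Dual coefficients alpha = (K + (lam/2) I)^-1 y; lam/2 matches Equation (3)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (lam / 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(K, list(y))


def krr_predict(X, alpha, kernel: Kernel, x) -> float:
    """f(x) = sum_i alpha_i k(x_i, x): only kernel values are needed, never phi(x)."""
    return sum(a * kernel(xi, x) for a, xi in zip(alpha, X))


X, y = [(1.0,), (2.0,), (3.0,)], [2.0, 4.0, 6.0]      # hypothetical data on y = 2x
alpha = krr_fit(X, y, lam=0.1, kernel=linear_kernel)
print(krr_predict(X, alpha, linear_kernel, (4.0,)))   # shrunk slightly below 8 by the ridge term
```

With the linear kernel the prediction equals 4 · Σxy / (Σx² + λ/2) = 112/14.05, which shows the shrinkage introduced by the ridge penalty.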

2.1.4. Stochastic Gradient Descent Regression (SGDR)

Gradient descent is a widely used approach for optimizing model parameters in machine learning. It iteratively updates the model parameters in search of the minimum of the loss function. Two basic variants are commonly distinguished: stochastic gradient descent and batch gradient descent. Only the former is examined in this research because of its efficiency and ease of implementation: the parameters are updated immediately after each training sample (or small minibatch), which speeds up training. Algorithm 1 gives the pseudocode for stochastic gradient descent.

Algorithm 1 Stochastic Gradient Descent.

Require: Learning rate η_k
Require: Initial parameter θ
while stopping criterion not met do
    Sample a minibatch of m examples {x¹, …, x^m} from the training set, with corresponding targets y^i
    Set ĝ = 0
    for i = 1 to m do (with m = 1, this is single-example SGD)
        Compute gradient estimate:

\hat{g} \leftarrow \hat{g} + \frac{1}{m} \nabla_{\theta} L\left(f(x^{i}; \theta), y^{i}\right)

    end for
    Apply update:

\theta \leftarrow \theta - \eta_{k} \hat{g}

end while

For the linear estimator in Equation (4), the loss function in Equation (5) quantifies the discrepancy between the estimated and actual values.

f (x_{i}) = θ^{T} x_{i}

(4)

L (θ) = \frac{1}{2} (y_{i} - f (x_{i}))^{2}

(5)

The loss is a function of the model parameter: θ is the independent variable and the loss is the dependent variable. To reduce the loss, θ is adjusted in the direction opposite to the gradient, as shown in Equation (6). The hyperparameter η, known as the learning rate, sets the step size and therefore the pace at which the optimal solution is approached. Neither excessively large values, which can overshoot the minimum, nor excessively small values, which slow convergence, are suitable.

\theta := \theta - \eta \nabla L = \theta - \eta \frac{\partial L(\theta)}{\partial \theta}

(6)

where x_i represents an input vector, y_i represents an output corresponding to x_i, and L represents the loss function.
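Algorithm 1 and Equations (4) to (6) combine into a short plain-Python sketch for the linear estimator f(x) = θᵀx with loss L = ½(y − f(x))². The fixed number of epochs used as the stopping criterion, the per-epoch shuffling, the learning rate, and the noise-free toy data are assumptions for illustration; the tuned SGDR model in this study also has a regularization term (see the hyperparameter table), which is omitted here.

```python
import random
from typing import List, Sequence


def sgd_linear_regression(X: List[Sequence[float]], y: List[float], lr: float = 0.01,
                          epochs: int = 200, batch_size: int = 1,
                          seed: int = 0) -> List[float]:
    """Minibatch SGD (Algorithm 1) for f(x) = theta^T x with loss L = 1/2 (y - f(x))^2."""
    rng = random.Random(seed)
    n_features = len(X[0])
    theta = [0.0] * n_features               # initial parameter
    order = list(range(len(X)))
    for _ in range(epochs):                  # stopping criterion: fixed number of passes (assumed)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            g_hat = [0.0] * n_features       # set g_hat = 0
            for i in batch:
                residual = sum(t * xj for t, xj in zip(theta, X[i])) - y[i]
                # dL/dtheta = -(y - theta^T x) x = (theta^T x - y) x, averaged over the minibatch
                for j in range(n_features):
                    g_hat[j] += residual * X[i][j] / len(batch)
            # Equation (6): step against the gradient, scaled by the learning rate
            theta = [t - lr * g for t, g in zip(theta, g_hat)]
    return theta


# Hypothetical noise-free data: y = 3*x1 - 2*x2 on a grid in [-1, 1]^2.
X = [(a / 5, b / 5) for a in range(-5, 6) for b in range(-5, 6)]
y = [3 * x1 - 2 * x2 for x1, x2 in X]
theta = sgd_linear_regression(X, y, lr=0.05)
print([round(t, 4) for t in theta])
```

Setting `batch_size=1` gives single-example SGD; a larger value averages the gradient over a minibatch, as in the inner loop of Algorithm 1.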

2.1.5. K-Nearest Neighbor Regression (KNR)

The K-nearest neighbor (KNN) algorithm is a machine learning technique that assigns a sample to the category to which the majority of its K nearest samples in the feature space belong. K-nearest neighbor regression (KNR) extends this idea to the prediction of continuous variables instead of classification. KNR is a non-parametric approach: it does not train an explicit function as the estimation model, and its predictions depend only on the stored training data. In Figure 3, the output of the blue point is unknown and is to be estimated. With K = 4, the four data points closest to the blue point are selected. The proximity between two points depends on the chosen distance metric, typically the Euclidean or Manhattan distance. Once the distances have been computed, the outputs of these four nearest points are retrieved, and their average is used as the estimated output of the blue point.
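The prediction rule above can be sketched in plain Python: rank the stored samples by distance to the query, keep the K closest, and average their outputs. The function name, the `metric` switch, and the example points are hypothetical; the example only mimics the K = 4 situation of Figure 3.

```python
import math
from typing import List, Sequence


def knr_predict(train_x: List[Sequence[float]], train_y: List[float], query: Sequence[float],
                k: int = 4, metric: str = "euclidean") -> float:
    """K-nearest neighbor regression: mean output of the k training points closest to `query`."""
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        if metric == "euclidean":
            return math.dist(a, b)
        if metric == "manhattan":
            return sum(abs(ai - bi) for ai, bi in zip(a, b))
        raise ValueError(f"unknown metric: {metric}")

    # No training step: the stored samples are ranked by distance at prediction time.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: distance(pair[0], query))
    neighbours = ranked[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


# Hypothetical 2D example: four nearby points and one distant outlier.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
train_y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(knr_predict(train_x, train_y, (0.5, 0.5), k=4))   # mean of 1, 2, 3, 4
```

Because the distant point is never among the four nearest neighbors, its large output does not affect the estimate; this locality is what K controls.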

2.1.6. Decision Tree Regression (DTR)

A decision tree (DT) is a model with a tree-shaped structure, as shown in Figure 4. Unlike many other models, a DT does not require a relationship between input and output to be assumed before training, because it relies directly on the structure of the original data. The approach is therefore applicable to both linear and non-linear relationships. Algorithm 2 shows the pseudocode for the decision tree algorithm, and Figure 4 illustrates a simple example of DTR in a 2D feature space. The tree originates from the root node, which contains all the data points. The first optimal split (x₁, a) is chosen by minimizing the squared error, producing two data subsets. The same computation is then repeated on each subset to partition the space into further subspaces; after two levels of division, four leaf nodes are obtained. DTR estimates the output of each subspace as the average output of the data points inside it, so a new data point that falls within a given subspace is assigned that subspace's average output.

Algorithm 2 Pseudocode for the decision tree algorithm.

Input: Training dataset D
Output: Regression tree f(x)
In the input space containing the training dataset, recursively divide each region into two subregions, determine the output value of each subregion, and construct a binary decision tree:
(1) Select the optimal splitting variable j and split point s by solving

\min_{j, s} \left[ \min_{c_{1}} \sum_{x_{i} \in R_{1}(j, s)} (y_{i} - c_{1})^{2} + \min_{c_{2}} \sum_{x_{i} \in R_{2}(j, s)} (y_{i} - c_{2})^{2} \right]

Traverse the variables j; for each fixed splitting variable j, scan the split points s, and select the pair (j, s) that minimizes the expression above.
(2) Divide the region using the selected pair (j, s) and determine the corresponding output values:

R_{1}(j, s) = \{ x \mid x^{(j)} \leq s \}, \quad R_{2}(j, s) = \{ x \mid x^{(j)} > s \}

\hat{c}_{m} = \frac{1}{N_{m}} \sum_{x_{i} \in R_{m}(j, s)} y_{i}, \quad x \in R_{m}, \; m = 1, 2

where N_m is the number of samples in R_m.
(3) Repeat steps (1) and (2) on the two subregions until the stopping condition is met.
(4) Divide the input space into M regions R₁, R₂, …, R_M and generate the decision tree:

f(x) = \sum_{m = 1}^{M} \hat{c}_{m} \, I(x \in R_{m})

where I(·) is the indicator function.
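Algorithm 2 can be sketched in plain Python as follows. The split search scans every variable j and every observed value s as a candidate split point, with R₂(j, s) taken as the complement {x | x^{(j)} > s} of R₁(j, s); `max_depth` and `min_samples_split` act as the stopping condition and echo the MD and MSS hyperparameters listed in the hyperparameter table. The dictionary-based tree, the choice of candidate split points, and the function names are illustrative assumptions, not the implementation used by the authors.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def sse(values: List[float]) -> float:
    """min_c sum (y - c)^2 is attained at c = mean, so this gives the inner minimum."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: List[Sequence[float]], y: List[float]) -> Optional[Tuple[float, int, float]]:
    """Step (1): traverse variables j and split points s; return (cost, j, s) or None."""
    best = None
    for j in range(len(X[0])):
        for s in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= s]    # R1(j, s)
            right = [yi for xi, yi in zip(X, y) if xi[j] > s]    # R2(j, s)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, s)
    return best


def build_tree(X, y, max_depth: int = 3, min_samples_split: int = 2, depth: int = 0) -> Dict:
    """Steps (2)-(3): split recursively; each leaf stores c_m, the mean output of its region."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < min_samples_split:
        return leaf
    split = best_split(X, y)
    if split is None:
        return leaf
    _, j, s = split
    left = [i for i, x in enumerate(X) if x[j] <= s]
    right = [i for i, x in enumerate(X) if x[j] > s]
    return {
        "feature": j,
        "threshold": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, min_samples_split, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, min_samples_split, depth + 1),
    }


def predict(tree: Dict, x: Sequence[float]) -> float:
    """Step (4): f(x) = c_m for the region R_m that contains x."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [1.0, 1.0, 5.0, 5.0]              # hypothetical step-shaped data
tree = build_tree(X, y, max_depth=2)
print(predict(tree, (0.5,)), predict(tree, (2.5,)))
```

On this data the first split falls at x ≤ 1.0 with zero squared error, so each side of the step is predicted by its own mean.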

2.2. Particle Swarm Optimization Technology

Particle swarm optimization (PSO) is a swarm intelligence algorithm [33] that mimics the foraging behavior of bird flocks in nature. The flock's objective is to locate food within a given area. During the search, the birds communicate and exchange information, including their individual positions. Each bird evaluates the best position it has found so far and shares it with the whole flock, so that, through collaboration, the flock can judge whether the best solution has been found. Eventually, the flock converges on the food source, which corresponds to the optimal solution.

Algorithm 3 gives the complete sequence of steps. A group of massless particles, each representing a randomly initialized candidate solution, is used to model the collective behavior of the flock. The value of the fitness function at a particle's position is that particle's fitness value. Every particle has two attributes: velocity (V) and position (X). In each iteration, each particle moves from its current position X according to its velocity V, which is updated using two extrema: the individual extremum pbest, the best position found so far by that particle, and the global extremum gbest, the position with the best fitness value found so far by all particles. The update rules are given in Equations (7) and (8).

V_{j} := \omega \times V_{j} + c_{1} \times \mathrm{rand}() \times (\mathrm{pbest}_{j} - X_{j}) + c_{2} \times \mathrm{rand}() \times (\mathrm{gbest} - X_{j})

(7)

X_{j} := X_{j} + V_{j}

(8)

where V_j and X_j are the velocity and position of particle j, respectively; ω is the inertia weight; c₁ and c₂ are the learning factors, generally c₁ = c₂ = 2; and rand() returns a uniformly distributed random number in [0, 1]. After sufficient iterations, the swarm converges toward the solution with the best fitness value.

Algorithm 3 Basic algorithm flow for PSO.

(1) Determine the fitness function f according to the actual problem;
(2) Specify the total number of particles and the maximum number of generations, and initialize the particles randomly;
(3) Calculate the global optimum gbest;
(4) Update the velocity V and position X of all particles per Equations (7) and (8);
(5) Calculate the fitness value of each particle;
(6) Update the local optimum pbest of each particle;
(7) Update the global optimum gbest;
(8) If the maximum generation has been reached, end; otherwise, repeat Steps (4) to (8).
Output the optimal fitness and solution.
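A compact plain-Python sketch of Algorithm 3 and Equations (7) and (8) follows. It minimizes the fitness (natural when the fitness is a prediction error), draws rand() independently for each dimension, and clips positions to the search range. The inertia weight ω = 0.7 and learning factors c₁ = c₂ = 1.5 are assumptions: they are slightly smaller than the c₁ = c₂ = 2 mentioned above, and are commonly used values that tend to keep the swarm from diverging without a velocity limit. The test function and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(fitness: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 30, max_gen: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Basic PSO following Algorithm 3; lower fitness is better (e.g. a validation error)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step (2): random positions inside the search range, zero initial velocities.
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    # Step (3): initial global optimum.
    g = min(range(n_particles), key=lambda j: pbest_fit[j])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_gen):                       # Step (8): stop at the maximum generation
        for j in range(n_particles):
            for d in range(dim):
                # Step (4), Equation (7): inertia + pull towards pbest + pull towards gbest.
                V[j][d] = (w * V[j][d]
                           + c1 * rng.random() * (pbest[j][d] - X[j][d])
                           + c2 * rng.random() * (gbest[d] - X[j][d]))
                # Equation (8), then clip the particle back into the search range.
                lo, hi = bounds[d]
                X[j][d] = min(max(X[j][d] + V[j][d], lo), hi)
            f = fitness(X[j])                      # Step (5)
            if f < pbest_fit[j]:                   # Step (6): update pbest
                pbest[j], pbest_fit[j] = X[j][:], f
        g = min(range(n_particles), key=lambda j: pbest_fit[j])
        if pbest_fit[g] < gbest_fit:               # Step (7): update gbest
            gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    return gbest, gbest_fit


# Hypothetical test function with its minimum at (1, -2).
best, best_fit = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                              bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print([round(v, 3) for v in best], best_fit)
```

For hyperparameter tuning, the fitness function would train an ML model with the hyperparameters decoded from a particle's position and return its validation error; integer-valued hyperparameters, such as a number of neighbors, would additionally need rounding inside that function, which is not shown here.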

Characteristic	Magnitude	Epicenter Distance (km)	Fault Type Rrup (km)	PGA (g)
Range	4.37~7.36	1.06~247.04	3.21~222.41	0.005~0.84
Mean value	5.91	42.74	38.09	0.128
Standard deviation	0.67	42.26	38.58	0.124

Actual Stress	Predicted <CSR	Predicted ≥CSR	Rates
<CSR	TP	FN	TPR = TP/(TP + FN)	FNR = FN/(TP + FN)
≥CSR	FP	TN	FPR = FP/(FP + TN)	TNR = TN/(FP + TN)
Accuracy = (TP + TN)/(TP + TN + FP + FN)
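The confusion-matrix definitions in the table above can be checked with a short plain-Python sketch that treats "stress < CSR" as the positive class, as the table does. The synthetic stress values are invented, but they are chosen to reproduce the MLP counts reported for CSR = 25 MPa (65 true positives, 0 false negatives, 1 false positive, 4 true negatives); the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when a class is absent from the data."""
    return num / den if den else math.nan


def stress_classification_metrics(actual: Sequence[float], predicted: Sequence[float],
                                  csr: float) -> Dict[str, float]:
    """Confusion-matrix rates with 'stress < CSR' as the positive class, as in the table."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a < csr:
            if p < csr:
                tp += 1        # actual < CSR, predicted < CSR
            else:
                fn += 1        # actual < CSR, predicted >= CSR
        elif p < csr:
            fp += 1            # actual >= CSR, predicted < CSR (missed overstress)
        else:
            tn += 1            # actual >= CSR, predicted >= CSR
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "Accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "TPR": _ratio(tp, tp + fn), "FNR": _ratio(fn, tp + fn),
        "FPR": _ratio(fp, fp + tn), "TNR": _ratio(tn, fp + tn),
    }


# Synthetic stresses (MPa): 65 correctly below 25 MPa, 1 overstressed sample missed,
# 4 overstressed samples caught.
actual = [10.0] * 65 + [30.0] * 5
predicted = [10.0] * 65 + [20.0] + [30.0] * 4
m = stress_classification_metrics(actual, predicted, csr=25.0)
print(round(100 * m["Accuracy"], 1), m["FPR"], m["TNR"])   # 69/70 -> 98.6, 0.2, 0.8
```

Because the overstressed class is small, a single missed sample moves FPR to 20% while accuracy stays near 98.6%, which is why the rates are reported alongside overall accuracy.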

ML Models	Hyperparameters	Search Range	Optimal Results
MLP	Number of neurons in the 1st layer	1~20, integer	19
MLP	Number of neurons in the 2nd layer	1~20, integer	13
SVR	Regularization parameter	0.01~10	4.676
SVR	Kernel	‘linear’, ‘poly’, ‘rbf’	‘linear’
KRR	Regularization strength	0.01~10	0.01
KRR	Kernel	‘linear’, ‘polynomial’, ‘rbf’	‘linear’
KNR	Number of neighbors	1~50, integer	2
KNR	Leaf size	1~100, integer	40
SGDR	Regularization term (RT)	0.1~10	0.1103
SGDR	Maximum number of iterations (MI)	100~10,000, integer	6139
SGDR	Stopping criterion (SC)	10⁻⁵~10⁻²	0.00952
DTR	Maximum depth of the tree (MD)	1~20, integer	11
DTR	Minimum number of samples required to split (MSS)	2~10, integer	7
DTR	Minimum number of samples required at a leaf (MSL)	1~10, integer	2

Generation	20	60	100	120	160	180	220	240	280	300
MLP	0.081	0.078	0.078	0.078	0.078	0.078	0.078	0.078	0.078	0.078
SVR	0.077	0.077	0.077	0.077	0.077	0.077	0.077	0.077	0.077	0.077
KRR	0.079	0.079	0.079	0.079	0.079	0.079	0.079	0.079	0.079	0.079
KNR	0.147	0.116	0.116	0.112	0.112	0.112	0.112	0.112	0.112	0.112
SGDR	0.192	0.164	0.109	0.105	0.101	0.11	0.103	0.095	0.095	0.095
DTR	0.123	0.12	0.119	0.12	0.119	0.12	0.12	0.119	0.119	0.119

Indicator	MLP	SVR	KRR	KNR	SGDR	DTR
MAPE	0.0737	0.0771	0.0753	0.1009	0.1146	0.1355
R	0.994	0.994	0.994	0.988	0.993	0.983
MSE	0.6307	0.6340	0.6059	1.2468	2.3710	1.6747
MAE	0.5666	0.5667	0.5593	0.7630	1.2657	0.9227
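The evaluation indicators in the table above are not defined in this section. The sketch below assumes their standard definitions: MAPE expressed as a fraction (which appears consistent with the magnitude of the tabulated values), R as the Pearson correlation coefficient between actual and predicted stresses, and the usual MSE and MAE. The sample values and function name are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_indicators(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Standard MAPE (as a fraction), Pearson R, MSE, and MAE."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n   # undefined if any t == 0
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(var_t * var_p)
    return {"MAPE": mape, "R": r, "MSE": mse, "MAE": mae}


# Hypothetical actual vs. predicted peak stresses (MPa).
print(regression_indicators([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 33.0, 40.0]))
```

MSE squares the errors and therefore weights occasional large misses more heavily than MAE does, which may explain why models with similar MAE can differ noticeably in MSE.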

Model	Actual Stress	Predicted <25 MPa	Predicted ≥25 MPa	Confusion Matrix Rates		Accuracy
MLP	<25 MPa	65	0	TPR = 100%	FNR = 0	98.6%
MLP	≥25 MPa	1	4	FPR = 20%	TNR = 80%	98.6%
SVR	<25 MPa	65	0	TPR = 100%	FNR = 0	98.6%
SVR	≥25 MPa	1	4	FPR = 20%	TNR = 80%	98.6%
KRR	<25 MPa	65	0	TPR = 100%	FNR = 0	100%
KRR	≥25 MPa	0	5	FPR = 0	TNR = 100%	100%
KNR	<25 MPa	65	0	TPR = 100%	FNR = 0	100%
KNR	≥25 MPa	0	5	FPR = 0	TNR = 100%	100%
SGDR	<25 MPa	65	0	TPR = 100%	FNR = 0	98.6%
SGDR	≥25 MPa	1	4	FPR = 20%	TNR = 80%	98.6%
DTR	<25 MPa	65	0	TPR = 100%	FNR = 0	98.6%
DTR	≥25 MPa	1	4	FPR = 20%	TNR = 80%	98.6%

Test	Earthquake Motion	Target PGA (g)
TS 1	White noise	0.07
TS 2	Artificial ground motion	0.15
TS 3	White noise	0.07
