Fast Decision Algorithm of CU Size for HEVC Intra-Prediction Based on a Kernel Fuzzy SVM Classifier

He, Shuqian; Deng, Zhengjie; Shi, Chun

doi:10.3390/electronics11172791

Open AccessArticle

Fast Decision Algorithm of CU Size for HEVC Intra-Prediction Based on a Kernel Fuzzy SVM Classifier

by

Shuqian He

¹,

Zhengjie Deng

^1,* and

Chun Shi

²

¹

School of Information Science and Technology, Hainan Normal University, Haikou 571158, China

²

School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou 510640, China

^*

Author to whom correspondence should be addressed.

Electronics 2022, 11(17), 2791; https://doi.org/10.3390/electronics11172791

Submission received: 29 July 2022 / Revised: 29 August 2022 / Accepted: 1 September 2022 / Published: 5 September 2022

(This article belongs to the Special Issue Video Coding, Processing, and Delivery for Future Applications)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

High Efficiency Video Coding (HEVC) achieves a significant improvement in compression efficiency at the cost of extremely high computational complexity. Therefore, large-scale and wide deployment applications, especially mobile real-time video applications under low-latency and power-constrained conditions, are more challenging. In order to solve the above problems, a fast decision method for intra-coding unit size based on a new fuzzy support vector machine classifier is proposed in this paper. The relationship between the depth levels of coding units is accurately expressed by defining the cost evaluation criteria of texture and non-texture rate-distortion cost. The fuzzy support vector machine is improved by using the information entropy measure to solve the negative impact of data noise and the outliers problem. The proposed method includes three stages: the optimal coded depth level “0” early decision, coding unit depth early skip, and optimal coding unit early terminate. In order to further improve the rate-distortion complexity optimization performance, more feature vectors are introduced, including features such as space complexity, the relationship between coding unit depths, and rate-distortion cost. The experimental results showed that, compared with the HEVC reference test model HM16.5, the proposed algorithm can reduce the encoding time of various test video sequences by more than 53.24% on average, while the Bjontegaard Delta Bit Rate (BDBR) only increases by 0.82%. In addition, the proposed algorithm is better than the existing algorithms in terms of comprehensively reducing the computational complexity and maintaining the rate-distortion performance.

Keywords:

HEVC; intra-coding; CU size decision; fuzzy SVM; rate distortion

1. Introduction

Recently, with the development of technology, in order to meet the compression requirements of high-definition and ultra-high-definition video, research on how to further improve the coding efficiency of next-generation video codecs has been carried out on a large scale. ISO/IEC (MPEG) and ITU-T (VCEG) have formed the Joint Collaborative Team on Video Coding (JCT-VC) to develop a next generation video coding standard, which is called High Efficiency Video Coding (HEVC) [1,2,3]. In recent years, with the promotion of 5G networks and the popularization of mobile smart terminal devices, video data services in video applications such as online education, video conferencing, online games, and video social interaction have exploded. Especially with the emergence of high-definition and ultra-high-definition video services, end users are increasingly demanding high frame rates, a high resolution, and a high dynamic range for real-time video interaction, which brings new challenges to the implementation of video coding. The latest report from Sandvin, an Internet smart solution provider, shows that in 2019, video traffic accounted for more than 60% of the global Internet downlink traffic [1]. The widely used H.264/AVC video coding standard can no longer meet the current real-time video coding needs. Compared with H.264/AVC, the video coding standard (H.265/HEVC) keeps the visual quality unchanged and the transmission bit rate is reduced by 50%, but the computational complexity is high. To improve the coding rate-distortion performance, the HEVC computational complexity evaluation is increased by nearly three times [4].

Accordingly, many coding efficiency improvement techniques, such as coding tree unit (CTU) coding (from 64 × 64 down to 8 × 8 Luma samples), prediction unit (PU) coding, transform unit (TU) tree coding, uniform intra-prediction and so on, are available in the current HEVC. More details can refer to [1,2,4]. Each picture frame in HEVC is partitioned into many square-shaped CTUs. A CTU represents the most basic processing unit, and each CTU can be recursively split into a smaller CU based on a quadtree structure. The CU sizes range from 8 × 8 up luma samples to 64 × 64. The coding depth order for CUs is referred to as the z-scan and is illustrated for an example in Figure 1.

Due to the extended prediction and transform block size with a flexible coding structure, HEVC can meet the coding requirements of general video sequences and can effectively code video sequences from low to high picture resolutions. Because up to 35 prediction modes are used in HEVC (e.g., mode 0 is used for PLANAR, mode 1 is used for DC and other angle modes) while only 9 modes are used in H.264/AVC, HEVC uses a more complex structure and has more accurate prediction accuracy, and its intra-frame prediction is significantly improved compared to H.264/AVC [3]. The significant increased number of angular intra-prediction modes in HEVC leads to a much higher compression efficiency. However, it also brings a tremendous computational burden to the mode decision process when the expensive rate-distortion optimization (RDO) process is used to search for the minimum rate distortion (RD) cost mode. Therefore, in the real implementation of HEVC codec, the main objective of the fast mode decision method is to reduce the computational complexity while maintaining high video coding RD performance.

Recently, many researchers devoted their efforts to the reduction in the complexity of HEVC for prediction coding. The main idea is to skip or terminate the non-optimal rate-distortion optimization mode selection processing in advance through prediction, avoiding the exhaustive search of RDO for coding unit depth size decision, prediction mode selection, and transform coding unit selection. The choice of decision-making strategies and thresholds is mainly determined by the temporal and spatial statistical characteristics of the video content and machine learning methods. The following two categories are introduced, respectively:

The method based on spatio-temporal statistical characteristics is mainly completed by manually extracting statistical features, including the spatio-temporal correlation video frames, the correlation of coding unit depth and prediction mode, the correlation of rate distortion, the correlation of content complexity, etc. [5,6,7,8,9,10,11,12]. According to the RD cost of each coding unit CU and the inter-mode prediction error, Shen et al. [5] used the Bayesian decision method to determine the coding unit CU partition in advance [6]. The gradient-based method reduces the range of candidate modes and uses the statistical characteristics of depth difference and least Hadamard transform-based costs (HAD costs) to make early decisions of the support vector coding unit. The fast quadtree pruning algorithm in [7] uses the prediction of residual statistics to reduce coding time. Lei et al. [8] distinguished between the natural content coding unit CU and the screen content coding unit CU for the extension of intra-prediction coding in screen content coding. Based on the depth and mode content statistical characteristics of the temporal and spatial adjacent coding units, the optimal prediction mode is predicted, and the quad-tree structure is divided to reduce the complexity of intra-frame coding. Q. Hu et al. [9] modeled the mode and CU size selection decision as a binary classification problem and used Neyman–Pearson’s rules to balance the rate-distortion performance loss and the complexity reduction rate. H.S. Kim et al. [10] proposed a fast CU size decision algorithm based on Bayesian decision rules. ** process of CU depth, where the unnecessary coding with the CU (marked with dotted lines) is simply bypassed. Similarly, for the complex content, the large CU sizes can also be skipped without affecting the encoding quality.

Otherwise, one can easily perceive that a large CU-sized CTU is more suitable for a homogeneous region, while a small CU-sized CTU, on the contrary, is more suitable for a region containing a complex object. In these cases, the combination of different CU sizes is expected to be the optimal prediction choice for LCU. In order to verify this observation, the RDO exhaustive mode decision of the HEVC standard reference software was used. By using the different video sequences listed in Table 2, a relatively extensive test experiment was carried out, and the statistical distribution results of the optimal partition structure of each CTU were obtained. The resulting data are shown in Table 2. This study strongly suggests that depth sets be grouped into different complexity classes in terms of the depth range from a Min value to a Max value. The classes that resulted from such depth grou** are established and summarized in Table 3. This study clearly indicates that class C0, C1, C2 with low complexity and class C4, C5, C6 with high complexity are selected as the optimal partition with a high probability. Consequently, only the depths involved in each class are required to be checked.

2.2. Cost Relationship between CU Sizes

Similar to H.264/AVC, the RD optimization technique is performed in HEVC intra-prediction encoding, and the RD costs are computed for all the possible CU sizes to find the minimum RD cost. The RD cost is defined as follows:

J_{R D} = S S E (Q p) + λ_{M o d e} R (Q p)

(1)

where SSE represents the sum of squared difference between the original and its reconstructed CU blocks, respectively, R denotes the total number of bits used for CU signaling, Qp is the quantization parameter, and

λ_{M o d e} = 0.85 \times 2^{(Q p - 12) / 3}

is the Lagrange multiplier.

In HEVC, the signaling of the side information for non-texture data, which includes quadtree split information for TU and CU, prediction modes for PU, filter parameters for the reconstructed PU, etc., becomes more important due to the adoption of many complex coding tools with more coding parameters compared to the conventional methods in previous video coding standards. In order to obtain the relationship between the non-texture cost and the texture cost, the bit rate R of CU was analyzed. The R is composed of:

R = R_{n o n t e x t u r e} + R_{t e x t u r e}

(2)

where

R_{n o n t e x t u r e}

denotes the total number of bits for the non-texture information,

R_{t e x t u r e}

represents the encoding bits for the transform residues which are derived from prediction, transform, quantization, and entropy coding. Substituting (2) into (1), the RDO cost can be represented as follows:

\begin{array}{l} J_{R D} = \underset{J_{R D_{t e x t u r e}}}{\underset{︸}{S S E (Q p) + λ_{M o d e} R_{t e x t u r e} (Q p)}} \\ + \underset{J_{R D_{n o n t e x t u r e}}}{\underset{︸}{λ_{M o d e} R_{n o n t e x t u r e} (Q p)}} \end{array}

(3)

where

J_{R D_{t e x t u r e}}

and

J_{R D_{n o n t e x t u r e}}

are the texture cost and non-texture cost, respectively. The non-texture cost of the CUs has a strong relationship with the depth level of CTUs. As the number of the depth levels increases, more non-texture bits are required to mode the presentation correspondingly. Therefore, a fixed relationship can be determined:

J_{R D_{n o n t e x t u r e, d e p t h_{i}}} < J_{R D_{n o n t e x t u r e, d e p t h_{i + 1}}}

(4)

where

d e p t h_{i}

denotes the depth level

i

of the current CU for

i = 0, 1, 2

.

In the new standards, HEVC has introduced many complicated partition CTU and prediction modes, which has led to significant increases in the non-texture signaling bits. Assuming a certain depth level of a sub-block of a given CTU sample, performing different modes of segmentation under this level of sub-block will obtain different prediction and transformation results, and thus will result in different rate distortion costs. When the current depth level block is further sub-divided into multiple sub-blocks, compared with the current level block, the smaller the block can be described more accurately, and a more accurate prediction and transformation accuracy will be obtained; that is, smaller coding prediction residuals will be obtained. The set of supported sub-divisions for the current level block will produce a smaller bit rate and distortion but will increase the bit rate required to transmit more block prediction parameters. According to the observation, the variance of the residual will decrease proportionally with the growth of the depth level, which results in better prediction, i.e.,

σ_{r e s i d u a l, d e p t h_{i}}^{2} > σ_{r e s i d u a l, d e p t h_{i + 1}}^{2}

. With the assumption of the Laplace distribution of the residual signal [16], we have:

S S E (σ_{r e s i d u a l}^{2} | Q p) \propto σ_{r e s i d u a l}^{2}

and

R_{t e x t u r e} (σ_{r e s i d u a l}^{2} | Q p) \propto σ_{r e s i d u a l}^{2}

. This proportional relation of

SSE

,

R_{texture}

, and

σ_{r e s i d u a l}^{2}

can be observed in Figure 2. It can be observed that both

SSE

and

R_{texture}

are monotonically increasing with

σ_{r e s i d u a l}^{2}

, so their summation should also be a monotonic increasing function.

Therefore, it easily follows that:

S S E (σ_{r e s i d u a l}^{2} | Q p) + λ_{M o d e} R_{t e x t u r e} (σ_{r e s i d u a l}^{2} | Q p) \propto σ_{r e s i d u a l}^{2}

(5)

and

J_{R D_{t e x t u r e, d e p t h_{i}}} > J_{R D_{t e x t u r e, d e p t h_{i + 1}}}

(6)

According to (4) and (6), the relationships of the non-texture rate and texture RD cost across successive depth levels can be obtained:

{\begin{matrix} 0 < R_{n o n t e x t u r e, d e p t h_{i}} < R_{n o n t e x t u r e, d e p t h_{i + 1}} \\ J_{R D_{t e x t u r e, d e p t h_{i}}} > J_{R D_{t e x t u r e, d e p t h_{i + 1}}} > 0 \end{matrix}

(7)

A measure to evaluate the texture complexity of the current CU depth level is introduced, which is the texture cost percentage of the cost term (TCPOC), i.e.,

T C P O C_{d e p t h_{i}} = \frac{J_{R D_{t e x t u r e_{i}}}}{J_{R D_{i}}}

(8)

Here,

T C P O C_{d e p t h_{i}}

can be viewed as an encoding cost weight (

0 < T C P O C_{d e p t h_{i}} < 1

) and can be used to determine whether the CU depth selected by RDO is the best level.

With (1) and (7), it is readily seen that:

\begin{array}{l} \frac{λ_{M o d e} R_{n o n t e x t u r e, d e p t h_{i}} + J_{R D_{t e x t u r e, d e p t h_{i}}}}{J_{R D_{t e x t u r e, d e p t h_{i}}}} \\ < \frac{λ_{M o d e} R_{n o n t e x t u r e, d e p t h_{i + 1}} + J_{R D_{t e x t u r e, d e p t h_{i + 1}}}}{J_{R D_{t e x t u r e, d e p t h_{i + 1}}}} \end{array}

(9)

From (9), a consistent relationship can be derived for

d e p t h_{i} < d e p t h_{i + 1}

and

0 < T C P O C_{d e p t h_{i}} < 1

as follows:

T C P O C_{d e p t h_{i}} \geq T C P O C_{d e p t h_{i + 1}}

(10)

This relationship between

TCPOC

and

depth

can be verified in Figure 3, such as for the Class A sequence “Traffic”. As shown in Figure 3, four quantization parameters “22, 27, 32, 37” were used in the experiment, and the

TCPOC

of all depth levels was obtained through actual coding. The

TCPOC

data corresponding to each quantization parameter were averaged. The more obvious the decrease in

TCPOC

between the levels, the deeper the depth level decreases; for example, the depth level “2” with the quantization parameter “22” is reduced from 0.888 to the depth level “3” 0.761. The monotonically decreasing relationship between

TCPOC

and depth level is shown.

Video Sequences	Qps	Depth Level 0	Depth Level 1	Depth Level 2	Depth Level 3
BQTerrace (1920 × 1080)	22	15.9	15.9	15.1	53
	27	23.8	17.3	20.5	38.4
	32	24	27.1	17.5	31.4
	37	23	22	24.9	30.1
BasketballDrill (832 × 480)	22	0.6	10.1	21.6	67.7
	27	1.1	22.1	31.6	45.2
	32	8.7	33	33.4	24.9
	37	17	38.1	28.8	16.1
Kimono (1920 × 1080)	22	38.3	47	10.3	4.4
	27	25.6	59.6	11.5	3.3
	32	28.7	56.5	12.4	2.4
	37	30.7	56.1	11.5	1.7
Average		19.8	33.7	19.9	26.6

Video Sequences	Qps	Depth Range Category of LCU (%)
Video Sequences	Qps	C0	C1	C2	C3	C4	C5	C6
BQTerrace (1920 × 1080)	22	1.1	1.6	5.6	0.1	38.9	51.2	1.5
	27	3.6	4.5	5.4	0.1	41.5	44.5	0.4
	32	4.8	4.7	8.2	0.2	46.9	35.1	0.0
	37	10.3	6.3	10.4	0.2	49.0	23.8	0.0
BasketballDrill (832 × 480)	22	5.5	18.8	7.5	0.4	42.5	25.1	0.2
	27	8.0	26.7	11.6	0.2	38.8	14.7	0.0
	32	13.5	19.2	19.8	0.4	36.9	10.2	0.0
	37	22.2	19.2	18.0	0.0	32.7	7.8	0.0
Kimono (1920 × 1080)	22	9.2	10.0	24.6	1.7	35.4	19.2	0.0
	27	17.1	20.4	12.1	0.0	32.1	18.3	0.0
	32	31.7	10.4	10.8	0.0	32.1	15.0	0.0
	37	39.2	6.7	14.6	0.0	27.9	11.7	0.0

Video Sequence	Qp	Accuracy (%)
BQTerrace (1920 × 1080)	22	93.4
	27	92.5
	32	92.8
	37	93.6
BasketballDrill (832 × 480)	22	96.1
	27	95.5
	32	95.8
	37	96.2
Kimono (1920 × 1080)	22	98.3
	27	97.8
	32	97.6
	37	98.4
Average		95.67

Video Type	Video Sequences	Classifier 1		Classifier 0		Classifier 2
Video Type	Video Sequences	BDBR (%)	TS (%)	BDBR (%)	TS (%)	BDBR (%)	TS (%)
A	PeopleonStreet	0.78	−25	0.78	−34	0.18	−35
A	Traffic	0.36	−34	0.36	−26	0.08	−41
B	BQTerrace	0.58	−23	0.58	−30	0.09	−23
	Cacus	0.52	−28	0.52	−22	0.21	−45
	Kimono	0.41	−21	0.41	−29	0.26	−50
	ParkScene	0.34	−34	0.34	−32	0.31	−44
C	BasketballDrill	0.65	−26	0.65	−16	0.16	−42
	BQMall	0.37	−22	0.37	−29	0.2	−40
	PartyScene	0.26	−18	0.26	−21	0.81	−55
	RaceHorses	0.21	−13	0.21	−27	0.58	−31
D	BasketballPass	0.24	−17	0.24	−22	0.61	−40
	BlowingBubbles	0.11	−11	0.11	−26	0.25	−41
	Bqsquare	0.21	−15	0.21	−27	0.35	−30
	RaceHorses	0.09	−9	0.09	−30	0.2	−35
E	FourPeople	0.41	−34	0.41	−24	0.2	−43
E	Johnny	0.91	−35	0.91	−27	1.2	−49
Average		0.40	−22	0.49	−26	0.36	−40

Video Type	Video Sequences	Proposed Algorithms		Jamali TB [13]		Liu TB [19]		Xu TIP [21]		Yan CSVT [23]
Video Type	Video Sequences	BDBR (%)	TS (%)	BDBR (%)	TS (%)	BDBR (%)	TS (%)	BDBR (%)	TS (%)	BDBR (%)	TS (%)
A	PeopleonStreet	1.32	−53.2	1.71	−49.4	1.39	−59.2	2.37	−61.0	1.89	−61.3
A	Traffic	1.25	−64.5	1.46	−48.8	1.40	−61.3	2.55	−70.8	1.74	−63.1
B	BQTerrace	1.01	−52.6	0.82	−46.7	1.40	−57.3	1.84	−64.7	1.37	−62.3
	Cacus	0.34	−58.2	1.46	−47.7	0.99	−57.8	2.27	−61.0	1.70	−63.9
	Kimono	0.90	−64.5	1.54	−49.5	0.87	−65.8	2.59	−83.5	0.85	−69.0
	ParkScene	0.52	−52.4	1.02	−47.4	1.08	−57.3	1.96	−67.5	1.70	−63.6
C	BasketballDrill	0.32	−48.1	0.85	−48.7	1.50	−57.4	2.86	−53.0	3.48	−63.8
	BQMall	1.24	−45.7	1.48	−47.1	1.29	−54.9	2.09	−58.4	2.24	−62.3
	PartyScene	1.10	−62.3	1.02	−41.1	0.71	−48.4	0.66	−44.5	1.70	−56.0
	RaceHorses	0.87	−45.9	0.65	−44.6	1.30	−55.9	1.97	−57.1	1.45	−62.4
D	BasketballPass	0.75	−51.2	1.22	−46.5	0.82	−56.7	1.84	−56.4	2.09	−62.9
	BlowingBubbles	0.34	−56.9	1.03	−44.2	0.47	−53.6	0.62	−40.5	2.05	−56.0
	Bqsquare	0.41	−44.5	1.29	−41.0	0.32	−59.1	0.91	−45.8	1.50	−47.9
	RaceHorses	0.32	−43.1	1.22	−46.5	0.51	−52.3	1.32	−55.8	1.65	−57.7
E	FourPeople	0.60	−55.3	1.78	−48.9	1.66	−62.0	3.11	−71.3	2.30	−62.8
E	Johnny	1.90	−53.4	2.22	−49.9	2.16	−69.6	3.82	−70.7	2.61	−69.0
Average		0.82	−53.24	1.30	−46.75	1.12	−58.04	2.05	−60.13	1.90	−61.50

Article Menu

Fast Decision Algorithm of CU Size for HEVC Intra-Prediction Based on a Kernel Fuzzy SVM Classifier

Abstract

1. Introduction

2.2. Cost Relationship between CU Sizes

3. Proposed Fast CU Size Decision Scheme

3.1. Early CU Termination Threshold Selection with TCPOC

3.2. Design of Early Termination Classification Model Based on Fuzzy Support Vector Machine

3.3. Initial Best Depth Prediction

3.4. Overall Algorithm

4. Experimental Results

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI