Article

Decoupling Source and Semantic Encoding: An Implementation Study

1 State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China
2 ZTE Corporation, Nanshan District, Shenzhen 518055, China
3 Guangdong Provincial Mobile Terminal Microwave and Millimeter-Wave Antenna Engineering Research Center, College of Electronics and Information Engineering, Shenzhen University, Shenzhen 518060, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(13), 2755; https://doi.org/10.3390/electronics12132755
Submission received: 16 May 2023 / Revised: 12 June 2023 / Accepted: 15 June 2023 / Published: 21 June 2023

Abstract: Despite the remarkable achievements of modern communication systems based on Shannon’s theory, there is still considerable room for exploration in information transmission capacity, and semantic communication technology has emerged as a promising approach in this regard. Nonetheless, the benefits of semantic communication remain elusive, and the absence of a unified system model has hindered practical implementation. In this context, we contend that semantic communication can benefit from data distortion and the incorporation of natural language modeling information, such that source coding with semantic modeling information does not compromise the performance of semantic communication systems. To fortify our stance, a novel Separated Data-Semantic Coding (SDSC) system is proposed, which disentangles the source coding and semantic coding. Furthermore, relevant experiments are conducted to validate the contention and the SDSC system. By illuminating the superiority of semantic communication, the research not only contributes to the advancement of semantic communication technologies but also facilitates the development of more practical communication systems.

1. Introduction

As 5G technology matures and gradually penetrates the market, people are starting to envision the potential applications of more sophisticated communication systems [1,2]. In order to realize this vision, various wireless communication technologies have been proposed, including terahertz communication [3], air–space–ground integrated networks [4], artificial intelligence (AI)-driven communication [5], inter-protocol communication [6], etc., resulting in a host of new metrics that 6G must satisfy [1,7]. However, the implementation of communication systems based on Shannon’s theory of error-correcting information transmission [8] has essentially hit a bottleneck, which may impede classical communication systems from achieving the state-of-the-art (SOTA) metrics proposed for 6G. As such, expanding Shannon’s theoretical framework while maintaining, or even reducing, the complexity of communication systems and further enhancing their reliability and effectiveness has become an imperative research challenge.
In 1948, Shannon [8] introduced a mathematical framework that elegantly distilled and abstracted the human understanding of communication systems. Nevertheless, on the one hand, he explicitly emphasized the premise that “These semantic aspects of communication are irrelevant to the engineering problem”. On the other hand, his description relies on fundamental probabilistic and statistical tools to model the intricate sources, channels, and receivers. Therefore, Shannon’s model, while an ideal abstraction that effectively encapsulates the core essence of diverse communication systems, falls short of furnishing a comprehensive characterization of the intricate and multifaceted communication challenges encountered in practice [9].
The limitations of Shannon’s theory have bequeathed numerous opportunities for subsequent researchers to refine and extend. Weaver [10] redefined the concept of communication and expounded on three levels:
  • LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem.)
  • LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem.)
  • LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem.)
While the first problem has been exquisitely addressed within information theory, the latter two gave rise to the notion of semantic communication. However, Weaver does not provide a methodology for constructing a semantic communication system. Carnap and Bar-Hillel [11] replaced statistical probability with logical probability to describe semantic information based on propositions. In other words, the higher the logical probability that a sentence is true, the lower its semantic information content. Subsequently, Floridi [12,13], D’Alfonso [14], Bao [15], Kolchinsky [16], and others developed semantic measurement systems from various natural language properties, including truth, fuzziness, randomness, and physical properties. Nonetheless, owing to the absence of statistical tools and the complexity of natural language [9], these measurements are inadequate to generalize and accurately portray semantics in natural language.
In recent times, the progression of AI has brought about major changes in the storage and utilization of information. Notably, a succession of large language models (LLMs), such as ChatGPT (https://openai.com/blog/chatgpt accessed on 1 March 2023), have been revolutionizing traditional applications in text generation, language translation, and question-answering [17,18]. These LLMs have ushered in a new era of information technology, prompting a corresponding shift in the way information is transmitted [19,20]. As the most potent statistical tool currently available, artificial neural networks (ANNs) have proven effective in modeling natural language and syntax logic [19], and numerous semantic communication systems have been developed and implemented on this basis. In fact, the inception of semantic communication systems can be traced back to the implementation of sophisticated joint source–channel coding (JSCC) systems for texts [21], images [22,23], and speech [24], whose underlying gains were understood to come primarily from joint coding techniques. Subsequently, researchers came to recognize the pivotal role of semantics within these systems, identifying it as a vital source of system gains, and transformers [25] were introduced into system implementation. This realization paved the way for an array of ground-breaking semantic communication systems, exemplified by the pioneering DeepSC [26]. Notably, in scenarios characterized by low signal-to-noise ratios, where conventional communication systems exhibit subpar performance, the DeepSC system continues to exhibit commendable transmission capabilities. Building upon this success, subsequent advancements led to further semantic communication systems [27,28,29].
Moreover, the NTSCC system [30], initially designed for image transmission, acts as the foundation for the development of an extensive gamut of multimedia data transmission systems [31,32,33]. Presently, cutting-edge research focuses on semantic communication systems for video transmission [34,35], demonstrating their efficacy in real-time video conferencing scenarios.
These state-of-the-art systems boast significantly enhanced data transmission capacities in comparison to traditional communication methods, with concurrent advancements in the metrics employed to evaluate human perceptual experiences in the realm of video. Nevertheless, the theoretical explanation of these systems is not yet perfect. As AI is an enigmatic black box, the origin of semantic communication gains cannot be clearly explained. In addition, these systems primarily adopt JSCC, and most of them can only operate in laboratory environments, rendering them incompatible with the separated communication paradigm utilized in practical applications [19,36].
This article expounds upon the aforementioned issue and provides a theoretical exposition of the potential origination of gains for the semantic communication system. Additionally, it posits that the source coding that preserves the semantic modeling information can be decoupled from semantic coding without compromising the benefits. The article also proposes a novel model, the Separated Data-Semantic Coding (SDSC) system, that allows for classical source coding, such as Huffman coding, arithmetic coding, and Lempel–Ziv coding, as input. Additionally, experimental investigations are conducted to confirm its practical feasibility. Furthermore, the experimental results elucidate the origin of the semantic communication advantages and validate the possibility of dissociating source coding from semantic coding. The approach can facilitate the implementation of semantic communication within practical communication frameworks [37,38].
The remainder of the paper is structured in a coherent manner. In the second section, the disparities between classical communication and semantic communication are demarcated, and the origins of gains in the semantic communication system are thoroughly analyzed. Additionally, the feasibility of disentangling semantic coding from source coding is underscored. Moving on to the third section, it expounds upon the SDSC devised and delineates the experimental methodology employed to validate the conclusions drawn in the second section. Then, the experimental results are interwoven with theoretical analysis to facilitate informed discussions in the fourth section. Moreover, the fifth section elucidates the importance, limitations, and future research directions of the experiment. Finally, the article culminates by summarizing the findings in a concise manner.

2. Methodology

This section begins with a comprehensive review of the error-free transmission theory. Additionally, it points out that the separation theorem fails to hold in the context of source coding that utilizes rate-distortion processes. Subsequently, semantic communication is compared to classical communication, elucidating how the former constitutes a rate-distortion process that conserves semantics and yields benefits not only from permitting data distortion but also from incorporating sophisticated semantic modeling information of the source. Finally, it is concluded that in cases of source coding and semantic coding separation, the source coding must be capable of retaining the semantic modeling information embedded in the original source.

2.1. Classical Communication System

The classical communication system is based on the source coding theorem, channel coding theorem, and source–channel coding theorem [8]. As shown in Figure 1, M denotes the source, P(M) denotes the source distribution, and R_s denotes the source coding rate. Assuming that the source symbols belong to the typical set and disregarding their specific meaning, the source coding theorem can be derived. With R_s > H(M), there is a lossless source code, which can be denoted by s with distribution P(s).
For a discrete memoryless channel {X, p(y|x), Y} with capacity C, where X denotes the channel input and Y denotes the channel output, with samples x and y, respectively, the conditional probability p(y|x) is the channel transition probability. When X and Y form a jointly typical set, it can be proven that for any channel coding rate R_C < C, there exists a sequence of (2^{nR_C}, n) codes such that, as n → ∞, the probability of error P_e^(n) → 0. Additionally, the source–channel coding theorem states that if M̂ denotes the message received by the destination and H(M) < C, then there exists a source–channel code such that the probability of error in message transmission satisfies P(M̂ ≠ M) → 0.
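The quantities above can be illustrated numerically. The following stdlib Python sketch (the source distribution and crossover probability are illustrative values, not taken from the paper) computes the source entropy H(M) and the capacity of a binary symmetric channel:

```python
import math

def entropy(p):
    """Shannon entropy H(M), in bits, of a discrete distribution P(M)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def bsc_capacity(eps):
    """Capacity C = 1 - H(eps) of a binary symmetric channel with crossover eps."""
    return 1.0 - entropy([eps, 1.0 - eps])

# A four-symbol source: lossless coding needs a rate R_s > H(M) bits/symbol.
p_source = [0.5, 0.25, 0.125, 0.125]
H = entropy(p_source)       # 1.75 bits/symbol
C = bsc_capacity(0.11)      # about 0.5 bits per channel use
```

With these illustrative numbers, any R_s between H and C-limited throughput determines whether separate source and channel codes suffice for error-free transmission.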
Due to the independence of the source coding theorem and channel coding theorem, it is known from the source–channel coding theorem that when R_S < R_C, separately using source coding and channel coding can achieve the same error-free transmission performance as JSCC. The role of source coding is to establish a one-to-one mapping, making the distribution of codewords P(s) as similar as possible to that of the original source P(M), in order to achieve the maximum compression ratio. The role of channel coding is to transform the source codewords by adding redundant bits, while keeping the source entropy H(M) unchanged, in order to approach the channel capacity. Therefore, separately implemented source coding and channel coding can be described as a process of harmonizing P(s) and P(y|x) at the cost of sending additional energy for redundant bits. Since separated implementation and separated optimization are easier, separated source–channel coding has become a paradigm for communication system implementation in practice.
In practical implementation, source coding often utilizes lossy compression to reduce the amount of transmitted data and eliminate superfluous information, hence following a rate-distortion function rather than the source coding theorem. In cases where the source coding theorem is not applicable, the source–channel coding theorem for the original source M also becomes invalid, thereby rendering the separation theorem inapplicable as well. Similarly, perfect channel coding is unattainable, because it is difficult to describe the actual channel with simple statistics. Therefore, in practical communication systems, source coding and channel coding need to balance distribution and distortion for optimal performance. In the rate-distortion process, the source coding not only needs to obtain code words s with the minimum coding length, but also needs to make the source distribution P(M) match the channel P(y|x) as much as possible while ensuring the distortion D. For a fixed channel, the channel coding not only needs to achieve the channel capacity, but also needs to share a portion of the distortion D to match the channel P(y|x) as much as possible, so as to save redundant bits. Thus, for a fixed distortion D, the optimal coding scheme can be obtained. However, at this point, the source coding consults the channel P(y|x), and the channel coding also consults the distortion of the source coding.
In fact, it has been proven that an optimal JSCC coding scheme exists when the distortion function conforms to a certain form [39]. In other words, for a JSCC code, if it enables the source distribution P(M) to match the actual channel P(y|x), and enables the channel to match the distortion D, then it is optimal. If such a JSCC is employed in the actual system implementation and meets the above conditions, gain can be obtained from the rate-distortion process.
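The rate-distortion trade-off invoked above has a closed form in the simplest case. As an illustration only (not the paper's source model), the sketch below evaluates R(D) = H(p) − H(D) for a Bernoulli(p) source under Hamming distortion, showing how allowing distortion D lowers the required rate below H(p):

```python
import math

def h2(p):
    """Binary entropy H(p) in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_distortion_bernoulli(p, D):
    """R(D) = H(p) - H(D) for a Bernoulli(p) source with Hamming distortion,
    valid for 0 <= D <= min(p, 1 - p); zero rate suffices beyond that."""
    if D >= min(p, 1 - p):
        return 0.0
    return h2(p) - h2(D)

# Allowing 5% distortion on a fair-coin source cuts the rate below 1 bit/symbol.
r = rate_distortion_bernoulli(0.5, 0.05)
```

At D = 0 the function recovers the lossless rate H(p), mirroring the transition from the source coding theorem to the rate-distortion regime discussed above.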

2.2. Semantic Communication System

The semantic communication system, as shown in Figure 2, aims to minimize data transmission while preserving semantics. Therefore, from the perspective of classical communication, semantic communication is a process that allows for data distortion while retaining semantics. Additionally, since semantic communication requires modeling natural language, the natural language model serves as the knowledge base, or a priori information. The additional information obtained through reasoning from the natural language model based on the transmitted information is equivalent to the information transmitted through the semantic channel. Hence, semantic communication achieves data transmission with distortion, yet ultimately preserves the semantic information by establishing a knowledge base. It is worth noting that there is currently no unified definition of a knowledge base. In this paper, a knowledge base or semantic modeling information refers to the parameters obtained through the training process of an ANN.
Nowadays, implementations of semantic communication rely on JSCC based on ANNs. In this context, let M represent the original source message, S represent the semantics of the message M, and f represent the semantic encoder, which is a joint source–channel encoder. The semantic encoder extracts the semantic features X from the source using the natural language modeling information K. Then, the semantic features X are transmitted through a channel and received as Y at the destination. The semantic decoder utilizes the common semantic modeling information K and the received semantic features Y to recover the message M̂ and its corresponding semantics Ŝ.
In the process of semantic communication, the goal is not to make the received message M̂ necessarily identical to the transmitted message M, but to preserve the semantics, i.e., Ŝ = S. However, in practical implementations, there may be some semantic distortion that does not affect the correct inference of semantics with the natural language model. The distortion can be denoted as D_S = E[d_S(S; Ŝ)], with the semantic distortion function d_S(S; Ŝ), where E denotes the mean in the semantic space. The corresponding data distortion can be denoted as D_D. With the maximum allowable semantic distortion defined as D_S^max, the data distortion allowed by the system is D_SD = max_{D_S ≤ D_S^max} D_D. Therefore, semantic communication belongs to lossy communication, with a distortion of D_SD. However, as this paper focuses primarily on the gains of semantic communication, the specific form of the aforementioned function is beyond the scope of this discussion.
Compared to classical communication, semantic communication offers advantages primarily in the context of lossy communication, since it permits data distortion and thus allows for a higher coding rate. Moreover, leveraging the power of AI, the encoder and decoder can be automatically adjusted during training to match the source distribution to the channel, as well as to account for source distortion, ultimately achieving optimal performance [22,23]. In addition to these benefits, semantic communication also gains from the nonlinear information present in natural language, such as semantic associations between words and the grammar of sentences [40,41], which are captured by the semantic encoder and decoder to establish a knowledge base. Therefore, the advantages of semantic communication over JSCC lossy communication stem from both the a priori information provided by the knowledge base and the additional information that can be inferred from it. In other words, for a given semantic distortion, the amount of semantic feature information required by semantic communication systems nonlinearly decreases as the shared semantic modeling information between the source and the destination increases, thereby further enhancing the overall capacity of the communication system.
Furthermore, the establishment of semantic communication should not be regarded as a complete deviation from the conventional communication paradigm. The decoupling implementation still holds as a viable option, and therefore it is crucial to understand the requirements for the separation implementation. This necessitates an exhaustive analysis of the gains obtained from the semantic communication system. On the one hand, semantic encoding can be viewed as a JSCC mechanism that can adjust different distortions according to the distinct source distributions and the various channels, thus not only enabling the source compression while adapting to the distortion function, but also enabling the distortion to match the channel. At this point, the lossless source encoding can be seamlessly integrated with the semantic encoding without compromising the system’s performance. On the other hand, the gain from semantic communication originates from the natural language modeling information. The coded bits are also a manifestation of natural language, and when the source encoding retains the semantic modeling information, the natural language conveyed by the source encoding can be modeled. The semantic decoder, leveraging the natural language model, can make rational inferences about semantic information, even when the data are subject to a certain range of distortion. In other words, it allows for a partial distortion of the source encoding. Additionally, the distortion can be eliminated by the semantic inferences, so it does not affect the adaptability of the semantic coding to distribution and distortion, thereby engendering further gains and improving the capacity of the communication system.
Our research introduces a novel SDSC system that enables semantic communication using lossless or partially lossy source encoding as input. Our simulation results corroborate the analysis, demonstrating the role of the natural language modeling information contained in the source encoding in a semantic communication system.

3. Separated Data-Semantic Coding System and Experiments

The present section begins with an introductory discourse concerning the proposed SDSC, which is subsequently followed by a meticulously devised experimental plan and implementation procedure. The experimental outcomes effectively substantiate the theoretical analysis presented in Section 2, wherein the gain of the SDSC system is obtained from the intricate modeling of natural language. Moreover, it is noteworthy that the semantic communication system can be detached from the source encoding containing the information on semantic modeling.

3.1. Separated Data-Semantic Coding System

The architecture of the SDSC is predicated on the Transformer [25]. The system comprises three modules, namely the source encoding module (depicted as Data Encoder in Figure 3), source encoding transformation module (depicted as Data Transfer in Figure 3), and semantic coding module (depicted as Semantic Encoder and Decoder in Figure 3). The source encoding module can adopt various encoding algorithms, such as Huffman coding, arithmetic coding, or Lempel–Ziv coding. The source encoding transformation module is conducted by a series of fully connected networks. The semantic encoding module is designed based on the Transformer. The comprehensive structure of the system is succinctly depicted in Figure 3.
We assume that the message M to be transmitted is encoded as s by the source encoding module, which preserves the semantic modeling information of the original message, since lossless source coding is a one-to-one mapping that preserves the order. It should be noted that the code words represent a change in state rather than a probability, while both the input and output of the ANN represent probabilities. Therefore, the 0/1 bits in the source coding are replaced with the probability of their occurrence, such that bit 0 occurs with probability P_0 and bit 1 occurs with probability P_1, with P_0 + P_1 = 1. Then, bits 1 and 0 are replaced by probabilities P_0 and P_1, respectively. In this way, the bit information can be converted into probability information while balancing the impact of 0/1 bits as inputs. In addition, since the code words may be of unequal length, they are padded with 0 to become equal length. During network training, the zero-padding does not affect the parameter training. Thus, the actual code words that participate in the semantic encoding process are still the original unequal-length code words. After being processed by the source encoding transformation module, an equal-length code with the size of the corpus is obtained, which is then fed into the semantic encoding module to generate the semantic feature X. The X is transmitted through the channel and received as Y. Finally, the semantic decoder is used to recover the original message as M̂.
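The bit-to-probability substitution and zero-padding described above can be sketched as follows. The function name and the toy codewords are hypothetical, but the rule (bit 1 → P_0, bit 0 → P_1, then pad to equal length) follows the text:

```python
from collections import Counter

def bits_to_probs(codewords, pad_len):
    """Replace each bit with the probability of the *other* bit, as described:
    bit 1 -> P_0 and bit 0 -> P_1, then zero-pad every codeword to pad_len."""
    all_bits = [b for cw in codewords for b in cw]
    counts = Counter(all_bits)
    total = len(all_bits)
    p0, p1 = counts["0"] / total, counts["1"] / total
    sub = {"1": p0, "0": p1}
    out = []
    for cw in codewords:
        row = [sub[b] for b in cw]
        row += [0.0] * (pad_len - len(row))   # zero-padding to equal length
        out.append(row)
    return out

codes = ["0", "10", "110"]                    # unequal-length, Huffman-style codes
features = bits_to_probs(codes, pad_len=3)    # equal-length probability vectors
```

The padded zeros carry no probability mass, which is consistent with the observation that zero-padding does not affect parameter training.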
The source encoding transformation module can be implemented in various ways. One method is to employ a set of fully connected networks to convert unequal-length coding into semantically modeled coding, followed by the reparameterization trick to convert the coding into a one-hot vector or a vector with multiple elements equal to 1, which serves as input for the semantic encoder. However, this approach poses challenges in terms of increased module size and network training complexity. Another approach involves using a single fully connected network to implement the transformation module, which necessitates padding operations for unequal-length inputs to enable their usage as input to a fully connected network, thereby obviating the need for reparameterization when feeding the semantic encoding module. The embedding layer in the semantic encoder can be substituted with a fully connected layer to maximize information transmission. The third method entails the utilization of a CNN to implement the transformation module for arithmetic coding, Lempel–Ziv coding, and other source coding algorithms that do not encode the source character by character, resulting in one less dimension than Huffman coding. Given that the second approach offers a reasonable network size and computational complexity, it is adopted in the experiment to implement the source encoding transformation module for Huffman coding. Huffman coding assigns a unique code to each source symbol, ensuring that the statistical properties of the source messages and their corresponding Huffman code words are identical. Hence, if numbers can be viewed as a form of natural language, Huffman coding can be viewed as a lossless encoding scheme that effectively preserves semantic modeling information.
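Since the experiments adopt Huffman coding as the source encoder, the following self-contained sketch (standard textbook Huffman coding, not the authors' implementation) illustrates the one-to-one, losslessly invertible mapping relied upon above:

```python
import heapq
from collections import Counter

def huffman_table(text):
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    heap = [[f, [sym, ""]] for sym, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]      # prefix 0 on the lighter subtree
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]      # prefix 1 on the heavier subtree
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

def encode(text, table):
    return "".join(table[c] for c in text)

def decode(bits, table):
    inv, buf, out = {v: k for k, v in table.items()}, "", []
    for b in bits:                       # prefix-free codes decode greedily
        buf += b
        if buf in inv:
            out.append(inv[buf])
            buf = ""
    return "".join(out)

msg = "semantic communication"
table = huffman_table(msg)
assert decode(encode(msg, table), table) == msg   # lossless one-to-one mapping
```

Because the code is prefix-free and uniquely decodable, the codeword stream carries exactly the same symbol statistics (and hence the same semantic modeling information) as the original message.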

3.2. Experiments

In order to demonstrate the existence of semantic modeling information in lossless source coding, the distributions of unigrams (single words), bigrams (word pairs), and trigrams (word triplets) in the messages were analyzed. These distributions reflect the structured information present in natural language, which can be regarded as part of the semantic modeling information. However, it should be noted that some of the modeling information in natural language may require complex nonlinear transformations to be fully captured [41].
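The n-gram analysis described here can be reproduced in miniature. The sketch below uses a toy English corpus (the paper analyzes its own dataset, detailed in Appendix A) to compute the empirical entropy of the n-gram distribution:

```python
import math
from collections import Counter

def ngram_entropy(tokens, n):
    """Empirical entropy (bits) of the n-gram distribution of a token sequence."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

tokens = "the cat sat on the mat and the cat ran".split()
h1 = ngram_entropy(tokens, 1)   # unigram entropy
h2 = ngram_entropy(tokens, 2)   # bigram entropy
```

Comparing such entropies for ordered versus shuffled text is how the structured (word-order) information is quantified in the Results.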
To further investigate the ability of source coding with modeling information to be decoupled from semantic encoding, four sets of experiments were designed for comparative analysis.
  • In the first experiment, the semantic codec was trained with original messages and subsequently tested on the testing messages in their natural language order.
  • In the second experiment, Huffman coding was applied to the original messages, and the resulting code words were used to train the SDSC system while preserving the natural language order. The testing data were then coded in the same way to evaluate system performance.
  • In the third experiment, the SDSC system was trained with shuffled Huffman coding, whereby the characters in each sentence of the training data were randomly rearranged, and the corresponding labels were also shuffled accordingly to prevent the SDSC system from learning natural language modeling information. During testing, Huffman codes were input in the correct order to assess system performance.
  • In the fourth experiment, the SDSC system was trained using data encoded with lossy Huffman coding, and the system was subsequently used to transmit the lossy code words. The lossy coding scheme employed Huffman coding that was truncated to a fixed length, with codes shorter than the fixed length retained in their original length.
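The truncation scheme of the fourth experiment can be sketched as follows. The code table here is a hypothetical stand-in for a real Huffman table; colliding truncated codewords become undecodable, mirroring the <UNK> behavior reported in the Results:

```python
from collections import Counter

def truncate_codes(table, max_len):
    """Lossy Huffman variant (a sketch of the fourth experiment's scheme):
    codewords longer than max_len are cut to max_len; shorter ones are kept.
    Truncated words may collide, so their symbols decode as <UNK>."""
    truncated = {sym: code[:max_len] for sym, code in table.items()}
    uses = Counter(truncated.values())
    # Only symbols whose codeword survived unchanged and remains unique
    # can still be decoded unambiguously.
    decodable = {sym for sym, code in truncated.items()
                 if uses[code] == 1 and code == table[sym]}
    return truncated, decodable

# Toy code table standing in for a real Huffman table (hypothetical values).
table = {"a": "0", "b": "10", "c": "110", "d": "111"}
trunc, ok = truncate_codes(table, max_len=2)
# "c" and "d" both collapse to "11" and would decode as <UNK>.
```

Shortening max_len collapses more codewords, which is why aggressive truncation destroys progressively more of the structured information.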
The SDSC parameters, as described in Table 1, were utilized in the experiment. Details regarding the data and training methods can be found in Appendix A and Appendix B. It is worth noting that the Bilingual Evaluation Understudy (BLEU) metric was employed in the experiments as a measure of transmission effectiveness. BLEU was originally proposed as a text translation metric [42], which is analogous to semantic communication (translating the original message into a semantically equivalent but more compact form). Compared to bit error rate, BLEU is more aligned with human understanding of natural language.
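For reference, a plain sentence-level BLEU (uniform n-gram weights with brevity penalty and no smoothing, a simplification of the metric in [42] rather than the exact evaluation code used in the experiments) can be computed with the standard library alone:

```python
import math
from collections import Counter

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and brevity penalty
    (no smoothing), as a rough transmission-quality score."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    precisions = []
    for n in range(1, max_n + 1):
        ref, cand = ngrams(reference, n), ngrams(candidate, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "the message was received correctly".split()
assert bleu(ref, ref) == 1.0                  # identical sentences score 1
assert bleu(ref, "a completely different sentence here".split()) == 0.0
```

Unlike bit error rate, the clipped n-gram overlap rewards outputs that preserve word order and phrasing, which is why BLEU aligns better with human judgments of transmitted text.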

4. Results

As shown in Figure 4, the occurrence frequencies of unigrams, bigrams, and trigrams within the corpus are presented. Each item is assigned a unique ordinal number corresponding to its frequency of occurrence: the most frequently occurring item is given the ordinal number 1, and subsequent items are numbered consecutively.
Comparing Figure 4a with Figure 4b, it can be observed that the statistical results for unigrams are consistent. The unigram entropy of the message is 9.2175 bits. The entropies of ordered bigrams and trigrams are 14.5418 bits and 17.4228 bits, respectively, while those of shuffled bigrams and trigrams are 17.3172 bits and 19.9076 bits. This indicates that about 2.8 bits and 2.5 bits of information are embedded in the natural language order of bigrams and trigrams, respectively, which can help the semantic codec with encoding and decoding.
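The effect of shuffling on n-gram entropy can be reproduced on a toy corpus (illustrative English phrases, not the paper's dataset): destroying word order raises the empirical bigram entropy, which is exactly the gap quantified above.

```python
import math
import random
from collections import Counter

def bigram_entropy(tokens):
    """Empirical entropy (bits) of the bigram distribution."""
    grams = Counter(zip(tokens, tokens[1:]))
    total = sum(grams.values())
    return -sum((c / total) * math.log2(c / total) for c in grams.values())

random.seed(0)
# A toy "corpus" with strong word-order structure: one repeated phrase.
tokens = "good morning to you".split() * 50
ordered = bigram_entropy(tokens)

shuffled = list(tokens)
random.shuffle(shuffled)              # destroy the natural language order
shuffled_h = bigram_entropy(shuffled)
# Shuffling spreads mass over many more bigrams, so entropy rises.
```

The ordered-minus-shuffled entropy difference is the structured information that the third experiment deliberately withholds from the SDSC system.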
Nevertheless, the statistical approach only provides a rough approximation of the structured information present in natural language and does not account for the complexities involved in language modeling. It simply groups different combinations of symbols together as word groups, forming the basic unit of natural language. The method does not consider the semantic relationships between individual unigrams or the grammatical relationships within a complete sentence, while they may be captured by the semantic codec. Therefore, the actual gains achieved by the semantic communication system may exceed the numerical results of the above analysis, and subsequent experiments have also demonstrated this.
As depicted in Figure 5, the blue and red lines perform very similarly. The red line incurs some performance loss, which may be attributed to the insufficient complexity of the utilized network, resulting in limited learning capability for the codes. This indicates that the SDSC can effectively separate source coding and semantic coding by converting classical code words carrying semantic modeling information into semantic features for transmission. This capability allows for practical integration of semantic encoding into communication system frameworks. In addition, the yellow line shows a significant degradation in performance compared to the red line, suggesting that preserving semantic modeling information is essential for successfully transforming classical code words into semantic features. This confirms that a portion of the benefits of semantic communication systems stems from the natural language modeling process.
Furthermore, the study tested lossy Huffman coding with truncation lengths of 6 bits, 9 bits, and 12 bits, as detailed in Appendix B. It is worth noting that some truncated codewords do not have corresponding Chinese characters and are decoded as a special symbol <UNK>. If more bits were truncated, more <UNK> symbols would be generated, potentially leading to the destruction of structured information from the original message. Consequently, a network trained with such data could potentially exhibit poorer performance.
Figure 6 illustrates that the lossy source coding can lead to a decline in the performance of semantic communication systems. The extent of this decline is determined by the amount of semantic modeling information that is lost during the process of lossy source coding, rather than the amount of data that is lost. When lossy coding still retains the majority of semantic modeling information, such as the Huffman code truncated to 12 bits shown in the figure, the performance loss in semantic communication systems is small, while it may cause significant errors in classical communication systems, such as the cliff effect. This is mainly because the semantic encoder and decoder can recover the undecodable code words based on the semantic modeling information and then infer them into reasonable sentences. However, when lossy coding causes the majority of semantic modeling information to be lost, such as the Huffman code truncated to 9 bits and 6 bits shown in the figure, it can result in a performance degradation, even a complete breakdown of semantic communication systems. The above results indicate that when the source is subjected to lossy coding and the degree of data distortion can nonlinearly affect semantic modeling information and its inferences, it will affect the performance of semantic communication systems and reduce the gain obtained from natural language models.

5. Discussion

The present discourse posits the advantages of semantic communication vis-à-vis classical communication systems under rate-distortion conditions. It is further contended that semantic communication systems can reap significant gains from semantic modeling information and its inferences. Assuming the veracity of this analysis, the conclusion is drawn that lossy source coding can be performed separately from semantic coding as long as sufficient semantic modeling information is retained. Additionally, the SDSC model and related experiments are designed to verify this extrapolation.
The experiment first utilizes the original messages to train a semantic codec, which yields the optimal transmission performance. Subsequently, Huffman coding is applied to the source, and the resulting code words are used to train the SDSC system. Training the SDSC system with code words in natural language order leads to a performance similar to direct transmission with the semantic codec, whereas shuffled code words lead to a significant drop in transmission performance. This result not only demonstrates the effectiveness of the SDSC system, which separates source coding from semantic coding, but also corroborates the hypothesis that the gains of semantic communication systems emanate from the natural language modeling information embedded in the source. Furthermore, guided by an analysis of code word lengths, truncation lengths of 6, 9, and 12 bits were selected to train the SDSC system. The experimental results show that even with lossy data, retaining a sufficient degree of semantic modeling information enables semantic inference, thereby keeping the performance loss minimal. In summary, this series of experiments not only highlights the source of the gains in semantic communication systems but also provides a foundation for the cross-data and cross-protocol capabilities of future semantic communication.
Notwithstanding the findings of the present study, further extensive research is warranted to unravel the intricacies of semantic communication systems. Firstly, BLEU may not be a comprehensive metric for gauging semantic transmission capacity: its scope is limited, and it cannot be employed as the sole semantic distortion function in such systems. Future research on semantic distortion functions may need to combine the loss functions of ANNs with communication principles to yield further improvements. Secondly, experiments with different data formats and lossless source coding schemes are needed to investigate the correlation between data distortion and semantic distortion, since different relationships between the two may arise for different data formats and coding algorithms. Such experiments could also facilitate the evaluation of semantic communication system quality and even aid in resolving the first issue of devising a semantic distortion function. Lastly, the knowledge base serves as a crucial component connecting the semantic encoder and decoder. Currently, the knowledge base is taken to be the parameters of the ANN together with the parameters of the natural language model; it is obtained through the ANN training process, which is intricately related to both the channel and the source. To further investigate the essence of semantic communication, it is necessary to theoretically elucidate the physical significance of the ANN parameters. This may be related to information bottleneck theory, although there is currently no definitive consensus.

6. Conclusions

This article critically dissects the factors responsible for the improved efficacy of semantic communication systems. It presents an innovative system that separates data coding from semantic coding to substantiate the theoretical analysis. Experimental results validate the premise that a substantial portion of the gains in semantic communication systems emanates from the modeling information of natural language and its inferences. Furthermore, when the source coding preserves most of the semantic modeling information, source coding and semantic coding can be separated and implemented independently. These analytical insights could prove instrumental in the practical implementation of semantic communication systems. Additionally, the propounded SDSC system offers a viable paradigm for the pragmatic application of semantic communication systems within the ambit of the classical communication framework.

Author Contributions

Conceptualization, Y.F.; methodology, Y.F. and J.X.; software, Y.F.; validation, Y.F. and J.X.; formal analysis, Y.F. and C.L.; investigation, Y.F.; resources, Y.F.; data curation, Y.F.; writing—original draft preparation, Y.F.; writing—review and editing, Y.F., J.X. and C.L.; visualization, Y.F.; supervision, J.X., L.H. and G.Y.; project administration, G.Y. and T.Y.; funding acquisition, L.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China (Grant Number: 2020YFB1807202).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/brightmart/nlp_chinese_corpus (accessed on 1 March 2023).

Acknowledgments

I would like to thank my supervisor Jin Xu for his guidance through each stage of the process. I would also like to acknowledge Guanghui Yu for inspiring my interest in semantic communication. I would particularly like to acknowledge my team leader Liujun Hu for his patient support.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The data used in the experiments come from a Chinese language corpus, which is known to be more difficult to model than English. The dataset used was the IFLYTEK long text classification dataset, consisting of over 17,000 annotated long texts describing various apps related to daily life and spanning 119 categories, including “taxi”, “map navigation”, “car rental”, “free WIFI”, etc. To train the communication system, the data were preprocessed by removing the classification labels, converting formats, splitting sentences, and so on. For ease of training, sentences with fewer than 4 Chinese characters were discarded, and sentences longer than 50 characters were truncated to 50. The resulting dataset contains 37,505 Chinese sentences with a vocabulary of 3780 characters. Additionally, 90% of the data (33,754 sentences) were used for training, while 10% (3751 sentences) were used for testing.
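The length filtering and truncation described above can be sketched as follows; the function name and the handling of surrounding whitespace are illustrative assumptions, since the paper does not publish its preprocessing code:

```python
def preprocess(sentences, min_len=4, max_len=50):
    """Keep sentences with at least min_len characters and
    truncate those longer than max_len, as in Appendix A."""
    kept = []
    for s in sentences:
        s = s.strip()
        if len(s) < min_len:
            continue  # discard sentences that are too short
        kept.append(s[:max_len])  # truncate overly long sentences
    return kept
```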
The Adam optimizer was used in the training phase, the learning rate was set to 5 × 10⁻⁴, the number of epochs to 100, and the batch size to 64; training was carried out on a Tesla V100 GPU. In the test phase, the system was run separately under different channel SNRs, and the greedy search method was employed for decoding. The Bilingual Evaluation Understudy (BLEU) score, which measures translation quality in natural language processing (NLP) and is often used to measure text transmission quality in semantic communication systems, was utilized as the evaluation metric. BLEU is calculated as follows:
\[ \mathrm{BLEU} = BP \cdot \exp\!\left( \sum_{n=1}^{N} \omega_n \log P_n \right) \]

\[ BP = \begin{cases} 1, & \text{if } c > r \\ e^{\,1 - r/c}, & \text{if } c \le r \end{cases} \]

where $BP$ is the length penalty factor, $r$ is the length of the target sentence, $c$ is the length of the sentence to be transmitted, $\omega_n$ is the weight of the $n$-grams, and $P_n$ is the probability that the $n$-grams in the transmitted message appear in the received message. It can be calculated as:

\[ P_n = \frac{\sum_{k} \min\big( C_k(\hat{M}),\, C_k(M) \big)}{\sum_{k} C_k(\hat{M})} \]

where $C_k$ represents the occurrence count of the $k$-th $n$-gram.
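The BLEU computation above can be sketched in a few lines of Python. This is a minimal illustration of the equations (uniform weights ω_n = 1/N, clipped n-gram counts, and the brevity penalty BP), not the exact evaluation script used in the experiments:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights w_n = 1/N and the
    brevity penalty BP defined above; assumes a non-empty candidate."""
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped matches: min(count in candidate, count in reference)
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matched == 0:
            return 0.0  # any all-zero n-gram precision drives BLEU to 0
        log_p_sum += (1.0 / max_n) * math.log(matched / sum(cand_counts.values()))
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_p_sum)
```

For a perfect transmission the score is 1; a candidate too short to contain any 4-gram match collapses to 0, reflecting BLEU's known harshness on short outputs.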

Appendix B

To ascertain that data distortion per se does not significantly impair the performance of semantic communication systems, whereas the semantic distortion resulting from the loss of semantic modeling information does, truncated Huffman code words were employed to obtain distorted data.
To determine the length to which the Huffman code should be truncated, we conducted a statistical analysis of all Huffman code words obtained from encoding the corpus, as shown in Figure A1. The lengths of the Huffman code words are concentrated mainly between 9 bits and 12 bits; therefore, the code can reasonably be truncated to either 9 bits or 12 bits. To further illustrate the point made in the paper, experiments were also conducted with 6-bit codes, which were compared mainly with the shuffled codes. Since labels cannot be truncated during training, the 6-bit codes still corresponded to the correct labels, so a part of the natural language modeling information was also captured; accordingly, their results were better than those of training with shuffled codes.
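The codeword-length statistic in Figure A1 can be reproduced schematically with a standard heap-based Huffman construction. The sketch below operates on the characters of an arbitrary string and is an assumption about the procedure, not the authors' code:

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build a character-level Huffman code over `text` and return a
    Counter mapping codeword length -> number of codewords of that
    length (the distribution plotted in Figure A1)."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate alphabet: a single 1-bit codeword
        return Counter({1: 1})
    # heap items: (frequency, unique tiebreak, {symbol: partial codeword})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # prepend a bit to every codeword in each merged subtree
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    _, _, code = heap[0]
    return Counter(len(w) for w in code.values())
```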
Figure A1. The distribution of code words lengths resulting from Huffman coding applied to the given corpus. The horizontal axis represents the codeword lengths, while the vertical axis shows the corresponding number of codewords of that length.
When truncated to 6 bits or 9 bits, most code words lost their decodability and were decoded as <UNK>, producing a large number of <UNK> symbols in each sentence and disrupting the natural language structure. When truncated to 12 bits, only a small number of infrequent code words were decoded as <UNK>. Although truncation increased compression significantly compared to lossless Huffman coding, most of the natural language structure was preserved and most of the semantic modeling information remained intact in the 12-bit codes. It is worth noting that when truncated to 6, 9, and 12 bits, the total lengths of the retained codes were 7.63%, 24.43%, and 79.7% of the original, respectively.
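The truncation effect can be illustrated schematically: a character whose Huffman codeword exceeds the truncation length is no longer uniquely decodable and is rendered as <UNK>. The codebook below is a toy example, not the one derived from the corpus:

```python
def truncate_decode(codebook, message, k):
    """Decode `message` after truncating its code words to k bits.
    A code word longer than k bits loses unique decodability and is
    replaced by the special symbol <UNK> (cf. Appendix B)."""
    return [ch if len(codebook[ch]) <= k else "<UNK>" for ch in message]
```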

Figure 1. The classical communication architecture.
Figure 2. The semantic communication architecture.
Figure 3. The SDSC architecture.
Figure 4. The occurrence frequency of unigrams, bigrams, and trigrams within a given corpus. The abscissa denotes the corresponding ordinal number, while the ordinate of the graph depicts the frequency of occurrence of each item. Additionally, the logarithmic scale of the abscissa and ordinate allows for a better visualization of the distribution, emphasizing the differences between the most frequent and less frequent items. (a) displays the statistical results based on natural language sequential coding, while (b) exhibits the statistical results based on shuffled coding.
Figure 5. Effect comparison of semantic communication systems with different data training methods. The abscissa depicts the signal-to-noise ratio (SNR) of the additive white Gaussian noise (AWGN) channel, and the ordinate represents the Bilingual Evaluation Understudy (BLEU) metric, which measures the similarity between the decoded message and the original message. Additionally, (ad) show performance curves with BLEU1-4 as evaluation metrics. The blue line represents the communication performance obtained by directly transmitting messages using the semantic encoder–decoder system. The red line represents the communication performance obtained by combining lossless Huffman code words with the SDSC system. The yellow line represents the communication performance obtained by training the SDSC system with the shuffled code words.
Figure 6. Effect comparison of semantic communication systems with varying degrees of distortion data training. The horizontal axis depicts the SNR of the AWGN channel, while the vertical axis represents the BLEU metric. Additionally, (a–d) show performance curves with BLEU1-4 as evaluation metrics. The blue line denotes the transmission performance of the SDSC system trained with lossless Huffman coding, while the red line represents the transmission performance of the SDSC system trained with 12-bit distorted Huffman coding. The yellow line illustrates the transmission performance of the SDSC system trained with 9-bit distorted Huffman coding. The purple line demonstrates the transmission performance of the SDSC system trained with 6-bit distorted Huffman coding. The results indicate that the system trained with lossless Huffman coding achieves the highest performance, followed by the system trained with 12-bit code words. The systems trained with 9-bit and 6-bit code words exhibit the worst performance, as expected from the analysis conducted in Section 2.2.
Table 1. The parameters used to construct the SDSC system.

Model                          Layer                       Units (Others)
Source Coding                  Huffman Coding              3780
Source Coding Transformation   Dense + ReLU                3780
                               Dense                       128
                               Value Normalization         1/128
Semantic Encoder               Position Encoding           512
                               Dropout                     p = 0.1
                               Transformer Encoder ×3      128 (8 heads)
                               Dense + ReLU                256
                               Dense                       16
                               Power Normalization         x / ‖x‖₂
Channel                        AWGN                        SNR: −8~16 dB
Semantic Decoder               Dense + ReLU                128
                               Dense + ReLU                512
                               Dense                       128
                               Transformer Decoder ×3      128 (8 heads)
                               Dense                       3780
                               Softmax                     Greedy search

Feng, Y.; Xu, J.; Liang, C.; Yu, G.; Hu, L.; Yuan, T. Decoupling Source and Semantic Encoding: An Implementation Study. Electronics 2023, 12, 2755. https://doi.org/10.3390/electronics12132755

