Article

Refinements and Extensions of Ziv’s Model of Perfect Secrecy for Individual Sequences

The Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2024, 26(6), 503; https://doi.org/10.3390/e26060503
Submission received: 9 May 2024 / Revised: 5 June 2024 / Accepted: 8 June 2024 / Published: 9 June 2024
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

We refine and extend Ziv’s model and results regarding perfectly secure encryption of individual sequences. According to this model, the encrypter and the legitimate decrypter share a common secret key that is not shared with the unauthorized eavesdropper. The eavesdropper is aware of the encryption scheme and has some prior knowledge concerning the individual plaintext source sequence. This prior knowledge, combined with the cryptogram, is harnessed by the eavesdropper, who implements a finite-state machine as a mechanism for accepting or rejecting attempted guesses of the plaintext source. The encryption is considered perfectly secure if the cryptogram does not provide any new information to the eavesdropper that may enhance their knowledge concerning the plaintext beyond their prior knowledge. Ziv has shown that the key rate needed for perfect secrecy is essentially lower bounded by the finite-state compressibility of the plaintext sequence, a bound that is clearly asymptotically attained through Lempel–Ziv compression followed by one-time pad encryption. In this work, we consider some more general classes of finite-state eavesdroppers and derive the respective lower bounds on the key rates needed for perfect secrecy. These bounds are tighter and more refined than Ziv’s bound, and they are attained using encryption schemes that are based on different universal lossless compression schemes. We also extend our findings to the case where side information is available to the eavesdropper and the legitimate decrypter but may or may not be available to the encrypter.

1. Introduction

Theoretical frameworks focusing on individual sequences and finite-state encoders and decoders have undergone extensive exploration, diverging from the conventional probabilistic paradigm used in modeling sources and channels. This divergence has been particularly evident across various information-theoretic domains, such as data compression [1,2,3,4,5,6,7], source/channel simulation [8,9], classification [10], prediction [11,12,13,14], denoising [15], and even channel coding [16,17,18]. These references merely scratch the surface of a vast body of literature. Conversely, the realm of information-theoretic security, from Shannon’s seminal contribution [19] to more contemporary research [20,21,22,23,24], remains almost totally entrenched in the probabilistic framework. While these works represent a mere fraction of the extensive literature, they exemplify the nearly exclusive reliance on probabilistic models within this field.
To the best of the author’s knowledge, there are only two exceptions to this prevailing paradigm, documented in an unpublished memorandum by Ziv [25] and a subsequent work [26]. Ziv’s memorandum presents a unique approach wherein the plaintext source, to be encrypted using a secret key, is treated as an individual sequence. The encrypter is modeled as a general block encoder, while the eavesdropper employs a finite-state machine (FSM) as a message discriminator. The memorandum postulates that the eavesdropper possesses certain prior knowledge about the plaintext, expressed as a set of “acceptable messages”, hereafter referred to as the acceptance set. In other words, before observing the cryptogram, the eavesdropper’s uncertainty about the plaintext sequence is that it could be any member of this set of acceptable messages.
This assumption about the prior knowledge available to the eavesdropper is fairly realistic in real life. Consider, for example, the case where the plaintext alphabet is the Latin alphabet, but the eavesdropper knows that the plaintext must be a piece of text in the Italian language. In this case, their prior knowledge allows them to reject every candidate string of symbols that includes the letters ‘j’, ‘k’, ‘w’, ‘x’, and ‘y’, which are not used in Italian. Another example, which is common to English and some other languages, is that the letter ‘q’ must be followed by ‘u’. In the same spirit, some additional rules of grammar can be invoked, like limitations on the number of successive consonants (or vowels) in a word, limitations on the length of a word, and so on.
Now, according to Ziv’s approach, perfectly secure encryption amounts to a situation where the presence of the cryptogram does not reduce the uncertainty associated with the acceptance set. In other words, having intercepted the cryptogram, the eavesdropper learns nothing about the plaintext that they did not know before. The size of the acceptance set can be thought of as a quantifier of the level of uncertainty: a larger set implies greater uncertainty. The aforementioned FSM is used to discriminate between acceptable and unacceptable strings of plaintext symbols that can be obtained by examining various key bit sequences. Accordingly, perfect security amounts to maintaining the size of the acceptance set, and consequently the uncertainty level, unchanged in the presence of the cryptogram. The principal finding in Ziv’s work is that the asymptotic key rate required for perfectly secure encryption, according to this definition, cannot be lower (up to asymptotically vanishing terms) than the Lempel–Ziv (LZ) complexity of the plaintext source [7]. Clearly, this lower bound is asymptotically achieved by employing one-time pad encryption (that is, bit-by-bit XOR with key bits) of the bit-stream obtained from LZ data compression of the plaintext source, mirroring Shannon’s classical probabilistic result, which asserts that the minimum required key rate equals the entropy rate of the source.
In the subsequent work [26], the concept of perfect secrecy for individual sequences was approached differently. Instead of a finite-state eavesdropper with predefined knowledge, it was assumed that the encrypter could be realized by an FSM, which is sequentially fed by the plaintext source and random key bits. A notion of “finite-state encryptability” was introduced (in the spirit of the analogous finite-state compressibility in [7]), which designates the minimum key rate that must be consumed by any finite-state encrypter, such that the probability law of the cryptogram would be independent of the plaintext input, and hence be perfectly secure. Among the main results in [26], it was asserted and proven that the finite-state encryptability of an individual sequence is essentially bounded from below by its finite-state compressibility, a bound that is once again attained asymptotically through LZ compression followed by one-time pad encryption.
In this work, we revisit Ziv’s approach to perfect secrecy for individual sequences [25]. After presenting his paradigm in detail, we proceed to refine and generalize his findings in certain aspects. First, we consider several more general classes of finite-state discriminators that can be employed by the eavesdropper. These will lead to tight lower bounds on the minimum key rate to be consumed by the encrypter, which will be matched by encryption schemes that are based on some other universal data compression schemes. The resulting gaps between the lower bounds and the corresponding upper bounds (i.e., the redundancy rates) would converge faster. Among these more general classes of finite-state machines, we will consider finite-state machines that are equipped with counters, as well as periodically time-varying finite-state machines with counters. Another direction of generalizing Ziv’s findings is the incorporation of side information (SI) available to both the eavesdropper and the legitimate decrypter, but which may or may not be available to the encrypter.
The outline of this article is as follows. In Section 2, we formulate the model setting, establish the notations, and provide a more detailed background on Ziv’s model and results in [25]. In Section 3, which is the main section of this article, we present the refinements and extensions to other types of FSMs, including FSMs with counters (Section 3.1), shift-register FSMs with counters (Section 3.2), and periodically time-varying FSMs with counters (Section 3.3). Finally, in Section 4, we further extend some of our findings to the case where SI is available to both the legitimate decrypter and the eavesdropper but not necessarily to the encrypter.

2. Formulation, Notations, and Background

Consider the following version of Shannon’s cipher system model, adapted to the encryption of individual sequences, as proposed by Ziv [25]. An individual (deterministic) plaintext sequence, $x = (x_0, \ldots, x_{n-1})$ ($n$ a positive integer), is encrypted using a random key $K$, whose entropy is $H(K)$, to generate a cryptogram, $W = T(x, K)$, where the mapping $T(\cdot, K)$ is invertible given $K$; namely, $x$ can be reconstructed by the legitimate decoder, who has access to $K$, by applying the inverse function, $x = T^{-1}(W, K)$. The plaintext symbols, $x_i$, $i = 0, 1, 2, \ldots, n-1$, take on values in a finite alphabet, $\mathcal{X}$, of size $\alpha$. Thus, $x$ is a member of $\mathcal{X}^n$, the $n$-th Cartesian power of $\mathcal{X}$, whose cardinality is $\alpha^n$. Without essential loss of generality, we assume that $K$ is a uniformly distributed random variable taking on values in a set $\mathcal{K}$, whose cardinality is $2^{H(K)}$. Sometimes, it may be convenient to consider $\mathcal{K}$ as the set $\{0, 1, \ldots, 2^{H(K)} - 1\}$. A specific realization of the key, $K$, is denoted by $k$.
An eavesdropper who knows the mapping, $T$, but not the realization of the key, $K$, is in the quest of learning as much as possible about $x$ upon observing $W$. It is assumed that the eavesdropper also has some prior knowledge about $x$, even before observing $W$. In particular, the eavesdropper knows that the plaintext source string $x$ must be a member of a certain subset of $\mathcal{X}^n$, denoted as $\mathcal{A}_n$, which is referred to as the acceptance set.
Ziv models the eavesdropper by a cascade of a guessing decrypter and a finite-state message discriminator, which work together as follows. At each step, the eavesdropper examines a certain key, $k \in \mathcal{K}$, by generating an estimated plaintext, $\hat{x} = T^{-1}(W, k)$, and then feeding $\hat{x}$ into the message discriminator to examine whether $\hat{x} \in \mathcal{A}_n$. If the answer is affirmative, $\hat{x}$ is accepted as a candidate; otherwise, it is rejected. Upon completing this step, the eavesdropper moves on to the next key, $k+1$, and repeats the same process. The message discriminator is modeled as a finite-state machine, which implements the following recursive equations for $i = 0, 1, 2, \ldots, n-1$:
$$u_i = f(z_i, \hat{x}_i)$$
$$z_{i+1} = g(z_i, \hat{x}_i),$$
where $z_0, z_1, z_2, \ldots, z_{n-1}$ is a sequence of states; $z_i \in \mathcal{S}$, $i = 0, 1, 2, \ldots, n-1$; $\mathcal{S}$ is a set of $s$ states (with the initial state, $z_0$, a fixed member of $\mathcal{S}$); $u_0, u_1, \ldots, u_{n-1}$ is a binary output sequence that indicates acceptance or rejection; $f: \mathcal{S} \times \mathcal{X} \to \{0, 1\}$ is the output function; and $g: \mathcal{S} \times \mathcal{X} \to \mathcal{S}$ is the next-state function. If $u_0, u_1, u_2, \ldots, u_{n-1}$ is the all-zero sequence, $\hat{x}$ is accepted, namely $\hat{x} \in \mathcal{A}_n$; otherwise, as soon as $u_i = 1$ for some $0 \le i \le n-1$, $\hat{x}$ is rejected. In other words, $\mathcal{A}_n$ is defined as the set of all $\{\hat{x}\}$ for which the response of the finite-state discriminator is the all-zero sequence, $u = (0, 0, \ldots, 0)$.
Example 1.
Let $\mathcal{X} = \{0, 1\}$, and suppose that membership of $x$ in $\mathcal{A}_n$ forbids the appearance of more than two successive zeroes. Then, a simple discriminator can detect a violation of this rule using the finite-state machine defined by $\mathcal{S} = \{0, 1, 2\}$ and
$$g(z, \hat{x}) = \begin{cases} (z+1) \bmod 3 & \hat{x} = 0 \\ 0 & \hat{x} = 1 \end{cases}$$
$$f(z, \hat{x}) = \begin{cases} 1 & z = 2 \ \mathrm{and}\ \hat{x} = 0 \\ 0 & \mathrm{otherwise} \end{cases}$$
with the initial state being $z_0 = 0$. More generally, consider the set of binary sequences that comply with the so-called $(d, k)$ constraints, well known in the literature on magnetic recording (see, e.g., [27] and the references therein). These are binary sequences where the runs of successive zeroes must be of length at least $d$ and at most $k$, where $d$ and $k$ (not to be confused with the notation of the encryption key) are positive integers with $d \le k$. We return to this example later.
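The discriminator of Example 1 is simple enough to sketch in a few lines of Python (the function names below are ours, chosen for illustration):

```python
def make_no_triple_zero_fsm():
    """Discriminator of Example 1: rejects binary strings containing
    a run of more than two successive zeroes (S = {0, 1, 2}, z0 = 0)."""
    def g(z, x):   # next-state function: count the current run of zeroes
        return (z + 1) % 3 if x == 0 else 0
    def f(z, x):   # output function: u = 1 signals rejection
        return 1 if (z == 2 and x == 0) else 0
    return f, g

def accepts(x_hat, f, g, z0=0):
    """Feed x_hat through the FSM; accept iff the output is all-zero."""
    z = z0
    for x in x_hat:
        if f(z, x) == 1:
            return False
        z = g(z, x)
    return True

f, g = make_no_triple_zero_fsm()
print(accepts([1, 0, 0, 1], f, g))  # True: no run of three zeroes
print(accepts([1, 0, 0, 0], f, g))  # False: a run of three zeroes
```

The same skeleton, with a larger state set, implements a discriminator for any $(d, k)$ constraint.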
Ziv defines perfect secrecy for individual sequences as a situation where, even upon observing $W$, the eavesdropper’s uncertainty about $x$ is not reduced. In mathematical terms, let us denote
$$T^{-1}(W) = \{T^{-1}(W, k),\ k \in \mathcal{K}\},$$
and
$$\mathcal{A}_n(W) = \mathcal{A}_n \cap T^{-1}(W),$$
and then perfect secrecy is defined as a situation where
$$\mathcal{A}_n(W) = \mathcal{A}_n,$$
or equivalently,
$$\mathcal{A}_n \subseteq T^{-1}(W).$$
To demonstrate these concepts, consider the following example.
Example 2.
Let $n = 4$, $\mathcal{X} = \{0, 1\}$, $x = (1111)$, $2^{H(K)} = 8$, and $k = (1111)$. Then, for a one-time pad encrypter,
$$W = T(x, k) = x \oplus k = (1111) \oplus (1111) = (0000),$$
where $\oplus$ denotes bit-wise XOR (modulo 2 addition). Let the set $\mathcal{K}$ of all eight possible key strings be given by
$$\mathcal{K} = \{1111, 1000, 1100, 1001, 0000, 0111, 0011, 0110\}.$$
Obviously, the decryption is given by $T^{-1}(W, k) = W \oplus k$. Since $W = (0000)$, then $T^{-1}(W) = \mathcal{K}$. Following Example 1, suppose that $\mathcal{A}_4$ is the set of all binary vectors of length $n = 4$ that do not contain runs of more than two zeroes. There are only three binary vectors of length 4 that contain a succession of more than two (i.e., three or four) zeroes, namely $(0000)$, $(1000)$, and $(0001)$. Thus, $|\mathcal{A}_4| = 2^4 - 3 = 13$. On the other hand,
$$|\mathcal{A}_4(W)| = |\mathcal{A}_4 \cap T^{-1}(W)| \le |T^{-1}(W)| = |\mathcal{K}| = 8 < 13,$$
where the notation $|\mathcal{E}|$, for a generic finite set $\mathcal{E}$, denotes its cardinality. This means that this encryption system is not perfectly secure. The reason for this is that the key space, $\mathcal{K}$, is not large enough.
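The arithmetic of Example 2 is easy to verify numerically. The following sketch (variable names are ours) reproduces $|\mathcal{A}_4| = 13$; the exact intersection $|\mathcal{A}_4(W)|$ turns out to be even smaller than the bound of 8:

```python
from itertools import product

def xor(a, b):
    return tuple(ai ^ bi for ai, bi in zip(a, b))

# Example 2: n = 4, x = 1111, one-time pad W = x XOR k.
x = (1, 1, 1, 1)
keys = {(1, 1, 1, 1), (1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 0, 1),
        (0, 0, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 1, 1, 0)}
W = xor(x, (1, 1, 1, 1))          # the cryptogram (0, 0, 0, 0)

# T^{-1}(W) = {W XOR k : k in K}; since W = 0000, this equals K itself.
inv = {xor(W, k) for k in keys}

def acceptable(v):
    """Membership in A_4: no run of more than two successive zeroes."""
    run = 0
    for b in v:
        run = run + 1 if b == 0 else 0
        if run > 2:
            return False
    return True

A4 = {v for v in product((0, 1), repeat=4) if acceptable(v)}
print(len(A4), len(A4 & inv))     # 13 6: |A_4(W)| < |A_4|, not perfectly secure
```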
Clearly, in the quest to minimize $H(K)$ without compromising perfect secrecy, the optimal approach is to design the encrypter in such a way that
$$T^{-1}(W) = \mathcal{A}_n,$$
for every cryptogram $W$ that can possibly be obtained from some combination of $x$ and $k$. Conceptually, this can be obtained by mapping $\mathcal{A}_n$ to the set of all binary sequences of length $H(K)$ utilizing a fixed-rate data compression scheme and applying one-time pad encryption to the compressed sequence. Here, and throughout the following discussion, we neglect integer length constraints associated with large numbers, so $H(K)$ is assumed to be an integer without essential loss of generality and optimality.
Remark 1.
For readers familiar with the concepts and the terminology of coding for systems with $(d, k)$ constraints (other readers may skip this remark without loss of continuity), it is insightful to revisit the second part of Example 1: If $\mathcal{A}_n$ is the set of binary $n$-sequences that satisfy a certain $(d, k)$ constraint, then optimal encryption for $\mathcal{A}_n$ pertains to compression using the inverse mapping of a channel encoder for the same $(d, k)$ constraint, namely the corresponding channel decoder, followed by one-time pad encryption. The minimum key rate needed is then equal to the capacity of the constrained system, which can be calculated either algebraically, as the logarithm of the Perron–Frobenius eigenvalue of the adjacency matrix of the state transition diagram, or probabilistically, as the maximum entropy among all stationary Markov chains that are supported by the corresponding state transition graph [27].
Ziv’s main result in [25] was that for a finite-state discriminator, if $x \in \mathcal{A}_n$, the cardinality of $\mathcal{A}_n$ cannot be exponentially smaller than $2^{LZ(x)}$ (see Appendix A for the proof), where $LZ(x)$ is the length (in bits) of the compressed version of $x$, using the 1978 version of the Lempel–Ziv algorithm (the LZ78 algorithm) [7]. Therefore, the key rate needed to completely encrypt all members of $\mathcal{A}_n$ is lower bounded by
$$R = \frac{H(K)}{n} = \frac{\log |T^{-1}(W)|}{n} \ge \frac{\log |\mathcal{A}_n|}{n} \ge \frac{LZ(x)}{n} - \epsilon_n,$$
where $\epsilon_n$ is a positive sequence tending to zero as $n \to \infty$ at the rate of $\frac{\log(\log n)}{\log n}$. The first inequality of (13) follows from (8). Obviously, this bound is essentially attained through LZ78 compression of $x$, followed by one-time pad encryption using $LZ(x)$ key bits. As can be seen, the gap, $\epsilon_n$, between the upper bound and the lower bound on $R$ is $O\left(\frac{\log(\log n)}{\log n}\right)$, which tends to zero rather slowly.
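For concreteness, the phrase count of the LZ78 incremental parsing, which governs $LZ(x)$, can be computed with a short sketch. This is our own illustration; the codelength estimate is the standard pointer-plus-fresh-symbol accounting, up to lower-order terms:

```python
import math

def lz78_phrases(x):
    """Incremental (LZ78) parsing: split x into distinct phrases, each
    equal to a previously seen phrase extended by one extra symbol."""
    dictionary = {(): 0}
    phrases = []
    current = ()
    for symbol in x:
        current = current + (symbol,)
        if current not in dictionary:
            dictionary[current] = len(dictionary)
            phrases.append(current)
            current = ()
    if current:                # possibly incomplete last phrase
        phrases.append(current)
    return phrases

x = (0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1)
phrases = lz78_phrases(x)
c = len(phrases)
# Each phrase is encoded by a pointer to an earlier phrase plus one
# fresh symbol, so LZ(x) is roughly c * (log2 c + log2 alpha) bits.
alpha = 2
lz_bits = c * (math.log2(max(c, 2)) + math.log2(alpha))
print(c, lz_bits)              # 8 phrases for this toy sequence
```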
Remark 2.
For an infinite sequence $x = x_0, x_1, x_2, \ldots$, asymptotic results were obtained in [25] through a two-stage limit process: First, consider a finite sequence of total length $m \cdot n$, which is divided into $m$ non-overlapping $n$-blocks. The mechanism described above is applied to each $n$-block separately. The asymptotic minimum key rate is obtained by taking a double limit superior, first as $m \to \infty$ for a given $n$, and then as $n \to \infty$. In this work, we consider a similar double limit, although we do not explicitly mention it on every relevant occasion. Instead, we focus on the behavior of a single $n$-block.

3. More General Finite-State Discriminators

This section, which is the main section of this article, is devoted to describing several more general classes of finite-state discriminators, along with derivations of their respective more refined bounds.

3.1. FSMs with Counters

While Ziv’s model for a finite-state discriminator is adequate for rejecting sequences with certain forbidden patterns (like a succession of more than two zeroes in the above examples), it is not sufficient to handle situations like the following. Suppose that the encrypter applies a universal lossless compression algorithm for memoryless sources, followed by one-time pad encryption. Suppose also that the universal compression scheme is a two-part code, where the first part encodes the index of the type class of x, using a number of bits that is proportional to log   n , and the second part represents the index of x within the type class, assuming that the encoder and the decoder have agreed on some ordering ahead of time. In this case, the length of the cryptogram (in bits), which is equal to the length of the compressed data, is about
$$L = H(K) \approx n \hat{H}(x) + \frac{\alpha - 1}{2} \cdot \log n,$$
where H ^ ( x ) is the empirical entropy of x (see, e.g., [3] and the references therein). The eavesdropper, being aware of the encryption scheme, observes the length L of the cryptogram and immediately concludes that x must be a sequence whose empirical entropy is
$$H_0 = \frac{1}{n} \cdot \left( L - \frac{\alpha - 1}{2} \cdot \log n \right).$$
In other words, in this case,
$$\mathcal{A}_n(W) = \mathcal{A}_n = \left\{ \hat{x} : \hat{H}(\hat{x}) = H_0 \right\}.$$
Therefore, every sequence whose empirical distribution pertains to empirical entropy different from H 0 should be rejected. To this end, our discriminator should be able to gather empirical statistics, namely to count occurrences of symbols (or, more generally, count combinations of symbols and states) and not just detect a forbidden pattern that might have occurred just once in x.
This motivates us to broaden the class of finite-state discriminators under consideration in the following way. We consider discriminators that consist of a next-state function,
$$z_{i+1} = g(z_i, \hat{x}_i),$$
as before, but instead of the binary output function, $f$, as in [25], these discriminators are equipped with a set of $\alpha \cdot s$ counters that count the number of joint occurrences of all $(x, z) \in \mathcal{X} \times \mathcal{S}$ for $i = 0, 1, 2, \ldots, n-1$, i.e.,
$$n(x, z) = \sum_{i=0}^{n-1} \mathcal{I}\{\hat{x}_i = x,\ z_i = z\}, \quad x \in \mathcal{X},\ z \in \mathcal{S},$$
where $\mathcal{I}\{A\}$, for a generic event $A$, denotes its indicator function, namely $\mathcal{I}\{A\} = 1$ if $A$ is true; otherwise, $\mathcal{I}\{A\} = 0$. A sequence $\hat{x}$ is accepted (resp. rejected) if the matrix of counts, $\{n(x, z),\ x \in \mathcal{X},\ z \in \mathcal{S}\}$, satisfies (resp. violates) a certain condition. In the example in the previous paragraph, a sequence is accepted if
$$\hat{H}(\hat{x}) = \sum_{x \in \mathcal{X}} \frac{n(x)}{n} \log \frac{n}{n(x)} = H_0,$$
where $n(x) = \sum_{z \in \mathcal{S}} n(x, z)$. Ziv’s discriminator model is clearly a special case of this model: let $(x, z)$ be any combination of input and state such that $f(z, x) = 1$ in Ziv’s model (that is, a “forbidden” combination). Then, in terms of the proposed extended model, a sequence is rejected whenever $n(x, z) \ge 1$.
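A discriminator of this extended class can be sketched as follows: it gathers the matrix of counts $\{n(x,z)\}$ and then evaluates an entropy condition on it. This is a minimal illustration of ours, reusing the 3-state machine of Example 1; the function names are not from the paper:

```python
from collections import Counter
import math

def fsm_counts(x_hat, g, z0=0):
    """Joint occurrence counts n(x, z) for an FSM with next-state g."""
    counts = Counter()
    z = z0
    for x in x_hat:
        counts[(x, z)] += 1
        z = g(z, x)
    return counts

def cond_entropy(counts):
    """Empirical conditional entropy H^(X|Z), in bits per symbol."""
    n = sum(counts.values())
    n_z = Counter()
    for (x, z), c in counts.items():
        n_z[z] += c
    return sum(c / n * math.log2(n_z[z] / c)
               for (x, z), c in counts.items())

# Example: the 3-state machine of Example 1, tracking runs of zeroes.
g = lambda z, x: (z + 1) % 3 if x == 0 else 0
counts = fsm_counts((1, 0, 1, 1, 0, 0, 1, 0, 1, 1), g)
print(round(cond_entropy(counts), 3))
```

An acceptance test such as $\hat{H}(\hat{x}) = H_0$ then reduces to comparing this computed quantity against the threshold deduced from the cryptogram length.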
Clearly, since $x \in \mathcal{A}_n$ and since membership in $\mathcal{A}_n$ depends solely on the matrix of counts, $\{n(x, z),\ x \in \mathcal{X},\ z \in \mathcal{S}\}$, it follows that all $\{\hat{x}\}$ that share the same counts as those of $x$ must also be members of $\mathcal{A}_n$. The set of all $\{\hat{x}\}$ of length $n$ with the same counts, $\{n(x, z),\ x \in \mathcal{X},\ z \in \mathcal{S}\}$, as those of $x$ is called the finite-state type class with respect to the FSM $g$ (see also [3]), and it is denoted by $T_g(x)$. Since $\mathcal{A}_n \supseteq T_g(x)$,
$$|\mathcal{A}_n| \ge |T_g(x)|.$$
It was proven in [3] (Lemma 3) that if $n(x, z) \ge n(z)\,\delta(n)$ for every $(x, z) \in \mathcal{X} \times \mathcal{S}$, where $\delta(n) > 0$ may even be a vanishing sequence, then
$$|T_g(x)| \ge \exp_2\left\{ n \hat{H}(X|Z) - \frac{s(\alpha - 1)}{2} \cdot \log(2\pi n) \right\},$$
where
$$\hat{H}(X|Z) = \sum_{x \in \mathcal{X}} \sum_{z \in \mathcal{S}} \frac{n(x, z)}{n} \log \frac{n(z)}{n(x, z)},$$
with $n(z) = \sum_{x \in \mathcal{X}} n(x, z)$ and with the conventions that $0 \log 0 = 0$ and $0/0 = 0$. Therefore, it follows that the key rate needed for perfect secrecy is lower bounded by
$$R = \frac{H(K)}{n} \ge \frac{\log |\mathcal{A}_n|}{n} \ge \frac{\log |T_g(x)|}{n} \ge \hat{H}(X|Z) - \frac{s(\alpha - 1)}{2} \cdot \frac{\log(2\pi n)}{n}.$$
This lower bound can be asymptotically attained through an encrypter that applies universal lossless compression for finite-state sources with the next-state function g, followed by one-time pad encryption. This universal lossless compression scheme is based on a conceptually simple extension of the above-mentioned universal scheme for the class of memoryless sources: one applies a two-part code, where the first part includes a code for the index of the type class with respect to g (using a number of bits that is proportional to log   n ), and the second part encodes the index of the location of x ^ within T g ( x ) according to a predefined order agreed upon between the encoder and the decoder (see [3] for more details). More precisely, the compression ratio that can be achieved, which is also the key rate consumed, is upper bounded by (see Section III in [3]):
$$R \le \hat{H}(X|Z) + \frac{s(\alpha - 1)}{2} \cdot \frac{\log n}{n} + O\left(\frac{1}{n}\right),$$
which tells us that the gap between the lower bound and the achievability is proportional to $\frac{\log n}{n}$, which decays much faster than the aforementioned $O\left(\frac{\log(\log n)}{\log n}\right)$ convergence rate of the gap in Ziv’s approach. Moreover, for most sequences in most type classes, $\{T_g(x)\}$, the coding rate (which is also the key rate in one-time pad encryption) of $\hat{H}(X|Z) + \frac{s(\alpha - 1)}{2} \cdot \frac{\log n}{n}$ is smaller than $LZ(x)/n$, since the former quantity is essentially also a lower bound to the compression ratio of any lossless compression scheme for most individual sequences in $T_g(x)$ for almost all such type classes (see Theorem 1 in [3]). The converse inequality between $\hat{H}(X|Z)$ and $LZ(x)/n$, which follows from Ziv’s inequality (see Lemma 13.5.5 in [28], and [29]), holds up to an $O\left(\frac{\log(\log n)}{\log n}\right)$ term again.
Remark 3.
To put Ziv’s result in perspective, one may wish to envision a class of discriminators defined by a given dictionary of $c$ distinct ‘words’, which are the $c$ phrases in Ziv’s derivation [25] (see also Appendix A). A possible definition of such a discriminator is that it accepts only $n$-sequences of plaintext that are formed by concatenating words from the given dictionary, allowing repetitions. These are no longer finite-state discriminators. In this case, Ziv’s derivation is applicable under the minor modification that the various phrases should be classified only in terms of their length, but without further partitioning according to initial and final states ($z$ and $z'$ in the derivation in Appendix A). This is equivalent to assuming that $s = 1$, and accordingly, the term $\frac{2c \log s}{n}$ in the last equation in Appendix A should be omitted. Of course, the resulting bound can still be matched through LZ78 compression, followed by one-time pad encryption.

3.2. Shift-Register Machines with Counters

Since the eavesdropper naturally does not cooperate with the encrypter, the latter might not know the particular FSM, $g$, used by the former, and therefore, it is instructive to derive key-rate bounds that are independent of $g$. To this end, consider the following. Given $x$ and a fixed positive integer $\ell$ ($\ell \ll n$), let us observe the more refined joint empirical distribution,
$$\hat{P}_{X^{\ell+1} Z^{\ell+1}}(a_0, \ldots, a_\ell, s_0, \ldots, s_\ell) = \frac{1}{n} \sum_{i=0}^{n-1} \mathcal{I}\{x_i = a_0, \ldots, x_{i \oplus \ell} = a_\ell,\ z_i = s_0, \ldots, z_{i \oplus \ell} = s_\ell\},$$
as well as all partial marginalizations derived from this distribution, where $\oplus$ denotes addition modulo $n$, so as to create a periodic extension of $x$ (hence, redefining $z_0 = g(z_{n-1}, x_{n-1})$). Accordingly, the previously defined empirical conditional entropy, $\hat{H}(X|Z)$, is now denoted as $\hat{H}(X_1|Z_1)$, which is also equal to $\hat{H}(X_2|Z_2)$, etc., due to the inherent shift-invariance property of the empirical joint distribution extracted under the periodic extension of $x$. Consider now the following chain of inequalities:
$$\hat{H}(X_\ell | X_0, X_1, \ldots, X_{\ell-1}) - \hat{H}(X_\ell | Z_\ell) \le \frac{1}{\ell+1} \sum_{j=0}^{\ell} \left[ \hat{H}(X_j | X_0, \ldots, X_{j-1}) - \hat{H}(X_j | X_0, \ldots, X_{j-1}, Z_0) \right]$$
$$= \frac{1}{\ell+1} \left[ \hat{H}(X_0, \ldots, X_\ell) - \hat{H}(X_0, \ldots, X_\ell | Z_0) \right]$$
$$= \frac{\hat{I}(Z_0; X_0, \ldots, X_\ell)}{\ell+1}$$
$$\le \frac{\hat{H}(Z_0)}{\ell+1}$$
$$\le \frac{\log s}{\ell+1},$$
where $\hat{I}(\cdot\,;\cdot)$ denotes empirical mutual information, and where in the second line, the term corresponding to $j = 0$ should be understood to be $[\hat{H}(X_0) - \hat{H}(X_0|Z_0)]$. The first inequality follows because given $(X_0, \ldots, X_{j-1}, Z_0)$, one can reconstruct $Z_1, Z_2, \ldots, Z_j$ through $j$ recursive applications of the next-state function, $g$. Therefore,
$$\hat{H}(X_j | X_0, \ldots, X_{j-1}, Z_0) = \hat{H}(X_j | X_0, \ldots, X_{j-1}, Z_0, Z_1, \ldots, Z_j)$$
$$\le \hat{H}(X_j | Z_j)$$
$$= \hat{H}(X_\ell | Z_\ell).$$
Equivalently,
$$\hat{H}(X_\ell | Z_\ell) \ge \hat{H}(X_\ell | X_0, \ldots, X_{\ell-1}) - \frac{\log s}{\ell+1},$$
and by combining this with Equations (20) and (21), we obtain
$$\frac{\log |\mathcal{A}_n|}{n} \ge \hat{H}(X_\ell | X_0, \ldots, X_{\ell-1}) - \frac{\log s}{\ell+1} - \frac{s(\alpha - 1)}{2n} \cdot \log(2\pi n).$$
The advantage of this inequality is that it is independent of the arbitrary next-state function $g$. In fact, we have actually replaced the arbitrary FSM, $g$, with a particular FSM, the shift-register FSM, whose state is $z_i = (x_{i-\ell}, x_{i-\ell+1}, \ldots, x_{i-1})$, at the cost of a gap of $\frac{\log s}{\ell+1}$, which can be kept arbitrarily small if the size of the shift register, $\ell$, is sufficiently large compared to the memory size, $\log s$, of $g$.
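The quantity $\hat{H}(X_\ell | X_0, \ldots, X_{\ell-1})$ under the periodic extension of $x$ is straightforward to compute; the following is a minimal sketch of ours (the function name is our own):

```python
from collections import Counter
import math

def markov_cond_entropy(x, ell):
    """Empirical conditional entropy H^(X_ell | X_0, ..., X_{ell-1}):
    the shift-register FSM whose state is the previous ell symbols,
    with x extended periodically (indices taken modulo n)."""
    n = len(x)
    joint, ctx = Counter(), Counter()
    for i in range(n):
        context = tuple(x[(i + j) % n] for j in range(ell))
        nxt = x[(i + ell) % n]
        joint[(context, nxt)] += 1
        ctx[context] += 1
    return sum(c / n * math.log2(ctx[ctxt] / c)
               for (ctxt, _), c in joint.items())

x = (0, 1, 0, 1, 0, 1, 0, 1)          # deterministic alternation
print(markov_cond_entropy(x, 1))      # 0.0: the previous symbol fixes the next
```

Scanning this quantity over $\ell = 1, \ldots, l$ gives the maximization appearing in the lower bound below Equation (20)-(21) style derivations.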
Remark 4.
The fact that $\hat{H}(X_\ell | X_0, \ldots, X_{\ell-1})$ cannot be much larger than $\hat{H}(X_\ell | Z_\ell)$ for large enough $\ell$ actually suggests that whatever the state of any FSM $g$ can possibly “remember” from the past of $x$ is essentially captured by the recent past, not by the remote past. While this is not surprising in the context of the probabilistic setting, especially if the underlying random process is ergodic and hence has a vanishing memory of the remote past, this finding is not quite trivial when it comes to arbitrary individual sequences.
Returning to our derivations, in view of the first three lines of (13), the key rate needed for perfect secrecy is lower bounded by
$$R \ge \frac{\log |\mathcal{A}_n|}{n} \ge \hat{H}(X_\ell | X_0, \ldots, X_{\ell-1}) - \frac{\log s}{\ell+1} - \frac{s(\alpha - 1)}{2n} \cdot \log(2\pi n),$$
and since this holds for any $\ell$ in some fixed range $1 \le \ell \le l$ (i.e., where $l$ is independent of $n$),
$$R \ge \max_{1 \le \ell \le l} \left[ \hat{H}(X_\ell | X_0, \ldots, X_{\ell-1}) - \frac{\log s}{\ell+1} \right] - \frac{s(\alpha - 1)}{2n} \cdot \log(2\pi n).$$
Note that if $x$ is an “$\ell_0$-th order Markovian sequence”, in the sense that $\hat{H}(X_\ell | X_0, \ldots, X_{\ell-1})$ is almost fixed for all $\ell_0 \le \ell \le l$ with $l \gg \log s$, then $\ell_0$ is the preferred choice for $\ell$, as it essentially captures the best attainable key rate.
This lower bound can be asymptotically attained through universal lossless compression for $\ell$-th order Markov types [30,31,32] (see Section VII.A in [32]), followed by one-time pad encryption, where the achieved rate is
$$R \le \hat{H}(X_\ell | X_0, \ldots, X_{\ell-1}) + \frac{\alpha^\ell (\alpha - 1)}{2} \cdot \frac{\log n}{n} + O\left(\frac{1}{n}\right).$$
In this case, $\mathcal{A}_n$ is the $\ell$-th order Markov type of $x$, and the finite-state discriminator of the eavesdropper is a shift-register machine, which checks whether the $\ell$-th order Markov type of each $\hat{x}$ has the matching empirical conditional entropy of order $\ell$.
In this result, there is compatibility between the converse bound and the achievability bound in the sense that both are given in terms of FSMs with a fixed number of states that does not grow with $n$. Among all possible finite-state machines, we have actually singled out the shift-register machine universally, with a controllable asymptotic gap of $\frac{\log s}{\ell+1}$. However, the bound is otherwise explicit, and it is clear how to approach it. If we wish to keep this gap below a given $\epsilon > 0$, we select $\ell = \lceil (\log s)/\epsilon \rceil - 1$. In this sense, it is another refinement of Ziv’s result.

3.3. Periodically Time-Varying FSMs with Counters

So far, we have considered time-invariant FSMs, where the function g remains fixed over time. We now expand the scope to consider the class of discriminators implementable by periodically time-varying FSMs, defined as follows:
$$z_{i+1} = g(z_i, \hat{x}_i, i \bmod l),$$
where $i = 0, 1, 2, \ldots$, and $l$ is a positive integer that designates the period of the time-varying finite-state machine. Conceptually, this is not really more general than the ordinary, time-invariant FSM defined earlier, because the state of the modulo-$l$ clock can be considered part of the entire state. In other words, this is a time-invariant FSM with $s \cdot l$ states, indexed by the ordered pair $(z_i, i \bmod l)$. The reason it makes sense to distinguish between the state $z_i$ and the state of the clock is that the clock does not store any information regarding past input data. This is to say that in the context of time-varying finite-state machines, we distinguish between the amount of memory of past input ($\log s$ bits) and the period $l$. Both parameters manifest the richness of the class of machines, but in different manners. Indeed, the parameters $s$ and $l$ play very different and separate roles in the converse bound derived below.
Remark 5.
Clearly, the earlier considered time-invariant finite-state machine is obtained as a special case through $l = 1$, or through the case where $l$ is arbitrary but the next-state functions, $g(\cdot, \cdot, 0), g(\cdot, \cdot, 1), \ldots, g(\cdot, \cdot, l-1)$, are all identical.
First, observe that a periodic FSM with period $l$ can be viewed as a time-invariant FSM at the level of $l$-blocks, $\{x_{il}, x_{il+1}, \ldots, x_{il+l-1}\}$, $i = 0, 1, \ldots$, and hence also at the level of $\ell$-blocks, where $\ell$ is an integer multiple of $l$. Accordingly, let $\ell$ be an arbitrary integer multiple of $l$, but at the same time, assume that $\ell$ divides $n$. Denote $m = n/\ell$, and define the counts,
$$m(z, z', x) = \sum_{i=0}^{n/\ell - 1} \mathcal{I}\{z_{i\ell} = z,\ z_{i\ell+\ell} = z',\ \hat{x}_{i\ell}^{i\ell+\ell-1} = x\}, \quad z, z' \in \mathcal{S},\ x \in \mathcal{X}^\ell.$$
In fact, there is a certain redundancy in this definition because $z'$ is a deterministic function of $(z, x)$, obtained through $\ell$ recursive applications of the (time-varying) next-state function $g$. Concretely, $m(z, z', x) = m(z, x)$ if $z'$ matches $(z, x)$, and $m(z, z', x) = 0$ otherwise. Nonetheless, we adopt this definition for the sake of clarity of the combinatorial derivation to be carried out shortly. In particular, our derivation is based on grouping together all $\{x\}$ that, for a given $z$, yield the same $z'$. Accordingly, we also denote $m(z, z') = \sum_{x \in \mathcal{X}^\ell} m(z, z', x)$.
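The gathering of such block counts can be sketched as follows; the specific machine below is our own toy illustration, not taken from the paper:

```python
from collections import Counter

def block_counts(x_hat, g, ell, period, z0=0):
    """Counts m(z, z', x) of ell-blocks of x_hat together with the FSM
    states at the block boundaries; g(z, x, i mod period) is the
    time-varying next-state function.  ell must be a multiple of the
    period and must divide len(x_hat)."""
    assert ell % period == 0 and len(x_hat) % ell == 0
    counts = Counter()
    z = z0
    for i in range(0, len(x_hat), ell):
        block = tuple(x_hat[i:i + ell])
        z_start = z
        for j, x in enumerate(block):
            z = g(z, x, (i + j) % period)
        counts[(z_start, z, block)] += 1
    return counts

# Toy period-2 machine over two states (hypothetical, for illustration):
g = lambda z, x, phase: (z + x + phase) % 2
counts = block_counts((0, 1, 1, 0, 0, 1, 0, 0), g, ell=4, period=2)
print(sum(counts.values()))           # n / ell = 2 blocks counted
```

Note that, as stated above, the end state $z'$ recorded in each key is redundant given $(z, x)$; it is kept only to mirror the combinatorial bookkeeping of the derivation.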
Suppose that the acceptance/rejection criterion that defines A n is based on the counts, { m ( z , z , x ) , z , z S , x X } , and then the smallest A n that contains x is the type class of x pertaining to { m ( z , z , x ) , z , z S , x X } . The various sequences in this type class are obtained by permuting distinct -tuples { x ^ i i + 1 , i = 0 , 1 , , m 1 } that begin at the same state, z, and end at the same state, z . Let us define the empirical distribution as
$$\hat{P}(z, z', x) = \frac{m_\ell(z, z', x)}{m}, \quad z, z' \in \mathcal{S},\; x \in \mathcal{X}^\ell,$$
and the joint entropy as
$$\hat{H}(Z, Z', X) = -\sum_{z, z', x} \hat{P}(z, z', x) \log \hat{P}(z, z', x),$$
and let
$$\hat{H}(X | Z, Z') = \hat{H}(Z, Z', X) - \hat{H}(Z, Z'),$$
where $\hat{H}(Z, Z')$ is the marginal empirical entropy of $(Z, Z')$. Then, using the method of types [32], we have:
$$|\mathcal{A}_n| \ge \prod_{z, z' \in \mathcal{S}} \frac{m_\ell(z, z')!}{\prod_{x \in \mathcal{X}^\ell} m_\ell(z, z', x)!}$$
$$\ge \prod_{z, z' \in \mathcal{S}} (m+1)^{-\alpha^\ell} \cdot \exp_2\left\{\sum_x m_\ell(z, z', x) \log \frac{m_\ell(z, z')}{m_\ell(z, z', x)}\right\}$$
$$= (m+1)^{-s^2 \alpha^\ell} \cdot 2^{m \hat{H}(X | Z, Z')}$$
$$= (m+1)^{-s^2 \alpha^\ell} \cdot 2^{m [\hat{H}(X) - \hat{I}(Z, Z'; X)]}$$
$$\ge (m+1)^{-s^2 \alpha^\ell} \cdot 2^{m [\hat{H}(X) - \hat{H}(Z, Z')]}$$
$$\ge (m+1)^{-s^2 \alpha^\ell} \cdot 2^{m [\hat{H}(X) - 2 \log s]}$$
$$= \exp_2\left\{n\left[\frac{\hat{H}(X)}{\ell} - \frac{2 \log s}{\ell} - \frac{s^2 \alpha^\ell}{n} \log\left(\frac{n}{\ell} + 1\right)\right]\right\},$$
and so
$$R \ge \frac{\log |\mathcal{A}_n|}{n} \ge \frac{\hat{H}(X)}{\ell} - \frac{2 \log s}{\ell} - \frac{s^2 \alpha^\ell}{n} \log\left(\frac{n}{\ell} + 1\right),$$
which, for $\ell \gg 2 \log s$, can be essentially matched through universal compression for block-memoryless sources and one-time pad encryption, using exactly the same ideas as before. Once again, we have derived a lower bound that is free of dependence on the particular FSM, $g$. Recall that $\ell$ divides $n$ and that it is also a multiple of $l$, but otherwise, $\ell$ is arbitrary. Hence, we may maximize this lower bound with respect to $\ell$ subject to these constraints. Alternatively, we may rewrite the lower bound as
$$R \ge \max_{\{q:\; q \text{ divides } n/l\}} \left[\frac{\hat{H}(X^{ql})}{ql} - \frac{2 \log s}{ql} - \frac{s^2 \alpha^{ql}}{n} \log\left(\frac{n}{ql} + 1\right)\right].$$
Clearly, in view of Remark 5, the above lower bound also applies to the case of a time-invariant FSM, $g$, but then there would be some mismatch between the upper and lower bounds, because to achieve the lower bound, one must gather more detailed empirical statistics, namely, empirical statistics of $\ell$-blocks together with states, rather than just single symbols with states.
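For a concrete feel of how the converse bound of this subsection behaves as a function of the block length, its right-hand side can be evaluated numerically from empirical block statistics. A rough sketch under hypothetical parameters (the values of $s$ and $\alpha$ below are illustrative, and $\hat{H}(X^\ell)$ is estimated from non-overlapping $\ell$-blocks):

```python
import math
from collections import Counter

def block_entropy_per_symbol(x, ell):
    """Empirical entropy of non-overlapping ell-blocks, in bits per symbol:
    H_hat(X^ell) / ell."""
    blocks = [tuple(x[i:i + ell]) for i in range(0, len(x) - ell + 1, ell)]
    m = len(blocks)
    freq = Counter(blocks)
    H_block = -sum((c / m) * math.log2(c / m) for c in freq.values())
    return H_block / ell

def key_rate_bound(x, ell, s, alpha):
    """Right-hand side of the converse bound:
    H_hat(X^ell)/ell - 2*log2(s)/ell - (s^2 * alpha^ell / n) * log2(n/ell + 1)."""
    n = len(x)
    return (block_entropy_per_symbol(x, ell)
            - 2 * math.log2(s) / ell
            - (s ** 2) * (alpha ** ell) / n * math.log2(n / ell + 1))
```

The last term makes the bound meaningful only when $n$ is much larger than $s^2 \alpha^\ell$, mirroring the redundancy term in the derivation.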

4. Side Information

Some of the results presented in the previous sections extend to the case where side information (SI) is available to both the legitimate decrypter and the eavesdropper. In principle, it may or may not be available to the encrypter, and we consider first the case where it is available. We assume the SI sequence to be of length $n$, denoted by $y = (y_0, y_1, \ldots, y_{n-1})$, where $y_i \in \mathcal{Y}$, $i = 0, 1, \ldots, n-1$. It is related to the plaintext, $x$, to be encrypted, but like $x$, it is a deterministic, individual sequence. The SI alphabet $\mathcal{Y}$ is finite, and its cardinality, $|\mathcal{Y}|$, is denoted by $\beta$. Here, the acceptance set depends on $y$ and is denoted accordingly by $\mathcal{A}_n(y)$. The encrypter is, therefore, a mapping $W = T(x, y, K)$, which is invertible given $(y, K)$, allowing the decrypter to reconstruct $x = T^{-1}(W, y, K)$.
Consider first a natural extension of Ziv's model for a finite-state discriminator, which, for $i = 0, 1, \ldots, n-1$, implements the recursion,
$$u_i = f(z_i, \hat{x}_i, y_i),$$
$$z_{i+1} = g(z_i, \hat{x}_i, y_i).$$
Similarly as before, the acceptance set, $\mathcal{A}_n(y)$, is the set of all $\{x\}$ that, in the presence of the given $y$, yield the all-zero output sequence, $u = (u_0, u_1, \ldots, u_{n-1}) = (0, 0, \ldots, 0)$.
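For very small $n$, the acceptance set $\mathcal{A}_n(y)$ can be enumerated by brute force, which helps build intuition about such discriminators. In the sketch below, the output function `f`, next-state function `g`, and the SI sequence are hypothetical toy choices, not objects defined in the text:

```python
from itertools import product

def accepts(x_hat, y, f, g, z0=0):
    """Run the finite-state discriminator; accept iff every output u_i is 0."""
    z = z0
    for a, b in zip(x_hat, y):
        if f(z, a, b) != 0:
            return False
        z = g(z, a, b)
    return True

def acceptance_set(y, f, g, alphabet=(0, 1)):
    """A_n(y): all length-n candidate plaintexts accepted given the SI y."""
    n = len(y)
    return [x for x in product(alphabet, repeat=n) if accepts(x, y, f, g)]

# Toy discriminator: reject whenever x_i disagrees with y_i while the
# state (here, the previous x-symbol) equals 1.
f = lambda z, a, b: 1 if (z == 1 and a != b) else 0
g = lambda z, a, b: a
A = acceptance_set((1, 0, 1), f, g)
```

The exhaustive enumeration is exponential in $n$, of course; it is meant only to illustrate how $\mathcal{A}_n(y)$ varies with $y$.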
Let $c(x, y)$ denote the number of phrases in the joint parsing of
$$(x, y) = ((x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1}))$$
into distinct phrases, and let $c_l(x|y)$ denote the number of occurrences of $y(l)$—the $l$-th distinct phrase of $y$—which is also the number of different phrases of $x$ that appear jointly with $y(l)$. Let $c(y)$ denote the number of different $\{y(l)\}$, that is, $\sum_{l=1}^{c(y)} c_l(x|y) = c(x, y)$. Finally, let $c_l^{zz'}(x|y)$ denote the number of $x$-phrases that are aligned with $y(l)$, beginning at state $z$ and ending at state $z'$. Then, similarly as in the derivation in [25] (see also Appendix A),
$$|\mathcal{A}_n(y)| \ge \prod_{l=1}^{c(y)} \prod_{z, z'} \left[c_l^{zz'}(x|y)\right]^{c_l^{zz'}(x|y)},$$
and so, defining the auxiliary RVs $(Z_l, Z_l')$, $l = 1, \ldots, c(y)$, as being jointly distributed according to
$$Q_l(z, z') = \frac{c_l^{zz'}(x|y)}{c_l(x|y)}, \quad z, z' \in \mathcal{S},$$
we have
$$\log |\mathcal{A}_n(y)| \ge \sum_{l=1}^{c(y)} \sum_{z, z'} c_l^{zz'}(x|y) \log c_l^{zz'}(x|y)$$
$$= \sum_{l=1}^{c(y)} c_l(x|y) \left[\sum_{z, z'} \frac{c_l^{zz'}(x|y)}{c_l(x|y)} \log \frac{c_l^{zz'}(x|y)}{c_l(x|y)} + \log c_l(x|y)\right]$$
$$= \sum_{l=1}^{c(y)} c_l(x|y) \log c_l(x|y) - \sum_{l=1}^{c(y)} c_l(x|y) H(Z_l, Z_l')$$
$$\ge \sum_{l=1}^{c(y)} c_l(x|y) \log c_l(x|y) - 2 \sum_{l=1}^{c(y)} c_l(x|y) \log s$$
$$= \sum_{l=1}^{c(y)} c_l(x|y) \log c_l(x|y) - 2 c(x, y) \log s$$
$$\ge \sum_{l=1}^{c(y)} c_l(x|y) \log c_l(x|y) - \frac{2 n \log s}{(1 - \epsilon_n) \log n},$$
where the last inequality follows from [7] (see also Lemma 13.5.3 in [28]), with $\epsilon_n = O\left(\frac{\log(\log n)}{\log n}\right)$. This lower bound can be asymptotically attained through the conditional version of the LZ algorithm (see [33,34]), followed by one-time pad encryption. If $y$ is unavailable to the encrypter, $x$ can still be compressed into about
$$u(x|y) = \sum_{l=1}^{c(y)} c_l(x|y) \log c_l(x|y)$$
bits before one-time pad encryption, using Slepian–Wolf encoding (Section 15.4 in [28]), and then reconstructed (with high probability) after decryption using a universal decoder that uses $u(x|y)$ as a decoding metric (see [35]).
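The quantity $u(x|y)$ can be computed directly from a joint incremental parsing. The sketch below applies the standard LZ78-style incremental rule to symbol pairs as an illustration of the counting, not as a full conditional-LZ codec:

```python
import math
from collections import Counter

def joint_parse(x, y):
    """LZ78-style incremental parsing of the pair sequence (x_i, y_i) into
    distinct phrases; the last phrase may be an incomplete repeat."""
    seen, phrases, cur = set(), [], ()
    for pair in zip(x, y):
        cur = cur + (pair,)
        if cur not in seen:          # a new distinct phrase ends here
            seen.add(cur)
            phrases.append(cur)
            cur = ()
    if cur:                          # possibly incomplete final phrase
        phrases.append(cur)
    return phrases

def u_x_given_y(x, y):
    """u(x|y) = sum_l c_l(x|y) * log2 c_l(x|y), where c_l(x|y) counts the
    joint phrases whose y-part equals the l-th distinct y-phrase."""
    phrases = joint_parse(x, y)
    c = Counter(tuple(b for (_, b) in ph) for ph in phrases)
    return sum(cl * math.log2(cl) for cl in c.values())
```

For instance, `u_x_given_y([0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 0])` parses the pair sequence into five phrases, two of which share the $y$-part `(0,)` and two the $y$-part `(1,)`.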
Unfortunately, and somewhat surprisingly, a direct extension of the results in Section 3.1 and Section 3.2 to the case with SI turns out to be rather elusive. The reason is the lack of a single-letter formula for the exponential growth rate of the cardinality of a finite-state conditional type class of $x$-vectors that is defined by joint counts of the form $n(x, y, z) = \sum_{i=0}^{n-1} \mathcal{I}\{x_i = x,\, y_i = y,\, z_i = z\}$, even in the simplest special case where $z_i = x_{i-1}$. In a nutshell, this quantity depends on $y$ in a complicated manner, which cannot be represented in terms of an empirical distribution of fixed dimension that does not grow with $n$. However, we can obtain at least a lower bound if we treat these cases as special cases of periodically time-varying FSMs with counters, in view of Remark 5.
Finally, consider the class of discriminators implementable by periodically time-varying FSMs with SI, defined as follows:
$$z_{i+1} = g(z_i, \hat{x}_i, y_i, i \bmod l),$$
where $i = 0, 1, 2, \ldots$, and $l$ is as in Section 3.3. The extension of the derivation in Section 3.3 is quite straightforward: one has to count the permutations of the $\ell$-vectors of $x$ that not only share the same initial and final states, but are also aligned to the same $\ell$-blocks of $y$. The resulting bound is given by:
$$R \ge \frac{\log |\mathcal{A}_n(y)|}{n} \ge \frac{\hat{H}(X|Y)}{\ell} - \frac{2 \log s}{\ell} - \frac{s^2 (\alpha \beta)^\ell}{n} \log\left(\frac{n}{\ell} + 1\right),$$
which, for $\ell \gg 2 \log s$, can be essentially matched through universal compression for block-memoryless sources in the presence of SI and one-time pad encryption. In the absence of SI at the encrypter, one may use universal Slepian–Wolf coding for block-memoryless sources, which is a direct extension of universal Slepian–Wolf coding for memoryless sources (see, e.g., [36]).

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Proof of the Last Step of Equation (13)

Since [25] is unpublished, for the sake of completeness, we provide here the essential steps of Ziv’s proof of the inequality,
$$\log |\mathcal{A}_n| \ge LZ(x) - n \epsilon_n,$$
which must hold for every finite-state discriminator with no more than $s$ states, where $\epsilon_n = O\left(\frac{\log(\log n)}{\log n}\right)$ for fixed $s$. The final steps are different from those in [25], as we adopt a somewhat simpler approach, which is presented in Section 13.5 of [28].
Let x be parsed into c distinct phrases,
$$(x_0, x_1, \ldots, x_{n_1 - 1}),\; (x_{n_1}, x_{n_1 + 1}, \ldots, x_{n_2 - 1}),\; \ldots,\; (x_{n_{c-1}}, x_{n_{c-1} + 1}, \ldots, x_{n-1}),$$
with the possible exception of the last phrase, which might be incomplete. Let $c_l^{zz'}$ denote the number of phrases of length $l$ for which the initial state of the FSM $g$ is $z$ and the final state is $z'$ (of course, $\sum_{l, z, z'} c_l^{zz'} = c$). If the discriminator accepts $x$, namely, if $u = (0, 0, \ldots, 0)$, then for every $(l, z, z')$, each of the $c_l^{zz'}$ phrases can be replaced with any of the other such phrases to obtain new sequences of length $n$ for which the output is also $u = (0, 0, \ldots, 0)$, and which hence must be accepted too. Now, let $(L, Z, Z')$ denote a triple of auxiliary random variables jointly distributed according to the probability distribution
$$Q(l, z, z') = \frac{c_l^{zz'}}{c}, \quad z, z' \in \mathcal{S},\; l = 1, 2, \ldots,$$
and let $H(L, Z, Z')$ denote the joint entropy of $(L, Z, Z')$. Then,
$$|\mathcal{A}_n| \ge \prod_{l, z, z'} \left(c_l^{zz'}\right)^{c_l^{zz'}},$$
where $\left(c_l^{zz'}\right)^{c_l^{zz'}}$ on the right-hand side is the number of ways each one of the $c_l^{zz'}$ phrases of length $l$, starting at state $z$ and ending at state $z'$, can be replaced with any other phrase with the same qualifiers. Consequently,
$$\log |\mathcal{A}_n| \ge \sum_{l, z, z'} c_l^{zz'} \log c_l^{zz'}$$
$$= c \cdot \sum_{l, z, z'} \frac{c_l^{zz'}}{c} \left[\log \frac{c_l^{zz'}}{c} + \log c\right]$$
$$= c \log c - c \cdot H(L, Z, Z')$$
$$\ge c \log c - c \left[H(L) + H(Z) + H(Z')\right]$$
$$\ge c \log c - c \cdot H(L) - 2c \log s.$$
Now, the entropy of $L$, given that $E\{L\} = n/c$, cannot be larger than $(n/c + 1) \log(n/c + 1) - (n/c) \log(n/c) \le 1 + \log(n/c + 1)$ (see Lemma 13.5.4 and Equations (13.120)–(13.122) in [28]). Thus,
$$\log |\mathcal{A}_n| \ge c \log c - 2c \log s - c - c \log\left(\frac{n}{c} + 1\right),$$
and then
$$R \ge \frac{\log |\mathcal{A}_n|}{n} \ge \frac{c \log c}{n} - \frac{2c}{n} \log s - \frac{c}{n} - \frac{c}{n} \log\left(\frac{n}{c} + 1\right) = \frac{c \log c}{n} - O\left(\frac{\log \log n}{\log n}\right),$$
where the last step is derived similarly as in Equations (13.123)–(13.124) in [28]. The final step of Equation (13) is obtained from the fact that $LZ(x)$ is upper bounded by $c \log c$ plus terms that, after normalization by $n$, are negligible compared to $\frac{\log(\log n)}{\log n}$ (see Theorem 2 in [7]).
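The dominant term $c \log c$ is exactly the quantity tracked by the incremental (LZ78-style) parsing into distinct phrases. A minimal illustrative sketch of the phrase count and the resulting per-symbol estimate:

```python
import math

def lz78_phrase_count(x):
    """Number of phrases c in the incremental parsing of x into distinct
    phrases (the last phrase may be a repeat of an earlier one)."""
    seen, cur, c = set(), (), 0
    for a in x:
        cur = cur + (a,)
        if cur not in seen:          # a new distinct phrase ends here
            seen.add(cur)
            c += 1
            cur = ()
    return c + (1 if cur else 0)     # count a possibly incomplete last phrase

def lz_rate_estimate(x):
    """(c log2 c)/n: the dominant term of LZ(x)/n in the key-rate bound."""
    c, n = lz78_phrase_count(x), len(x)
    return c * math.log2(c) / n
```

As expected, the estimate is small for highly compressible sequences (for a constant sequence, $c = O(\sqrt{n})$, so $(c \log c)/n \to 0$) and approaches $\log \alpha$ bits per symbol for incompressible ones.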

References

  1. Kieffer, J.C.; Yang, E.-H. Sequential Codes, Lossless Compression of Individual Sequences, and Kolmogorov Complexity. IEEE Trans. Inform. Theory 1996, 42, 29–39. [Google Scholar] [CrossRef]
  2. Yang, E.-H.; Kieffer, J.C. Simple universal lossy data compression schemes derived from the Lempel–Ziv algorithm. IEEE Trans. Inform. Theory 1996, 42, 239–245. [Google Scholar] [CrossRef]
  3. Weinberger, M.J.; Merhav, N.; Feder, M. Optimal sequential probability assignment for individual sequences. IEEE Trans. Inform. Theory 1994, 40, 384–396. [Google Scholar] [CrossRef]
  4. Ziv, J. Coding theorems for individual sequences. IEEE Trans. Inform. Theory 1978, 24, 405–412. [Google Scholar] [CrossRef]
  5. Ziv, J. Distortion–rate theory for individual sequences. IEEE Trans. Inform. Theory 1980, 26, 137–143. [Google Scholar] [CrossRef]
  6. Ziv, J. Fixed-rate encoding of individual sequences with side information. IEEE Trans. Inform. Theory 1984, 30, 348–452. [Google Scholar] [CrossRef]
  7. Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory 1978, 24, 530–536. [Google Scholar] [CrossRef]
  8. Martín, A.; Merhav, N.; Seroussi, G.; Weinberger, M.J. Twice–universal simulation of Markov sources and individual sequences. IEEE Trans. Inform. Theory 2010, 56, 4245–4255. [Google Scholar] [CrossRef]
  9. Seroussi, G. On universal types. IEEE Trans. Inform. Theory 2006, 52, 171–189. [Google Scholar] [CrossRef]
  10. Ziv, J. Compression, tests for randomness, and estimating the statistical model of an individual sequence. In Sequences: Combinatorics, Compression, Security, and Transmission; Capocelli, R.M., Ed.; Springer Verlag: New York, NY, USA, 1990; pp. 366–373. [Google Scholar]
  11. Feder, M.; Merhav, N.; Gutman, M. Universal prediction of individual sequences. IEEE Trans. Inform. Theory 1992, 38, 1258–1270. [Google Scholar] [CrossRef]
  12. Haussler, D.; Kivinen, J.; Warmuth, M.K. Sequential prediction of individual sequences under general loss functions. IEEE Trans. Inform. Theory 1998, 44, 1906–1925. [Google Scholar] [CrossRef]
  13. Weissman, T.; Merhav, N. Universal prediction of binary individual sequences in the presence of noise. IEEE Trans. Inform. Theory 2001, 47, 2151–2173. [Google Scholar] [CrossRef]
  14. Ziv, J.; Merhav, N. On context–tree prediction of individual sequences. IEEE Trans. Inform. Theory 2007, 53, 1860–1866. [Google Scholar] [CrossRef]
  15. Weissman, T.; Ordentlich, E.; Seroussi, G.; Verdú, S.; Weinberger, M.J. Universal denoising: Known channel. IEEE Trans. Inform. Theory 2005, 51, 5–28. [Google Scholar] [CrossRef]
  16. Lomnitz, Y.; Feder, M. Universal communication over individual channels. IEEE Trans. Inform. Theory 2011, 57, 7333–7358. [Google Scholar] [CrossRef]
  17. Lomnitz, Y.; Feder, M. Universal communication—Part I: Modulo additive channels. IEEE Trans. Inform. Theory 2013, 59, 5488–5510. [Google Scholar] [CrossRef]
  18. Shayevitz, O.; Feder, M. Communicating using feedback over a binary channel with arbitrary noise sequence. In Proceedings of the International Symposium on Information Theory, 2005—ISIT 2005, Adelaide, Australia, 4–9 September 2005; pp. 1516–1520. [Google Scholar]
  19. Shannon, C.E. Communication theory of secrecy systems. Bell Syst. Tech. J. 1949, 28, 656–715. [Google Scholar] [CrossRef]
  20. Hellman, M.E. An extension of the Shannon theory approach to cryptography. IEEE Trans. Inform. Theory 1977, 23, 289–294. [Google Scholar] [CrossRef]
  21. Lempel, A. Cryptology in transition. Comput. Surv. 1979, 11, 285–303. [Google Scholar] [CrossRef]
  22. Liang, Y.; Poor, H.V.; Shamai, S. Information theoretic security. Found. Trends Commun. Inf. Theory 2009, 5, 355–580. [Google Scholar] [CrossRef]
  23. Massey, J.L. An introduction to contemporary cryptology. Proc. IEEE 1988, 76, 533–549. [Google Scholar] [CrossRef]
  24. Yamamoto, H. Information theory in cryptology. IEICE Trans. 1991, E74, 2456–2464. [Google Scholar]
  25. Ziv, J. Perfect secrecy for individual sequences. Technion—Israel Institute of Technology, Technion City, Haifa, Israel. 1978; unpublished manuscript. [Google Scholar]
  26. Merhav, N. Perfectly secure encryption of individual sequences. IEEE Trans. Inform. Theory 2013, 59, 1302–1310. [Google Scholar] [CrossRef]
  27. Marcus, B.; Roth, R.; Siegel, P.H. Constrained Systems and Coding for Recording Channels; Technion-I.I.T. Department of Computer Science: Haifa, Israel, 1998. [Google Scholar]
  28. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  29. Plotnik, E.; Weinberger, M.J.; Ziv, J. Upper bounds on the probability of sequences emitted by finite-state sources and on the redundancy of the Lempel-Ziv algorithm. IEEE Trans. Inform. Theory 1992, 38, 66–72. [Google Scholar] [CrossRef]
  30. Davisson, L.D.; Longo, G.; Sgarro, A. The error exponent for noiseless encoding of finite ergodic Markov sources. IEEE Trans. Inform. Theory 1981, 27, 431–438. [Google Scholar] [CrossRef]
  31. Natarajan, S. Large deviations, hypotheses testing, and source coding for finite Markov chains. IEEE Trans. Inform. Theory 1985, 31, 360–365. [Google Scholar] [CrossRef]
  32. Csiszár, I. The method of types. IEEE Trans. Inform. Theory 1998, 44, 2505–2523. [Google Scholar] [CrossRef]
  33. Ziv, J. Universal decoding for finite-state channels. IEEE Trans. Inform. Theory 1985, 31, 453–460. [Google Scholar] [CrossRef]
  34. Uyematsu, T.; Kuzuoka, S. Conditional Lempel-Ziv complexity and its application to source coding theorem with side information. IEICE Trans. Fundam. 2003, E86-A, 2615–2617. [Google Scholar]
  35. Merhav, N. Universal Slepian-Wolf coding for individual sequences. arXiv 2024, arXiv:2403.07409. [Google Scholar]
  36. Draper, S.C. Universal incremental Slepian–Wolf coding. In Proceedings of the 43rd Annual Allerton Conference, Monticello, IL, USA, 28–30 September 2004. [Google Scholar]

Merhav, N. Refinements and Extensions of Ziv’s Model of Perfect Secrecy for Individual Sequences. Entropy 2024, 26, 503. https://doi.org/10.3390/e26060503

