1. Introduction
The expansion of communication and network technologies, as well as technological advances in the design and implementation of microprocessor devices, have led to the ability to informational connecting different devices and creation of intelligent systems capable of monitoring and managing complex processes. Communication devices utilize the Internet infrastructure and protocols to create a world of connected devices, like Wireless Sensor Networks (WSN) and Internet of Things (IoT). This technological advancement enables the progress of many technological and life processes bringing to us smart cities, autonomous vehicles, robotization and intelligent robot behavior [
1,
2,
3,
4]. In that context, information security has a very important role in compromising the integrity and privacy of data in such an integrated world can cause serious damage, even to the level of a general disaster [
5,
6,
7,
8,
9]. Therefore, in addition to security mechanisms incorporated into Internet protocols, additional security mechanisms incorporated into devices and systems are used to prevent unintended behavior. Moreover, a huge number of that type of devices (sensors, cameras, surveillance systems) need to work in real time fashion so that the defined security mechanisms do not disrupt system behavior. They must be designed in such a way that it is easy to implement them both in hardware and software and their application should not disrupt system behavior i.e., they must be efficient [
10,
11].
There are various IoT applications, connected with different types of sensors, that have become an integral part of our lives, and most of them can be classified in common areas such as smart healthcare services, smart home, intelligent transportation, smart grid, etc. However, as a consequence of mass deployment, many IoT challenges have arisen, such as limited processing capability and memory resources, large amount of data to transmit, different operating characteristics of hardware, and heterogeneous data and networks types [
12,
13,
14,
15,
16]. Moreover, personal privacy, data confidentiality and integrity are also a great challenge of IoT that must be overcome, particularly for devices with limited resources and heterogeneous technologies [
13,
14,
17,
18,
19]. Cryptography can be used to protect confidentiality (or secrecy) of data and communication. It can also be used to ensure the integrity (or accuracy) of information as well as for authentication (and non-repudiation) services [
20]. An important point in the IoT world is that most IoT solutions have a “closed design”, so it is often very difficult or even impossible to incorporate additional security mechanisms after the production process is completed. On the other hand, as a consequence of the limited software and hardware resources of IoT devices, the suite of cryptographic algorithms that can be implemented is narrowing, so the right measure must be found between the desired level of security and implementation capabilities, which makes the security issue even more challenging [
13,
16]. Different cryptographic algorithms that offer roughly the same level of security may require different power and resource consumption, so you need to choose the right one, subject to the limitations of some specific IoT application and deployed hardware [
19]. Given that public-key crypto algorithms, compared to symmetric crypto algorithms, have far greater power and resource consumption due to their high processing time [
21], it is a natural choice to use a symmetric algorithm in IoT security solution design. Detailed analysis and comparison of symmetric block-type algorithms such as AES, RC6, Twofish, SPECK128, LEA, and ChaCha20-Poly1305 algorithms in IoT devices are given in [
16]. On the other hand, stream or sequential symmetric key ciphers are typically faster than block-type. Block ciphers, in general, require more memory resources to encrypt/decrypt larger chunks (block) of data, while sequential ciphers usually take only one or a few bits at a time, they have relatively low memory requirements and therefore are suitable to implement in limited scenarios. Stream cryptography algorithms, as a subgroup of symmetric cryptography algorithms, are among the most common cryptography data protection techniques. The idea comes from Shannon’s one-time pad system, where instead of a random sequence of encryption bits, a series of bits obtained from a pseudo-random generator is used [
20]. The sequence generated by the pseudo-random generator is used for plain-text encryption and its properties determine the security of the protected data. Therefore, the basic cryptography goal in stream cryptography systems is to design pseudo-random bit/symbol generators with good cryptographic characteristics. Many ideas have been implemented in the last fifty years, with more or less success.
One of the most popular and widely used pseudo-random sequence generator is the RC4 generator defined in 1987 by Ron Rivest. The description of the algorithm was revealed by reverse engineering of the RSA INC software [
22], and the correctness of the algorithm description obtained was confirmed by Rivest himself [
23,
24].
The RC4 algorithm owes its popularity to its simplicity and ease of implementation in both software and hardware. The high popularity and applicability has attracted the attention of the cryptanalytic community. The results of a deep and thorough analysis of this algorithm led to the detection of a number of weaknesses of the algorithm. A comprehensive review of the weaknesses identified is given in [
25] where the empirically detected weaknesses are theoretically proved as well as the original results of the authors of the article. The compromitation of this algorithm was additionally contributed by the implementation methods in security protocols, so that its use in security protocols has not been recommended since 2015 [
26].
On the other hand, the beauty and elegance of the idea itself suggest the possibility of its exploitation.
Our work is aimed to define low complexity and efficient generic model of the pseudo-random generator that does not suffer from the weaknesses immanent to the RC4 and which is suitable for the implementation of the security solution in the computational constrained microprocessor environments, e.g., WSN and IoT. This paper defines a pseudo-random generator that can, in some way, be considered as a generalization of ideas related to RC4 because it uses the time varying permutations, sequences for permutation changing and addressing output element from the current generator state. In order to prove plausible cryptographic properties of the proposed pseudo-random generator different mathematical techniques are used to analyze probability distribution of the output sequence, correlation properties, information leakage between the state of the generator and output sequence and periodicity.
The paper is organized in four parts. After the first part containing motivation for this work and introduction in the second part we introduce necessary notation, describe proposed pseudo-random generator and his relationship to RC4 in brief. The third part contains the analysis of the generator, some comments and remarks. The fourth part contains a summary of the paper’s results.
2. Notation and Generating Algorithm Description
Let . Then with we will denote a set of all bijections from to , and as usual, its elements will be named permutations. Set is a totally ordered set by the, so called, lexicographic order and have exactly elements, where ! denotes factorial operation. Elements of the set we will denote by .
By we will denote the probability of set X.
Let be a set of permutations on with the following properties:
For any exists a number and set of integers such that , where ∘ is composition of functions.
For all if then .
where I is identical permutation.
Let and be a two sequences of independent identically distributed random variables with and
Then we will define a pseudo-random sequence
with equations
From (
1) generating algorithm for the sequence
is obvious. As a first step we will construct the sequence of permutations
,
using sequence
and element
of the sequence
is computed as a value of the function
at the point
. Graphical presentation of the sequence generating process is given on the
Figure 1.
Defined generator algorithm apply time-varying permutations as well as the RC4 but in RC4 fixed set of permutations, set of transpositions, is used. Graphical presentation of the RC4 algorithm is given on the
Figure 2. where the summatios and numbers with denote reduction modulo
In the defined generator case any set S which is generator of can be used. Sequences that are used in RC4 applied transposition determination and address of permutation table position corresponds to sequences and in our algorithm respectively. While the mentioned sequences from RC4 are precisely defined, in our case sequences and are arbitrarily chosen and the generator is parameterized by those two sequences. Later in the paper we define sufficient conditions for sequences and to achieve good pseudo-random and security properties.
3. Analysis of the Generator
For every pseudo-random generator it is necessary to analyze its properties regarding the possibilities of output sequence prediction or reconstruction of the generator initial state. In that sense desirable properties are uniform distribution of the output sequence, nonexistence of the correlation between the output sequence and elements of the generator, nonexistence of the output sequence auto-correlation and long period of the output sequence. These features are especially important for generators used in security solutions and lack any of them usually have serious consequences on the security of the system. Different examples can be found in [
20,
27].
3.1. Distribution of the Generated Sequence
Intuitively one can expect that has a uniform distribution but relatively weak constraints demanded for the sequences and require formal proof for the expectations. By the next theorem we will show that has an asymptotically uniform distribution.
Theorem 1. If
andfor all
andfor all
then pseudo-random sequencehas an asymptotically uniform distribution i.e.,for.
The proof of the Theorem 1 will be derived in two steps. First, using Markov chains theory, we will show that sequence has asymptotically uniform distribution and after that, in the second step, using that result we will show the statement of the theorem.
Proof. First we will analyze the sequence of functions
. It is obvious that
, as a consequence of
being group. Next we will show that
To prove (
2) we will observe the sequence
as a stationary Markov chain over the set of states
. Indeed, according to the definition of
, transition from the
to
doesn’t depend on the history of
, but only on the current state
and the value of
.
Denote by
a row matrix whose elements are the probabilities that after
n steps the chain is in the state
. Let
be the probability that the chain changes state from
to
in one step and
be one step transition matrix for the Markov chain. Denote by
n-step probability transition matrix of that system starting at the state
changes to state
after exactly
n steps. It is well known, (see [
28,
29]), that
and that,
When the limit values in (
3) exists. To show that
exists it is sufficient to show that such
exists for which
for all
(see [
28,
29]).
Let us define numbers
as
Due to the properties of the set
S it is clear that
. Let
be
and show that
is greater than zero. Because
is group then the equation
has exactly one solution,
, so we can write
Now, to prove
it is sufficient to show that at least one summand in (
5) is greater then zero, see [
28,
29]. Because
we can find a set of indices
such that
. Because the index of identical permutation is 1, then the summand which corresponds to the set of indices
is evidently greater then zero and we showed that
exists. Because convergence is component wise,
exists. Limit value
can be determined as a solution of the system of a equations known as Champman-Kolmogorov equations.
It is easy to check, by substitution, that the
is unique solution of the system (
6).
From now on the proof is straightforward. Let
be arbitrary, then
Because
and
are independent random variables it follows from (
7) that
Finding limit of the both sides of the (
8) it follows
because
do not depend on
Using that
P
=
from (
9) it follows that
which proves the theorem. □
Remark 1. Asymptotically uniform distribution of the sequence as we have shown, is a consequence of the asymptotically uniform distribution of If it is easy to verify that has a uniform distribution and as a consequence has a uniform distribution too.
Theorem 2. If the random variables are uniformly distributed then for all Proof. By the generator definition we have
because
is uniformly distributed from (
10) it follows that
because
and
and theorem is proved.□
By these theorems we showed that generator has at least asymptotically uniform distribution of the values in the generator output sequence. This is important security feature because it indicates impossibility of the prediction of the generator output sequence based on the probability distribution of the output sequence values.
Correlation Properties
Theorem 3. If the random variables are uniformly distributed then for all
Proof. By the generator definition we have
Now, using notation from the Theorem 1
is equal to
and putting it in (
11) it follows that
Because
are permutations
is equal to
and using that
are independent random variables from (
12) it follows that
Now, using that
and
are uniformly distributed independent random variables it follows from (
13) that
because
and
and statement is proved.
Using statement of the Theorem 2 that
by the definition of conditional probability it follows that
which proves the statement. □
3.2. Information Leakage
Information leakage means existence of the correlation between generator output sequence and elements of its inner state. Such type correlation may be base for the process of reconstruction of the some generator state during his generation history. Knowledge of the one state during the generator work and knowledge of the generator algorithm allows prediction of the future elements of the output sequence which is undesirable in security applications. In this part, correlation with the state element sequences and with is considered.
Theorem 4. Under the conditions of the Theorem 1 we have
whereand
Proof. By the definition
Because
and
are disjoint events for
and
, from (
15) it follows that
Now, using Bayes theorem from (
16) it follows that
Now, because events
and
are independent from (
17) it follows that
Grou** summands which depends on
j from (
18) it follows that
Taking the limit from the both sides in (
19) it follows
which proves the statement.
By the definition of mutual information we have that
We will start with computing
Taking the limit from the both sides in (
21) and using Theorem 1 we have
In the same way it follows that
Taking the limit from both sides in (
23) and using part 1 of the Theorem 4 we obtain
Using (
22) and(
24) in (
20) it follows that
which proves the statement. □
Theorem 5. Under the conditions of the Theorem 1 we have
where
Proof. By the definition of conditional probability it follows that
Because
and
are disjoint events when
from (
26) it follows that
Using that
from (
27) it follows that
Because
and
are independent random variables from (
28) it follows that
Taking the limits from the both sides in (
29) it follows
In the same way as in the Theorem 4 it follows that
By the definition of conditional entropy it follows that
Taking the limit of both sides in (
30) and statement of the part 1 we obtain
And finally, using the (
22) and (
31) in the definition for the
, it follows that
which proves the statement. □
3.3. Periodicity
Every pseudo-random generator can be viewed as a finite automaton with output over the finite set of states and symbols. Because the automaton transition function is deterministic it follows that the output sequence must be periodic. So, and are periodic and denote their periods by A and B respectively. It is easy to verify that and are periodic too and denote their periods respectively. In this part relations between and Z are considered and some sufficient conditions under are defined which improves the value of Z. For that we need a few Lemmas.
To find out period of we will first determine the period of the .
Lemma 1. Denote by the order of the permutation . Then the period of is .
Proof. First we have to prove that
is a period, but it is straightforward.
and because
C is period of the
we have
Next step is to prove that is the fundamental period i.e., that every other period is divisible by .
Suppose contrary, that
isn’t the fundamental period i.e., that the fundamental period is
and
. From
we have
where
I is the identical permutation. Multiplying (
32) with
from the left we have
and applying (
32) we have
From this we have
which means that
d is a period of
and that
i.e.,
. Now, look at
and we conclude that
for every
. Finally we have
and conclude that
which is in contradiction with
so we proved that
is the fundamental period of
. □
Lemma 2. Let G be the fundamental period of . If and then period of is
Proof. By straightforward computation we can easily check that is a period of the . We need only to prove that is fundamental period of .
Suppose that the fundamental period isn’t but it is d. Then and because we have and .
Further,
for every
. We can set
in the equation above which transform it to
and having in mind that
G is a period of
we obtain
From the fact that
is bijection it follows that
which means that
is a period of
and consequently that
. Using that
we have that
and with
we conclude that
. According to former observations we have that
d must be of the form
and rewriting of (
33) yields
This equation we can simplify to
because
A is a period of
Now we put our attention to
.
so we have that functions
are equal on the set
. Set
is equal
because
and from the periodicity of
follows
. Functions
are equal on their domain so we have that
which means that
is a period of the
. In the same way as above we have that
and we have that the fundamental period of the
is
. □
Following Corollary is a trivial consequence of the Lemma 2.
Corollary 1. If then the period of the is greater or equal to A.
This corollary shows that with suitably chosen pseudo-random sequence output sequence will have the period greater or equal to period of the
The next theorem, the main result of this paragraph, is a straightforward application of the former Lemmas.
Theorem 6. Let denote the order of the permutation . Then if and the period of is
A statement of this theorem is a stronger variant of the Corollary 1 and it shows that with appropriately selected pseudo-random sequences and period of the generator output sequence is significantly longer then period sequences and .
4. Conclusions
Confidentiality of different sensor networks is a very serious requirement since such networks cannot fully achieve their purpose without having the necessary security. Namely, for various IoT applications, which typically have limited processing capabilities, restricted memory capacity and power constraints, one of the key challenges is to design an efficient and reliable cryptographic generator that meets the desired security requirements. In this paper, we defined a pseudo-random generator that can, in some way, be considered as a generalization of ideas related to RC4 because it uses time varying permutations and sequences for permutation changing and addressing output element from the current permutation are considered in a general fashion. In the paper, we analyze properties of the purposed generator. The proposed pseudo-random generator can be implemented efficiently in software and hardware, for example by using output of the multiple linear shift registers as input sequences and of the generator. The security characteristics considered in the paper potentiate application of the generator in the computational constrained environments security solutions.
In the first part of the proposed generator properties analysis, the generator output sequence probability distribution is considered. Theorem 1 establishes sufficient conditions for the generator output sequence have an asymptotically uniform probability distribution. Moreover, sufficient conditions are established for the distribution of the output sequence to have the exact uniform distribution, Remark 1 and Theorem 2. The generator output sequence uniform distribution indicates resistance of the generator to attacks based on output sequence elements prediction.
The second part of the generator analysis deala with the correlation properties of the generator output sequence in which it was shown, by Theorem 3, that the output sequence elements are asymptotically independent and accordingly no immanent remote correlations are detected, unlike to the RC4 generator (see [
25]). This property indicates resistance to autocorrelation type atacks.
The third part of the analysis relates to the possibility of information leaking about the internal state of the generator, sequence and , through the output sequence . Theorems 4 and 5 show that the amount of information that flows through an output sequence tends to zero when the length of the sequence tends to infinity. In practical terms, this means that, by rejecting the initial segment of a generated sequence of a given length, the amount of information about the state of the generator flowing into the output sequence is arbitrarily small.
In the last part of the generator analysis, the period length of the output sequence is analyzed. It has been shown that if sequences and are chosen in a suitable manner and their periods satisfy the conditions of Theorem 6, the output sequence has a significantly longer period than the sequences and .
According to the performed analysis results proposed generator has provably good security characteristics.
Complexity and implementation considerations about this proposal are determined by the generation and complexity of the sequences and Relatively weak constraints demanded for the probability distribution for the sequences and in the Theorem 1 allow implementation of the efficiently generated sequences, for example sequences generated by the multiple linear feedback shift registers.
The method described in this paper makes it possible to obtain a pseudo-random sequence with asymptotically uniform distribution and longer period using two pseudo-random sequences with irregular (non-uniform) probability distributions. Required initial conditions for two pseudo-random sequences are not serious limitations for this method because they describe natural requirements for the pseudo-random sequences, i.e., that values of their elements exhaust the set on which they are defined. An interesting question arising in this context is the speed of convergence in the Theorem 1, i.e., the number of steps after which we can use the sequence as uniformly distributed. It is not possible to answer this question generally because the matrix T is defined by the chosen set S and probability distribution of the random variable C. Consequently, for each set S and random variable C it has to be analyzed separately. In practice, this does not make any restrictions on the application of the proposed generator because in every concrete case it is possible to compute number of transition steps to achieve representation of the limit values by the desired accuracy.