randomness.md 6.3 KB

Randomness

by Miloslav Číž, released under CC0 1.0

What is randomness? I don't know, but I can define it.

Let s be a sequence of N bits, N > 1, which repeats infinitely (as is the case with pseudorandom number generators with period N). Our goal is to define a measure of randomness of such a sequence s.

By randomness we very often understand unpredictability. The idea of our measure will therefore be based on unpredictability.

Consider a scenario in which a pseudorandom number generator keeps generating numbers, repeating the sequence s, and we get to see a random subsequence of this infinite series. The concept of our measure is following: for given N, we consider a sequence s more random if it gives a lower probability of us correctly predicting the next bit from the subsequence we've seen. Let us now set the definition precisely:

r = (Σ^a∈S (max(p0(a),p1(a)) / (p0(a) + p1(a)))) / |S|

Where

  • r is the measured randomness.
  • S is a set of all possible subsequences of s . s (s concatenated with itself), with the lengths of 0 to N - 2 (including). (We exclude the subsequences of length N and N - 1 because with these the following bit can always be predicted correctly, and we concatenate the sequence so that we include sequenced wrapping around from end to beginning, as is the case if we keep repeating the sequence).
  • p0(a) and p1(a) being the probabilities of 0 and 1, respectively, following the subsequence a in s (considering the first bit of s to be the following bit of the last bit of s).

As an example, let's take N = 3.

s a p0(a) p1(a) r
000 1 0 1
0 1 0
001 2/3 1/3 13/18
0 1/2 1/2
1 1 0
010 ... ... ... 13/18
011 1/3 2/3 13/18
0 0 1
1 1/2 1/2
100 ... ... ... 13/18
101 ... ... ... 13/18
110 ... ... ... 13/18
111 ... ... ... 1

We can conclude some properties of our measure:

r(s) = r(reverse(s)) = r(bitwisenot(s)) = r(cyclicshift(s,q))

Using a computer, we can search for most random sequences for given N:

-------
N = 2, best r = 0.5

01
10
-------
N = 3, best r = 0.722222222222

001
010
011
100
101
110
-------
N = 4, best r = 0.785714285714

0011
0110
1001
1100
-------
N = 5, best r = 0.826666666667

00101
01001
01010
01011
01101
10010
10100
10101
10110
11010
-------
N = 6, best r = 0.862745098039

000101
001010
010001
010100
010111
011101
100010
101000
101011
101110
110101
111010
-------
N = 7, best r = 0.880045351474

0001001
0010001
0010010
0100010
0100100
0110111
0111011
1000100
1001000
1011011
1011101
1101101
1101110
1110110
-------
N = 8, best r = 0.89494047619

00100101
00101001
01001001
01001010
01010010
01011011
01101011
01101101
10010010
10010100
10100100
10101101
10110101
10110110
11010110
11011010
-------
N = 9, best r = 0.909964726631

000010001
000100001
000100010
001000010
001000100
010000100
010001000
011101111
011110111
100001000
100010000
101110111
101111011
110111011
110111101
111011101
111011110
111101110
-------
N = 10, best r = 0.916993464052

0010010101
0010101001
0100100101
0100101010
0101001001
0101010010
0101011011
0101101101
0110101011
0110110101
1001001010
1001010100
1010010010
1010100100
1010101101
1010110110
1011010101
1011011010
1101010110
1101101010
-------
N = 11, best r = 0.926404958678

00010001001
00010010001
00100010001
00100010010
00100100010
01000100010
01000100100
01001000100
01101110111
01110110111
01110111011
10001000100
10001001000
10010001000
10110111011
10111011011
10111011101
11011011101
11011101101
11011101110
11101101110
11101110110
-------
N = 12, best r = 0.929256854257

001010010101
001010100101
010010100101
010010101001
010100101001
010100101010
010101001010
010101101011
010110101011
010110101101
011010101101
011010110101
100101001010
100101010010
101001010010
101001010100
101010010100
101010110101
101011010101
101011010110
101101010110
101101011010
110101011010
110101101010

-------
N = 13, best r = 0.935133136095

0010010100101
0010100100101
0010100101001
0100100101001
0100101001001
0100101001010
0101001001010
0101001010010
0101101011011
0101101101011
0110101101011
0110101101101
0110110101101
1001001010010
1001010010010
1001010010100
1010010010100
1010010100100
1010110101101
1010110110101
1011010110101
1011010110110
1011011010110
1101011010110
1101011011010
1101101011010

The python script used to generate the output follows:

def nextBin(seq):
  pos = len(seq) - 1

  while pos >= 0:
    if seq[pos] == 0:
      seq[pos] = 1
      break

    seq[pos] = 0

    pos -= 1

def randomness(seq):
  totalSubsequences = 0
  probabilitySum = 0.0
    
  possibleSubseqs = 1

  for i in range(len(seq) - 1):       # for all subsequence lengths
    subseq = [0 for j in range(i)]

    for j in range(possibleSubseqs):  # for all subsequences of length i
      followingOnes = 0
      followingZeros = 0

      for k in range(len(seq)):       # find all occurrences of subsequence
        matches = True

        for l in range(len(subseq)):
          if subseq[l] != seq[(k + l) % len(seq)]:
            matches = False
            break

        if matches:
          if seq[(k + i) % len(seq)] == 0:
            followingZeros += 1
          else:
            followingOnes += 1

      if followingOnes + followingZeros > 0:
        totalSubsequences += 1
        probabilitySum += (followingOnes if followingOnes > followingZeros else followingZeros) / float(followingOnes + followingZeros)

      # create next sequence

      nextBin(subseq)

    possibleSubseqs *= 2

  return probabilitySum / float(totalSubsequences)

for n in range(2,14):
  s = [0 for i in range(n)]

  bestRandomness = 1.0
  bestSequences = []

  for i in range(2 ** n):
    r = randomness(s)

    if r == bestRandomness:
      bestSequences.append(list(s))
    elif r < bestRandomness:
      bestRandomness = r
      bestSequences = [list(s)]

    nextBin(s)

  print("-------")
  print("N = " + str(n) + ", best r = " + str(bestRandomness))
  print("")

  for a in bestSequences:
    st = ""

    for d in a:
      st += "0" if d == 0 else "1"
    
    print(st)