I’m working on a statistics project involving cards and shuffling, and I’ve run across an issue with random number generation.

From a simple bit of math there are 52! possible deck permutations, which is approximately 2^226. I believe this means that I need a random number generator with a minimum of 226 bits of entropy, and possibly more (I’m not certain of this concept so any help would be great).

From a quick Google search, the `Math.random()` generator in Java has a maximum of 48 bits of entropy, meaning that the vast majority of possible deck permutations would not be represented. So this does not seem to be the way to go in Java.

I was linked to this generator but it doesn’t have a java implementation yet. Also, for a bit of context, here is one of my shuffling algorithms (it uses the Fisher-Yates method). If you have any suggestions for better code efficiency, that would be fantastic as well.

```
public void shuffle(int type, int swaps) {
    int[] newDeck = getNewDeck();
    if (type == 1) {
        // Pick a random card from the remaining fresh deck
        // and place it in the next slot of the shuffled deck.
        for (int i = 0; i < 52; i++) {
            int nextCardIndex = (int) (Math.random() * newDeck.length);
            deck[i] = newDeck[nextCardIndex];
            newDeck = removeItem(nextCardIndex, newDeck);
        }
    }
}

public int[] getNewDeck() {
    int[] newDeck = new int[52];
    for (int i = 1; i <= 52; i++) {
        newDeck[i - 1] = i;
    }
    return newDeck;
}

public int[] removeItem(int index, int[] array) {
    int[] newArray = new int[array.length - 1];
    for (int i = 0; i < index; i++) {
        newArray[i] = array[i];
    }
    for (int i = index; i < array.length - 1; i++) {
        newArray[i] = array[i + 1];
    }
    return newArray;
}
```


## 2 Answers

Short answer:

- You have confused uniformity and entropy.
- Java’s built-in code is already mathematically perfect; you can’t improve on it with a different pseudo-random number generator. `Collections.shuffle(deck)` is all you need. Really. It’s perfect: uniform, unpredictable.

But, don’t take my word for it. I’ll prove it.

## Uniformity and entropy – not the same thing at all.

> I’m working on a statistics project involving cards and shuffling, and I’ve run across an issue with random number generation. From a simple bit of math there are 52! possible deck permutations, which is approximately 2^226. I believe this means that I need a random number generator with a minimum of 226 bits of entropy,

You’re confusing concepts. There’s uniform distribution, which can be done perfectly well with 0 entropy, and unpredictability, which requires entropy. ‘226 bits of entropy’ is **utterly unrelated** to the idea that you have about 2^226 different permutations available. The sequence of outputs of an algorithm doesn’t become more or less predictable based on the # of choices that any given ‘output’ of the algorithm can provide.

## Okay, so what’s permutation uniformity?

Let’s stick with die rolls, that’s a bit more workable than deck shuffling. We have your standard 6 sided die, and we write an algorithm that rolls it. Our first attempt is this:
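It simply always returns the same number:

```java
int roll() {
    return 4; // always the same "roll" – no randomness at all
}
```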

Think about random algorithms as a black box. Let’s say you have no idea what the implementation is, you simply know: I can call it, a random number falls out.

If you call this once, **you have no idea** that it is broken. It really isn’t – it gave a number, there was no way to know without knowing that code. It’s perfectly random and perfectly unpredictable.

But, call it **more than once** and now it gets interesting. Pretty soon you start being able to realize that one number is generated rather a lot more than the others. If you run this algorithm 1000 times and you make a histogram, you notice that you got a thousand ‘4’s and zero 1/2/3/5/6. This shows that the algorithm appears to be highly **non-uniform**: it does not provide each number equally often. That, or you got incredibly unlucky; that’s always possible (if you actually roll a die one thousand times, once in a bazillion billion billion years it’ll roll a 4 every time, a thousand times in a row, by pure happenstance. Any algorithm that is incapable of theoretically doing that is *less* random than a truly random source!)

But here’s another algorithm:

```
private int roll = 1;

int roll() {
    // cycles 2, 3, 4, 5, 6, 1, 2, ... – every face equally often
    return roll = (roll % 6) + 1;
}
```

Invoke this one a whole bunch of times and that histogram is **perfect**. Each number shows up equally often. It is entirely **uniform** – no one number is more likely than any other. If I play a game with you where you predict what I will roll, and if you predict correctly I give you money, but you don’t know how often I’ve called that method before, and you only get to play once, you can’t do any better than random chance.

Contrast that to our earlier `return 4;` take, where you can predict ‘4’ and you’ll get cash.

There’s clearly 0 entropy here, and yet it is perfectly uniform. Hence, to get uniformity (make it equally likely that continually invoking a ‘shuffle this deck!’ method will produce all possible ways to shuffle a deck, and doesn’t have any particular order be particularly more or less likely if only you run it long enough) – we got there with **0 entropy**. It’s obviously extremely predictable, but that’s a different issue.

Let’s go for the interesting one:

```
int roll() {
    // cast needed: Math.random() returns a double
    return 1 + (int) (Math.random() * 6);
}
```

This seems fine. **But it is not, and I can prove it**. It cannot possibly be perfectly uniform, mathematically.

`Math.random()` produces ‘any number between 0 and 1’ and, forgetting about computers for a while, there are infinitely many numbers in that range. But, we notice that the `Math.random()` method returns a `double`, and the Java spec says a `double` takes up 64 bits. Obviously, you can’t store a choice amongst an infinite space in 64 bits; you’d need infinitely many bits instead. Also, `double` can represent numbers that `Math.random()` cannot return (such as `2.0`; there is a bit sequence that is a valid double with value 2.0, and `Math.random()` will never return it, as it isn’t between 0 and 1). That’s how you get to about 48 bits of ‘variance’. In fact it’s not ‘about 48 bits’, it’s exactly 48 bits – the generator behind `Math.random()` is `java.util.Random`, whose internal state is exactly 48 bits – but even if it weren’t, the following proof works:

48 bits means 2^48 possible values.

That means that our algorithm only has 2^48 different potential outputs! We could make a really, really, really humongous table that explicitly lists all 2^48 different states and, for each one, writes out what the randomly rolled number would be. This table would have **exactly** 2^48 entries in it. That means **exactly** 2^48 occurrences of a 1, 2, 3, 4, 5 or 6 in total.

Here’s the problem then: `2^48 / 6` **is not an integer**. Obviously – 2^48 is divisible only by 2 (48 times, in fact), and 6 is `2*3` in prime terms – obviously it won’t divide cleanly. That table has 281474976710656 entries in it, which means, best case scenario, four of the six numbers show up 46912496118443 times, and two of them show up 46912496118442 times.

It’s a tiny, tiny difference, but the above is **proof** that `Math.random()` cannot possibly be perfectly uniform – unlike our method that just returns 1-2-3-4-5-6 in a sequence.
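You can watch the same pigeonhole effect with a toy-sized state space. This sketch maps 2^4 = 16 equally likely states onto 6 outcomes – since 16 is not divisible by 6, some outcomes necessarily show up more often; the 2^48 case is the exact same arithmetic with bigger numbers:

```java
import java.util.Arrays;

public class PigeonholeDemo {
    public static void main(String[] args) {
        // 16 equally likely internal states, mapped onto 6 die faces.
        int[] counts = new int[6];
        for (int state = 0; state < 16; state++) {
            counts[state % 6]++;
        }
        // Four faces occur 3 times, two faces only 2 times: provably non-uniform.
        System.out.println(Arrays.toString(counts)); // [3, 3, 3, 3, 2, 2]

        // Same arithmetic at 2^48: four faces get one extra table entry.
        long states = 1L << 48;
        System.out.println(states / 6 + " remainder " + states % 6);
        // 46912496118442 remainder 4
    }
}
```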

Obviously, if you care about proper randomness, you **do** want that uniformity. It doesn’t matter how many bits of entropy are powering `Math.random()` here. That’s not the problem at all.

So what’s the fix? Use the right API. `java.util.Random` is **perfectly** uniform if you use the right methods. Thus, our 4th and finally usable take on this method:

```
Random r = new java.util.Random();

public int roll() {
    return 1 + r.nextInt(6);
}
```

The API docs of `nextInt` spell it out for you: it guarantees uniformity. (It can escape the pigeonhole problem above because it uses rejection: if the underlying bits land in the small leftover range that would bias the result, it simply draws again.)
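A quick empirical sanity check of that version (a sketch; the seed 42 is arbitrary, chosen only so the run is repeatable): roll it many times and the histogram comes out flat, each face landing near n/6.

```java
import java.util.Random;

public class RollHistogram {
    public static void main(String[] args) {
        Random r = new Random(42); // arbitrary fixed seed for repeatability
        int[] counts = new int[7]; // indices 1..6 used
        int n = 60_000;
        for (int i = 0; i < n; i++) {
            counts[1 + r.nextInt(6)]++;
        }
        // Every face lands close to n/6 = 10,000.
        for (int face = 1; face <= 6; face++) {
            System.out.println(face + ": " + counts[face]);
        }
    }
}
```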

## So what’s predictability then?

Our method that returns 1-2-3-4-5-6 in a loop is perfectly predictable. Let’s assume that we play our game again: You predict a number, I roll once, if it is that number you get a lot of money. But now let’s add a wrinkle: Before you tell me the number you want, you can ask me to roll the die and tell you the result. You may ask this of me as many times as you please, you can write them all down and think about it, and you can ask me not to ever roll the dice except for you (So, when you ask, or when we do the money roll).

For that 1-2-3-4-5-6 algorithm, you will win every time. You just ask me to roll a few times, you realize it’s 1-2-3-4-5-6-1-2-3-4 endlessly, and predict perfectly. So, that one is perfectly uniform, but also easily predicted based solely on watching the data.

This is where you need entropy. Entropy is about introducing factors that you **do not** know about. Those unknown factors are entropy. The more entropy there is, the more numbers you need to see before you can predict. For example, say I am allowed to flip 5 coins, I don’t need to tell you the results, and I get to apply those flips to my die rolls: I add 1 to the first number I tell you *if* the first coin was heads, I add 1 to the second number *if* the second coin was heads, and so on. On the 6th roll I’ve run out and need to re-use the first coin – I’m a computer and have no hands; a CPU has no coin in it, so I can’t flip more coins.

In this case, you need to ask me for the result a few times before you can predict it – you need to figure out the coin flips first. But, you will, once you ask me for 5 rolls. Then you’re back to perfect prediction.

That’s entropy.

The trick is, **computers generate entropy all the time**. Imagine this alternate algorithm. It’s still the simplistic:

```
int roll = 1;

public int roll() {
    return roll = (roll % 6) + 1;
}
```

but this time, one thousand players are all playing this game with me. They’re humans, so the timing of their “Roll for me please!” is not quite uniform or predictable (humans being very complex machines, many billions of cells all interacting, too difficult to analyse). So now, even though my algorithm seems utterly idiotic, the simple fact that you don’t get consecutive rolls – an unknown and unpredictable number of people asked before you – brings us back to perfection: you can’t win more than 1 in 6 games here.

**And this is exactly how your OS works**. It uses ‘hey, somebody wants a random number’, itself, as a source of entropy. In fact, it also injects anything hard to predict into the system. Imagine you *also* know that I roll a die for kicks (as in, I move one ahead in my sequence of 1-2-3-4-5-6-1-2-…) every second exactly. You can take that into account so this is useless, but it doesn’t **reduce** the entropy. You can only add, assuming you write your algorithm correctly. The OS knows this and adds as much as it feels like. It’ll add keyboard input and mouse movement. Even if the keyboard is actually a robot that presses space every second, on the dot, exactly – okay, well, no entropy there then. That doesn’t make it worse. (It’s a real ‘hey, if it won’t help at least it won’t hurt’ scenario).

Because of this:

- The computer is continually receiving boatloads of entropy! The timing of network packets rolling into the network card, keyboard and mouse input, and the entropy inherent in all sorts of apps running on the system asking for random numbers from the OS (Java’s random stuff, such as `Math.random()` and `j.u.Random`, just asks the OS). Even if the computer somehow ‘ran out’ of entropy, in a few milliseconds it’ll have gathered up some more.
- Randomness is like an ever-filling pool. If you’ve let the computer run for a few hours, mousing around a bit, you’ve been building up entropy all that time. It stacks up. You’d have to ‘drain’ this entropy out before you’re in trouble.
- PRNGs do not solve anything here; they can’t. If your computer truly ‘ran out’ of entropy (which is extremely unlikely because it has so many sources), no PRNG can save you. In the end, computers just do not have a coin^{note 1}, and can’t generate entropy out of nothingness. No algorithm you can imagine can change that. At best they can detect that we ran out of entropy and simply cease giving you random numbers until some entropy shows up^{note 2}.

Yes, you need at least 226 bits of entropy; if you have less, then your ‘shuffle this deck’ code is provably predictable in the sense that you will be able to eliminate at least some of the possible deck orders. But your system has many many many millions of bits of entropy on tap for you, you’re not going to run out. But if you do, a PRNG doesn’t magically fix that.

## The right way to do this

Thus, we get to the point of it all: If we play our dice prediction game, my aim is to set it up as perfectly as possible for me, which is: No matter how smart you are, nothing you can imagine can do statistically better than winning one in every 6 games.

To accomplish this, I need **both** *uniformity* and *unpredictability*:

- With that 1-2-3-4-5-6 thing, you can just predict and win that way. It’s uniform, but predictable.
- With an algorithm that is, say, truly and utterly unpredictable in every way imaginable – perfect entropy – but which is simply twice as likely to return 1 or 2: it’s the world’s most perfect die and the world’s most random die roller, but it rolls an **8**-sided die and returns the result % 6 (so on a 7 it says ‘1’, and on an 8 it says ‘2’). In this case, you can win once every **4** games, which is better than once every 6 (predict 1 every time; you win if I roll a 1 or 7 on a perfect 8-sided die, which occurs once every 4 rolls on average).

So how do you do that in java? Trivial:

```
Collections.shuffle(deck);
```

That was it. That ridiculous little one liner does it all.

- It is perfectly uniform. As the docs say, `shuffle` uses Fisher-Yates, and Fisher-Yates passes the pigeonhole test: at step i it asks its random source for one of exactly i+1 equally likely choices, so the total number of equally likely outcomes is exactly X! for a list of size X. (The wiki entry on Fisher-Yates has the full proof.)
- It uses `java.util.Random`, which is generally fine.

There is also `SecureRandom`, which tries to do a few things to reduce predictability. For crypto purposes (such as generating encryption keys), you should use that. For a server that runs a poker game, even if it’s for big bucks, `java.util.Random` is fine. Use `SecureRandom` if you must, but I don’t think there’s any way to ‘break’ it other than schemes vastly more complicated than other ways to steal your cash.
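As a concrete sketch (assuming the deck is represented as a `List`, rather than the question’s `int[]`), the entire shuffling problem reduces to:

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeckShuffle {
    public static void main(String[] args) {
        // Build the deck as a List; Collections.shuffle requires one.
        List<Integer> deck = new ArrayList<>();
        for (int card = 1; card <= 52; card++) {
            deck.add(card);
        }

        Collections.shuffle(deck);                     // java.util.Random: fine for stats work
        Collections.shuffle(deck, new SecureRandom()); // if you want the crypto-grade source

        System.out.println(deck);
    }
}
```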

## footnote 1

You can get custom hardware that can flip coins. Generally these are radios that pick up cosmic background radiation. There are also tricky ways for a computer to ‘detect’ cosmic background effects on its own standard hardware. I’m not sure whether all chips ship with this – I think some do – and your OS will initialize its random ‘pool’ using the entropy the chip provides. Of course, for a computer that gets no human interaction and no interaction from unpredictable sources such as internet traffic – a really well isolated room, in a Faraday cage and the like – there is simply no entropy, and you can’t make entropy out of nothing.

## footnote 2

On Linux, `/dev/urandom` gives you endless randomness, but `/dev/random` will start ‘blocking’ (it waits, like a slow file, until entropy shows up). Some misguided documentation therefore suggests you use /dev/random for crypto. But this is bad advice: it’s not actually possible for a computer to know whether some nugget of evident entropy is actually entropy. If the keyboard is a virtual keyboard and some robot just presses space every second, and everybody knows it, then this adds no entropy. But the computer does not know this. It could try to ‘analyse’ the input and realize there seems to be a pattern to it, but sometimes if you flip a coin ten times, it’s heads every time – that doesn’t make it any less random. If you eliminate such patterns you’re actually **less** random.

Hence, just use `/dev/urandom`. And hence, `java.util.Random` is generally fine.

## One sidenote

Virtual PCs (e.g. docker, kubernetes, etcetera) really can run out of entropy. Their emulated chips are not always properly ‘written’ – the baked-in entropy initialization of the chip just returns zeroes – and the network is predictable, all scripted and firewalled, so literally zero entropy flows into the system.

That was a real problem with early takes on the virtual PC image concept. But these days, all the various implementations out there ‘pass entropy’ through from the host OS to the virtual PC, and e.g. the Linux driver that manages `/dev/urandom` and `/dev/random` takes it into account.

Seriously, don’t worry about it. Use `Collections.shuffle`, use e.g. `rnd.nextInt()`. Don’t use `Math.random()` for anything – except if you really, really need exactly a number between 0 and 1.


– Wow, I can’t thank you enough. This is really great and helped me understand the concepts a lot better. One question I have: if I were somehow using one Math.random() output (one double value, that is) to generate the entire deck, I assume this would be an issue (since there are more possible deck permutations than outputs). But if I’m generating the deck one card at a time, is Math.random() perfectly acceptable because there are at most 52 possible choices at each step?

– Also, following the process in my last comment: is there any risk that I could get the same deck twice because the first output was identical in two cases? Say, for example, I called Math.random() for the first card and got a result of 0.387639234 or so. Then later I called Math.random() for the first card again and also got 0.387639234. Would the entire generated deck be the same in both cases, or does the generator use the built-up system entropy to ensure this doesn’t happen? I will use the Collections.shuffle(deck) method, but I am also running some of my own programmed shuffling.

– No, using `Math.random()` is almost always a bad idea and certainly is so here. You *CAN* write this with just Math.random(), but you’re going to have to do the very tricky (and very hard to test!) work to avoid the pigeonhole principle issues (making each deck permutation exactly equally likely – perfect uniformity; harder than it looks). If the PRNG randomly emits the exact same sequence, you get the exact same deck. This is fundamentally unsolvable if there isn’t enough entropy in the system. It will happen from time to time – specifically, about once in every 52! times you shuffle.

– I don’t believe I will run out of entropy. Most of my tests will use around 1-10 billion samples, which is more than enough information for my tests. This means that over all of my tests I will most likely be shuffling no more than a trillion times over a day or two, far less than 52!. Am I correct that I will not run out of entropy?

– The algorithm gives you uniformity even if you have **0 entropy**. It’ll still hit every deck equally likely; it’s just that it becomes predictable. But you won’t run out. Even if you only have a measly 200 bits of entropy *total*, you’d need to observe a ton of your deck shuffles, and do a ton of math, to then be able to predict your further shuffles.

Have you looked into the recent additions that are included in JDK 17?

There are plenty of algorithms available.

For shuffling cards you likely don’t need something that is cryptographically secure.

Using `Collections.shuffle` should do the trick if you provide a decent RNG.
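A sketch of what that could look like on JDK 17+ (the generator name is one of those suggested in the JDK docs quoted below; since `Collections.shuffle` predates the `RandomGenerator` interface, this hand-rolls Fisher-Yates on top of it):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.random.RandomGenerator;

public class Jdk17Shuffle {
    public static void main(String[] args) {
        // A long-period generator suitable for 52! permutations (JDK 17+).
        RandomGenerator rng = RandomGenerator.of("L64X256MixRandom");

        List<Integer> deck = new ArrayList<>();
        for (int card = 1; card <= 52; card++) {
            deck.add(card);
        }

        // Fisher-Yates on top of the new generator; Collections.shuffle
        // only accepts a java.util.Random in JDK 17.
        for (int i = deck.size() - 1; i > 0; i--) {
            Collections.swap(deck, i, rng.nextInt(i + 1));
        }

        System.out.println(deck);
    }
}
```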


– Indeed, this. Funnily enough, the page specifically says: *“For example, if the goal is to shuffle a deck of 52 cards, the number of possible permutations is 52! (52 factorial), which is approximately 2^225.58, so it may be best to use a generator whose period is roughly 2^256 or larger, such as `L64X256MixRandom`, `L64X1024MixRandom`, `L128X256MixRandom`, or `L128X1024MixRandom`.”*

– This doesn’t solve any of your problems, @Caedmon. Alternative PRNGs do not fix `Math.random()` issues, nor do they make your shuffling algorithms any better.

– Use docs.oracle.com/javase/8/docs/api/java/security/…

– “I was linked to this generator but it doesn’t have a java implementation yet.” You could always write one yourself. I mean, one of the goals of your project will (surely) be to improve your Java programming skills. Implementing a PRNG in Java would help with that.

– @StephenC Writing a PRNG in Java would be a cool side project, but my main goal is to analyze different techniques for quantifying how well shuffled a deck of cards is. I’m going to write a paper on it, looking at many methods from entropy to counting inversions (my original post explains it better: math.stackexchange.com/questions/4354898/…). As such I wouldn’t want to risk botching the core of the project and would rather play it safe. The link you provided looks very useful, thank you.

> …I believe this means that I need a random number generator with a minimum of 226 bits of entropy…

No, you do not need that many bits of entropy. You would need that much entropy if you were picking a shuffle by its serial number among all 52! permutations. But you’re building it differently, one card at a time, so you’re fine. I can only recommend using `ThreadLocalRandom.current().nextInt(newDeck.length)` instead of `Math.random`.
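Applied to the question’s shuffle, a minimal in-place Fisher-Yates using `ThreadLocalRandom` might look like this (a sketch; it also drops the O(n²) `removeItem` array copying in favour of swaps):

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class InPlaceShuffle {
    public static void main(String[] args) {
        // Deck of cards 1..52, as in the question's getNewDeck().
        int[] deck = new int[52];
        for (int i = 0; i < 52; i++) {
            deck[i] = i + 1;
        }

        // Classic in-place Fisher-Yates: swap each position with a
        // uniformly chosen position at or below it. No extra arrays needed.
        for (int i = deck.length - 1; i > 0; i--) {
            int j = ThreadLocalRandom.current().nextInt(i + 1);
            int tmp = deck[i];
            deck[i] = deck[j];
            deck[j] = tmp;
        }

        System.out.println(Arrays.toString(deck));
    }
}
```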