thanksgiving pi

or, trying to predict the next value of pi

Zev Burton
November 26, 2024

hi,

A while back, I got an idea stuck in my head and I couldn’t get it out: can we predict the next value of pi (π)? With the past couple of weeks we’ve had, I’ve needed a distraction — so I set out to answer it.

This, of course, leads down a thousand other rabbit holes, such as how we calculate pi to begin with (and how we know we are right), but for the sake of writing something I’m intentionally limiting this to just the question: can we predict the next value of pi?

My gut tells me no. Pi is the quintessential irrational number: its decimal expansion never ends and never settles into a repeating pattern. Still, once I had the idea, I had to code it out.

Thought One: Are all digits equally represented?

Equal digit frequencies are one piece of a bigger question: is pi normal?

A number is normal (in base 10) if, in the long run, every digit appears equally often, and more generally every string of digits appears as often as any other string of the same length. 1 is just as likely to appear as 7; 36 just as likely as 92.
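For reference, here is the usual formal definition written out for base 10. The notation N(s, n), meaning the number of times the string s occurs within the first n digits of x, is mine rather than the original post's. Checking single-digit frequencies is just the k = 1 case.

```latex
x \text{ is normal in base } 10 \iff
\lim_{n \to \infty} \frac{N(s, n)}{n} = 10^{-k}
\quad \text{for every } k \ge 1 \text{ and every string } s \in \{0, 1, \dots, 9\}^k
```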

Proving this mathematically is exceptionally hard; it is still an open problem. However, we can use statistics to check the simplest piece of it: whether each digit appears about as often as all the others. (Passing that check is necessary for normality, but not sufficient.) When we do this for the first 10,000 digits, we get the following:

Pretty even!

Granted, there is a bit of variability. The counts look roughly equal, and a chi-squared goodness-of-fit test agrees: the differences are consistent with a uniform distribution.

(H₀: the digits are uniformly distributed. p-value: 0.4012, so we fail to reject H₀.)
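A minimal, dependency-free sketch of this check, assuming you want to generate the digits yourself rather than download them. `pi_digits` uses Machin's formula with Python's big integers (not necessarily how the original analysis obtained its digits), and `chi2_sf_odd` is a hand-rolled chi-squared tail probability for an odd number of degrees of freedom (9 here, from 10 digit categories). Counting the leading 3 is my own choice; counting only the digits after the decimal point could nudge the statistic slightly, so this may not reproduce the exact p-value above.

```python
import math
from collections import Counter


def pi_digits(n: int) -> str:
    """First n decimal digits of pi (leading 3 included), via Machin's formula."""
    guard = 10  # extra digits that soak up truncation error from integer division
    scale = 10 ** (n + guard)

    def arctan_inv(x: int) -> int:
        # scale * arctan(1/x) from the alternating series 1/x - 1/(3x^3) + 1/(5x^5) - ...
        term = total = scale // x
        k, sign = 1, -1
        while term:
            term //= x * x
            total += sign * (term // (2 * k + 1))
            k, sign = k + 1, -sign
        return total

    # Machin's formula: pi = 16 * arctan(1/5) - 4 * arctan(1/239)
    return str(16 * arctan_inv(5) - 4 * arctan_inv(239))[:n]


def chi2_sf_odd(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with odd df."""
    series, term = 0.0, math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(x / 2)) + series


def digit_uniformity(digits: str) -> tuple[float, float]:
    """Chi-squared statistic and p-value against 'every digit equally likely' (df = 9)."""
    counts = Counter(digits)
    expected = len(digits) / 10
    stat = sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))
    return stat, chi2_sf_odd(stat, df=9)


if __name__ == "__main__":
    digits = pi_digits(10_000)
    print({d: digits.count(d) for d in "0123456789"})
    stat, p = digit_uniformity(digits)
    print(f"chi-squared = {stat:.3f}, p-value = {p:.4f}")
```

If SciPy is installed, `scipy.stats.chisquare(observed_counts)` runs the same test, assuming equal expected counts by default.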

Thought Two: Do the digits of pi feel random?

It’s not fully known whether the digits of pi are random. Instead of going and getting a doctorate in this, I settled on a proxy test that I felt would give me a better feel for how difficult it would be to predict the next digit of pi:

For every digit of pi, randomly select a digit from 0 to 9. Mark how often they are the same.

This will also be our base case for any models we build.

It could not be simpler. We would expect the match rate to average around 10%: each random guess has a 1-in-10 chance of matching the corresponding digit of pi, whatever that digit happens to be.
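Spelled out, with G the uniformly random guess and D the true digit, assumed independent, the 10% does not even depend on pi being normal. The second expression is the usual binomial spread of the match rate if each run scores n digits; n is an assumption here, and for n = 10,000 it works out to 0.3 percentage points.

```latex
\Pr(G = D) = \sum_{d=0}^{9} \Pr(G = d)\,\Pr(D = d)
           = \frac{1}{10} \sum_{d=0}^{9} \Pr(D = d)
           = \frac{1}{10}
\qquad
\operatorname{SD}\left(\frac{\text{matches}}{n}\right) = \sqrt{\frac{0.1 \times 0.9}{n}}
```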

Run this 5,000 times, and you get the following:

It hovers right around that 10% mark.

Yep. A good ol’ 10%.

Over time, our running mean converges to 10%, and we now have our base prediction rate. Any model that we build must beat 10% consistently in order for us to consider this experiment a success.

Note: This makes intuitive sense, but I’m finding that it always helps to have some basis to check our intuition.
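Here is a rough sketch of the baseline experiment. `PI_PREFIX` holds only the first 50 digits so the block stays self-contained, and the fixed seed is there for reproducibility; both are my assumptions, not details from the original code. Swapping in 10,000 digits works the same way, though 5,000 runs of that is slow in pure Python.

```python
import random

# Assumption: only the first 50 digits of pi, to keep the sketch self-contained.
PI_PREFIX = "31415926535897932384626433832795028841971693993751"


def match_rate(digits: str, rng: random.Random) -> float:
    """Guess a uniformly random digit for every position; return the fraction that match."""
    hits = sum(rng.randrange(10) == int(d) for d in digits)
    return hits / len(digits)


def running_mean_of_runs(digits: str, runs: int, seed: int = 0) -> list[float]:
    """Repeat the experiment `runs` times and record the running mean after each run."""
    rng = random.Random(seed)
    means, total = [], 0.0
    for i in range(1, runs + 1):
        total += match_rate(digits, rng)
        means.append(total / i)
    return means


if __name__ == "__main__":
    means = running_mean_of_runs(PI_PREFIX, runs=5_000)
    print(f"running mean after {len(means)} runs: {means[-1]:.4f}")
```

Plotting `means` against the run number gives the kind of convergence curve described above.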

Thought Three: Let’s throw some models at the problem

There is a variety of machine learning models we could use here. Given that we are taking a string of digits and trying to predict the next one, a few come to mind:

Random forest
Multinomial logistic regression
K-nearest neighbors
Support vector machines
Neural networks

If we were going to ~ really ~ go for this (get AWS clusters, rent a GPU, etc.), then I’d probably be more selective about which models to try, but since I’m not doing that, we can attack all of these rather easily.

My approach is as follows:

Create a dataset where each row has X consecutive digits, for window sizes X = 10, 15, 20, up to 90. Using a moving window, we are going to have the first 8,000 digits represented. As an example using X = 5, where the digit after the arrow is the sixth one we are trying to predict:
1. First row: 3, 1, 4, 1, 5 → 9
2. Second row: 1, 4, 1, 5, 9 → 2
3. Third row: 4, 1, 5, 9, 2 → 6
Train each of the models on the first 8,000 rows.
Test each one on the next 2,000.
See if any of them beat our random-guessing baseline.
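The windowing and the 8,000/2,000 split might look like the sketch below. The five models above would normally come from a library such as scikit-learn, so this sketch swaps in a deliberately simple stand-in, a hypothetical `fit_last_digit_model` that predicts whichever digit most often followed the window's last digit in training. It only shows the train/test plumbing; it is not one of the post's models.

```python
from collections import Counter, defaultdict

Row = tuple[tuple[int, ...], int]


def make_windows(digits: str, x: int) -> list[Row]:
    """Slide a window of x digits along the string; each row's label is the next digit."""
    nums = [int(d) for d in digits]
    return [(tuple(nums[i:i + x]), nums[i + x]) for i in range(len(nums) - x)]


def fit_last_digit_model(train: list[Row]) -> dict[int, int]:
    """Hypothetical stand-in model: map each last digit to its most common successor."""
    follow: dict[int, Counter] = defaultdict(Counter)
    for window, target in train:
        follow[window[-1]][target] += 1
    # Digits never seen as a last digit fall back to the most common label overall.
    fallback = Counter(target for _, target in train).most_common(1)[0][0]
    return {d: follow[d].most_common(1)[0][0] if follow[d] else fallback for d in range(10)}


def accuracy(model: dict[int, int], rows: list[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model[window[-1]] == target for window, target in rows) / len(rows)


def evaluate(digits: str, x: int, n_train: int = 8_000, n_test: int = 2_000) -> float:
    """Train on the first n_train rows, then score on the next n_test rows."""
    rows = make_windows(digits, x)
    train, test = rows[:n_train], rows[n_train:n_train + n_test]
    return accuracy(fit_last_digit_model(train), test)
```

With at least 10,090 digits of pi in a string `digits` (enough for 10,000 rows at the largest window), the sweep would be `{x: evaluate(digits, x) for x in range(10, 95, 5)}`, where `range(10, 95, 5)` yields the 17 window sizes from 10 to 90.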

After a little waiting for everything to run:

Perhaps unsurprisingly, none of the models does significantly better than the 10% baseline (averaged across all of the window sizes).

Maybe one of the models did particularly well at a given window size?

The numbers on the right are the window sizes for the models that did better than 11%.

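Ehhhh… not great. Even the “good ones” barely beat chance. Getting 11.5% of the test cases right is a bit like winning $20 on a $20 lottery ticket: great, until you realize that you’re right back where you started. And with only 2,000 test cases per model, pure luck will likely push a few of the dozens of model and window-size combinations above 11%.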
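To put 11.5% in context, here is a rough sketch of how often a model with no real skill would land there by luck on a 2,000-row test set. The count of 85 combinations assumes all five models were run at all 17 window sizes, and the last two printed figures treat those combinations as independent even though they share one test set, so read them as a ballpark rather than a formal test.

```python
import math


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    log_norm = math.lgamma(n + 1)

    def log_pmf(i: int) -> float:
        return (log_norm - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                + i * math.log(p) + (n - i) * math.log1p(-p))

    return sum(math.exp(log_pmf(i)) for i in range(k, n + 1))


N_TEST = 2_000   # test rows per model, from the 8,000/2,000 split
COMBOS = 5 * 17  # assumption: 5 models x 17 window sizes (10, 15, ..., 90)

k_115 = N_TEST * 115 // 1000   # 230 correct, i.e. 11.5%
k_11 = N_TEST * 11 // 100 + 1  # 221 correct, the first count strictly above 11%

p_115 = binom_sf(k_115, N_TEST, 0.1)  # chance a random guesser scores >= 11.5%
p_11 = binom_sf(k_11, N_TEST, 0.1)    # chance a random guesser scores > 11%

if __name__ == "__main__":
    print(f"P(one random guesser scores >= 11.5%): {p_115:.3f}")
    print(f"P(one random guesser scores > 11%): {p_11:.3f}")
    # Rough figures that treat the combinations as independent (they share one test set).
    print(f"expected combos above 11% by luck: {COMBOS * p_11:.1f}")
    print(f"P(at least one combo >= 11.5%): {1 - (1 - p_115) ** COMBOS:.2f}")
```

Under these assumptions, a single no-skill model clears 11% several percent of the time, so a handful of “winners” among 85 tries is roughly what chance alone would produce.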

Similar to other posts I’ve done, this is the inflection point of the article. Either I can keep searching, or I can just say that this number continues to astound me: for a sequence that is completely determined, its digits are one of the closest things we have to true randomness.

I hope you all have a lovely Thanksgiving.

Zev

(p.s. code for this article is here.)