Friday, October 21, 2016

A deep dive into Pi

1. Contact

In 1985, Carl Sagan wrote a liberal-utopian novel called 'Contact'. Ellie (Jodie Foster in the film) traverses a wormhole where she meets wise, benevolent aliens. Returning to Earth there is a wormhole malfunction, all records are erased and she is not believed.

A discredited and broken women, she holes up in an apartment, recalling the confirmatory information she received from the alien,
"Ellie works on a program to compute the digits of π to heretofore-unprecedented lengths. [...] When Ellie looks at what the computer has found, she sees a circle rasterized from 0s and 1s that appear after 1020 places in the base 11 representation of π. This gives her a way to convince the world of something greater—that intelligence is built into the universe itself."
When I read this my suspension of disbelief collapsed entirely. As a matter of mathematics, the value of π is fixed, it's not a variable which can be programmed by any passing alien!

I was right .. but also wrong.

---

2. Dystopias

I prefer my alien first-contacts to be more dystopian:

---

3. How Deep to Dive?

Suppose you have a conventional six-sided die. On average, how many trials do you need to throw a three (or any other number)?

If you have any sequence of trials where the probability of success per trial is p (1/6 in the die example) then on average you need 1/p trials to get the first success.

This is not entirely obvious.

Take a coin. Keep tossing it. How many throws on average before you get a head? Here are some possible sequences: H, TH, TTH, TTTH, and so on. The longer sequences have lower probability. On average you need two throws - one divided by a half.

The number of trials is a random variable and its distribution is called the Geometric distribution.*

As apparently a normal number, the decimal representation of π is believed to have the features of a random sequence. We imagine producing the digit-sequence of π from the set {0, .., 9} essentially by choosing at random - there is no systematic pattern.

If we consider any particular digit, for example 7, 'on average' how deep do we need to search to find it in the π-sequence?

The probability of a digit of π being 7 is 1/10 so 'on average' we should expect to search down 10 digits before finding a 7.

Here are the first fifty digits of π. We start counting after the decimal point.

3. 14159 26535 89793 23846 26433 83279 50288 41971 69399 3751

And here are the depths of the different digits

0 -> 32  1 -> 1  2 -> 6  3 -> 9  4 -> 2

5 -> 4  6 -> 7  7 -> 13  8 -> 11  9 -> 5

Average = 9 deep.

Suppose we want to find a longer digit sequence in π. For example, 42. On average, how deep do we need to dive? Here's how to think about it. The probability of a digit at a given location in π being 4 is 1/10, but then the next digit, 2, also needs to match. So the combined probability of a two-digit match at any point in π is 1/100. So on average we need to search to a depth of 100 digits.

Here's the interesting site I'm using - The Pi-Search Page - and it tells me that:
"The string 42 occurs at position 92. This string occurs 2 000 419 times in the first 200M digits of Pi counting from the first digit after the decimal point. The 3. is not counted."
For three digits, you would expect to search 1,000 digits of π deep. The Pi-Search Page lets you search to 200 million digits of π, which is ~ 108, so strings of eight digits then.

You can have a lot of fun searching for meaningful numbers buried deep within π. Take a naive coding where a=1, b=2, c=3, ..., z=26. Then

Clare = 3 12 1 18 5:
"The string 3121185 occurs at position 1 785 482. This string occurs 21 times in the first 200M digits of Pi"
Nigel = 14 9 7 5 12:
"The string 1497512 occurs at position 21 024 008. This string occurs 24 times in the first 200M digits of Pi."
My wife and myself each get our names encoded in 7 digits so we would expect an average search depth in π of around 10 million. We both kind of straddle that depth (see note at end).

Enough of this vanity! People tend to put in their birthdates (ddmmyy = one million depth) but be aware, web sites and apps allow this to be reverse-engineered.

---

4 Circles in π

Let's take the simplest possible raster image of a circle. Each pixel is white (0) or black (1) and we have three rows of three bits. This can be written 010 101 010 or in octal, 252.

As we have three digits which we choose to interpret as base 10, how deep should we expect to look in π for this sequence, this three bit by three bit raster of a circle?

Obviously 1,000 digits.

Back to the Pi-Search Page.
"The string 252 occurs at position 822. This string occurs 200 446 times in the first 200M digits of Pi."
OK, the aliens have spoken - the circle is there. They truly exist. I was wrong.

This is a pretty pathetic circle, though. In 'Contact' it was described as 'a circle rasterized from 0s and 1s that appear after 1020 places in the base 11 representation of π'.

Let's handwave here a little. Assume that in base 10 this might be 1021 or thereabouts decimal digits. How long was the raster sequence to be found at such a depth? Around 21 decimal digits (which we could interpret as octal without too much effort). Multiple by three to get bits and we have 63-64 bits encoding the raster sequence.

So Ellie (Jodie Foster) gets to see an 8-bit by 8-bit picture of a circle. I think that would look very pretty and would definitely convince me that aliens (or God) exists.

By a suitable encoding any information can be mapped into a decimal digit string. If the length of the string is n digits, then on average we have to search to a depth of 10n digits in π to find said string.

But with probability 1 it's in there somewhere (.. it is generally believed but there is no proof!).

---

5. Searching π on your own device

I thought for one self-indulgent moment that it would be cute to write a program in Prolog (weird choice!) to compute π to zillions of digits and search for longer and more interesting digit-sequences. I even researched it a bit.

But it's all been done.

I downloaded the 'RealPi Benchmark' app onto my Nexus 10 and it computed ten million digits of π in about five minutes - which I was then able to search.

*Humbled*

---

* Statistics Note

The variance of the Geometric distribution is (1-p)/p2.

For the long digit strings we have been considering, p is very small so (1-p)/p2 is approximately 1/p2, and the standard deviation is just 1/p which is the same as the mean.

So revisiting the depth of Clare's and my names above, (7 digits), the mean number of trials to find our codes was ten million, and the standard deviation was also 10 million. So a +/- confidence interval of 1 std. dev. would be from zero to twenty million. Quite wide, but consistent with the values we actually saw.

I think that's why the Pi-Search Page computes 200 million digits of π.