Nice!

Do you know what's nice? I do! It's when both the square and the cube of a number have every digit between them exactly once. It just makes me feel all warm and fuzzy inside. Mostly because the only known example of such a number is 69. See, look, no repeats:

  • 69² = 4761
  • 69³ = 328509

Wait, 69 is the only known example? How can we let that stand? This property is so nice, we must find another! Only then will teenagers have a second number to giggle at!

Now that I've got that unfunny introduction out of the way, you should probably just go read Jacob's excellent post over at Chromatic Conflux. Go on ahead, I'll wait.

Back? Cool. So I read that post and was all like "oh, that's neat, it probably speaks more to how humans are willing to find patterns in any chaotic system than any intrinsic property of math, yadda yadda" until I got to this bit:

"But once we get to 148, b/(e^5) goes above 1, and then the expected value starts going to infinity."

Wait, you're telling me there are probably an infinite number of nice numbers out there? And we only know about one of them? The rest are just out there, waiting in the dark for some fool to stumble upon them...
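For intuition, here's my back-of-envelope reading of where that threshold comes from (a sketch of the argument, not Jacob's exact derivation): a number valid in base b is nice when all b digits of its square/cube concatenation are distinct, which for effectively random digits happens with probability b!/b^b. There are roughly b^(b/5) valid numbers per base, so Stirling's approximation gives

$$\mathbb{E}[\#\{\text{nice numbers in base } b\}] \approx b^{b/5} \cdot \frac{b!}{b^b} \approx \sqrt{2\pi b}\,\left(\frac{b}{e^5}\right)^{b/5},$$

which grows without bound exactly when b/e^5 > 1, and e^5 ≈ 148.4.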

Chapter 1. How I embarked upon my search for square-cube pandigitals with the power of math

The first step in my journey was realizing that I'm bad at math.

Chapter 2. How I embarked upon my search for square-cube pandigitals with the power of violence

Okay, I'm only kidding a little. I had a couple ideas that I thought might be helpful. I figured the most computationally expensive parts of the check would be:

  1. figuring out what base(s) each number is valid in (i.e. has the right number of digits in the square and cube)
  2. squaring and/or cubing the number
  3. converting the square and cube into the valid base

After a cursory search, it turned out I was wrong on most counts. Jacob's original post (please go read it if you haven't already) has a truly beautiful algorithm for determining the range of valid numbers in each base. Even better, these ranges never overlap! That means each number will be valid in at most one base, and many numbers aren't valid in any base. This narrows our search field massively, and makes this item effectively free.
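I won't reproduce Jacob's algorithm here, but for a feel of what "valid" means, here's the brute-force version of the check as a minimal Python sketch (my naming, not his):

```python
def digit_count(x: int, base: int) -> int:
    """How many digits x has when written in the given base."""
    count = 0
    while x:
        x //= base
        count += 1
    return count

def is_valid(n: int, base: int) -> bool:
    """n is 'valid' in base if its square and cube together
    have exactly `base` digits (a prerequisite for niceness)."""
    return digit_count(n * n, base) + digit_count(n ** 3, base) == base
```

Since the digit counts only grow with n, the valid numbers in each base form one contiguous run, so you can binary-search for the endpoints instead of scanning.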

Squaring and cubing is still decently expensive, though not as bad as I had feared at first glance. Since each number is only valid in one base I don't need to worry about saving or caching the square/cube results, and using a lookup table would probably be more trouble than it's worth. There is a handy little trick if you only care about 100% nice numbers, which is that you can square a number and check the square for repeats before trying the cube. I'm not doing that yet, but I'm sure there are many possible optimizations along these lines.
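In Python, that trick could look something like this (a minimal sketch, not code from the actual search):

```python
def all_digits_distinct(value: int, base: int, seen: set) -> bool:
    """Add value's base-`base` digits to `seen`; bail out on any repeat."""
    while value:
        value, digit = divmod(value, base)
        if digit in seen:
            return False
        seen.add(digit)
    return True

def is_nice(n: int, base: int) -> bool:
    seen: set = set()
    # Check the square first: if it already repeats a digit,
    # we never pay for computing and scanning the cube.
    if not all_digits_distinct(n * n, base, seen):
        return False
    if not all_digits_distinct(n ** 3, base, seen):
        return False
    return len(seen) == base  # every digit used exactly once

assert is_nice(69, 10)  # the one known example
```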

Converting to other bases is still pretty expensive, but I got bored before I tried optimizing it. Note that numpy's base_repr only goes up to base 36, so you could probably just use that for a lot of what we're doing here (for now...).
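Past base 36 you end up rolling your own conversion anyway; here's a minimal sketch that returns digit values instead of characters (there's no standard alphabet up there):

```python
import numpy as np

np.base_repr(4761, base=36)  # fine; anything above base 36 raises ValueError

def to_digits(value: int, base: int) -> list:
    """Digit values of `value` in an arbitrary base, most significant first."""
    digits = []
    while value:
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits[::-1] or [0]

to_digits(4761, 10)  # [4, 7, 6, 1]
```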

Chapter 3. How I embarked upon my search for square-cube pandigitals with the power of silicon

First I ran it in Jupyter and it ate 60GB of RAM. Then I hooked up a database and it ate 60GB of disk space. Then I got smart about what data I wanted to save and it dropped to 100MB. Then I got smarter and it dropped to 10MB. Then I parallelized it by spinning up multiple docker containers (it wasn't really parallelized). Then I actually parallelized it. Then I made a backend service that sits between the search nodes and the database with an API for delegation. Then I started running more search nodes on another machine. Then I added some validation tests on the backend. Then I realized I had made a terrible mistake with my database, wiped it and started over. With the new database I added a couple fields I should have had the first time around and finally felt comfortable typing this up.

The last time I accidentally worked this late, I was playing Factorio.

Chapter 4. So why am I doing this again? What's my plan?

Well, I'm not totally sure. My engineer brain says there's a tradeoff here: if our goal is to find more nice numbers, we need to be looking somewhere north of base 100. But even if those fields are more fertile, there's so much more field to search. We need to be smart about where we look. So that's why I'm proposing the data & ~~dakka~~ dredge method.

Data: I run these scripts for a while. I look at the numbers. Other people look at the numbers. We come up with some optimizations, some tricks for reducing the total range, or maybe some insight into "nice" numbers as a whole.

Dakka: Once I feel ready, I pick up my plow and start tilling the fertile fields of the high bases. (Does that make sense? I think my metaphor got away from me.) I'll find some rules and semi-randomly pick ranges or patterns to search in parallel. I'm being intentionally vague here because I really don't know what form these optimizations will take, if any.

Turns out I was wrong about the bases problem (entirely my fault). In the "low" bases, each individual number is more likely to be nice, but only the "high" bases contain enough candidates to make finding one likely. So I'm replacing Dakka with Dredge.

Dredge: We just keep going. Our search will remain primarily sequential, with a little random search in the top end just for ~~funsies~~ statistical analysis. Eventually we'll find something!

Chapter 5. How I embarked upon my search for square-cube pandigitals with the power of a generic call-to-action

Woah, look at that seamless transition!

I've got about 40-50 processor cores at my disposal, but you've probably got some too. You can donate them to ~~mine NiceCoin~~ assist the search with this Rust binary or this Python script in a Docker image, or make your own script to grab a range from the API and submit your findings!

If you're more into the math stuff, this page will auto-update with some visualizations of my data as I collect it. Scroll down for that juicy math meat.


Math Stuff!

I'm working in shades of gray here, so I expanded the definition of niceness slightly:

A number's niceness is equal to the proportion of unique digits in the square/cube concatenation to the total number of possible digits (i.e. the base). A number with niceness equal to 1 is "nice".
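That definition translates almost word-for-word into code; a minimal sketch:

```python
def niceness(n: int, base: int) -> float:
    """Proportion of the base's digits appearing in n's square/cube concatenation."""
    digits = set()
    for value in (n * n, n ** 3):
        while value:
            value, digit = divmod(value, base)
            digits.add(digit)
    return len(digits) / base

assert niceness(69, 10) == 1.0  # "nice"
```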

With this definition we can ask questions like, "what's the average niceness of all valid numbers in base 32?" or "are numbers with a high niceness disproportionately odd?"

When I process a range of numbers, I collect two main datasets (one pass is sketched in code after this list):

  1. The distribution of unique digits in the square/cube concatenation ("sqube") for each number in the search range
    • This can easily be converted into a niceness distribution for each range or base
  2. A list of numbers with a niceness greater than/equal to 0.9 (and the number of unique values in their sqube)
    • Originally I was only keeping numbers that were off by one or two, but the higher bases turn up fewer and fewer of them.
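Concretely, one processing pass boils down to something like this (a sketch with stand-in names, not the actual search node):

```python
from collections import Counter

def unique_digit_count(n: int, base: int) -> int:
    """Unique digits across n's square and cube in the given base."""
    digits = set()
    for value in (n * n, n ** 3):
        while value:
            value, digit = divmod(value, base)
            digits.add(digit)
    return len(digits)

def process_range(start: int, end: int, base: int):
    """Collect both datasets for the numbers in [start, end)."""
    distribution = Counter()  # unique-digit count -> occurrences
    near_misses = []          # (n, unique-digit count) with niceness >= 0.9
    for n in range(start, end):
        unique = unique_digit_count(n, base)
        distribution[unique] += 1
        if unique >= 0.9 * base:
            near_misses.append((n, unique))
    return distribution, near_misses
```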

You can download the database and the source to this notebook here (note this may be cached and slightly out of date).

| Base | Niceness Mean | StDev |
|------|---------------|-------|
| 10 | 65.283% | 12.22% |
| 12 | 59.505% | 11.53% |
| 13 | 63.799% | 9.65% |
| 14 | 62.553% | 9.09% |
| 15 | 63.061% | 8.86% |
| 17 | 63.314% | 8.25% |
| 18 | 62.779% | 8.0% |
| 19 | 63.55% | 7.5% |
| 20 | 63.222% | 7.36% |
| 22 | 63.228% | 6.93% |
| 23 | 63.703% | 6.71% |
| 24 | 62.873% | 6.68% |
| 25 | 63.499% | 6.42% |
| 27 | 63.331% | 6.22% |
| 28 | 63.465% | 6.03% |
| 29 | 63.694% | 5.9% |
| 30 | 63.344% | 5.82% |
| 32 | 63.284% | 5.64% |
| 33 | 63.531% | 5.51% |
| 34 | 63.532% | 5.43% |
| 35 | 63.549% | 5.34% |
| 37 | 63.489% | 5.19% |
| 38 | 63.545% | 5.12% |
| 39 | 63.479% | 5.05% |
| 40 | 63.404% | 5.0% |
| 42 | 63.31% | 4.88% |
| 43 | 63.546% | 4.8% |
| 44 | 63.396% | 4.75% |
| 45 | 63.427% | 4.69% |

This is a set of histograms for each base, representing the niceness of each number. They seem to be concentrating around a midpoint somewhere around 0.63 (Yev points out that this is very close to 1 - 1/e ≈ 0.632, which is exactly what you'd expect if the digits were effectively random).
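A quick back-of-envelope supports this: a valid number's sqube has exactly b digits, and if those behaved like independent uniform draws, each of the b possible digits would be missing with probability (1 - 1/b)^b, so

$$\mathbb{E}[\text{niceness}] = 1 - \left(1 - \frac{1}{b}\right)^{b} \longrightarrow 1 - \frac{1}{e} \approx 0.632 \quad \text{as } b \to \infty.$$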

Below is just a plot of discovered numbers with a niceness over 0.9. You can almost see arcs of the off-by-ones, off-by-twos, etc.

| Base | Largest & Nicest | Niceness | Discovered By |
|------|------------------|----------|---------------|
| 10 | 69 | 100.0% | kalak |
| 15 | 2494 | 93.3% | kalak |
| 17 | 9278 | 94.1% | kalak |
| 20 | 149652 | 95.0% | kalak |
| 22 | 630635 | 90.9% | kalak |
| 23 | 1309504 | 91.3% | kalak |
| 24 | 1852215 | 95.8% | kalak |
| 25 | 9437536 | 92.0% | kalak |
| 27 | 42430867 | 92.6% | kalak |
| 28 | 90170775 | 92.9% | kalak |
| 29 | 176621599 | 93.1% | kalak |
| 30 | 665108092 | 93.3% | celebrimbor |
| 32 | 3046323211 | 93.8% | celebrimbor |
| 33 | 7217582173 | 93.9% | celebrimbor |
| 34 | 16013980478 | 94.1% | celebrimbor |
| 35 | 63317403597 | 94.3% | celebrimbor |
| 37 | 299818532826 | 94.6% | demoux |
| 38 | 693901014928 | 94.7% | celebrimbor |
| 39 | 1549592693349 | 94.9% | celebrimbor |
| 40 | 4134931983708 | 97.5% | demoux |
| 42 | 31365237868913 | 95.2% | demoux |
| 43 | 75486002752100 | 95.3% | celebrimbor |
| 44 | 118156001630850 | 95.5% | demoux |
| 45 | 469635119468949 | 95.6% | demoux |

The below table is a quick attempt to search for patterns in the near misses. For the divisor in each row, I took the entire corpus of near misses (niceness >= 90%) and tallied the modulo in each column. In the first row, we can see that the near misses are (roughly) equally split between even and odd. Rows past that check the same idea with different divisors.

Counts per residue class (residues 0, 1, 2, … reading left to right):

  • Divisor 2: 157829, 163337
  • Divisor 3: 100226, 104320, 116620
  • Divisor 4: 77708, 81197, 80121, 82140
  • Divisor 5: 61909, 61917, 65672, 66251, 65417
  • Divisor 6: 49285, 52527, 56751, 50941, 51793, 59869
  • Divisor 7: 38776, 39027, 48444, 48090, 49507, 48509, 48813
  • Divisor 8: 39233, 39276, 40077, 41444, 38475, 41921, 40044, 40696
  • Divisor 9: 33596, 34124, 38727, 33403, 35381, 38866, 33227, 34815, 39027
  • Divisor 10: 29719, 31430, 32062, 33348, 32658, 32190, 30487, 33610, 32903, 32759
  • Divisor 11: 26498, 25697, 29962, 30112, 29980, 30211, 30032, 29900, 30350, 29991, 28433
  • Divisor 12: 24115, 26333, 28649, 26056, 25491, 29979, 25170, 26194, 28102, 24885, 26302, 29890
  • Divisor 13: 24833, 24735, 24487, 24504, 24906, 24700, 24335, 24769, 24816, 24720, 24711, 24823, 24827
  • Divisor 14: 18881, 19801, 22941, 23380, 24310, 24997, 24249, 19895, 19226, 25503, 24710, 25197, 23512, 24564
  • Divisor 15: 19063, 20104, 23440, 20783, 21124, 22778, 18984, 21417, 23861, 20581, 20068, 22829, 20815, 21607, 23712
  • Divisor 16: 19498, 19729, 19950, 20553, 19124, 20907, 20192, 20577, 19735, 19547, 20127, 20891, 19351, 21014, 19852, 20119
  • Divisor 17: 18698, 18906, 19016, 18889, 18823, 19048, 18771, 18922, 18686, 18840, 18765, 19023, 18992, 18763, 19097, 18864, 19063
  • Divisor 18: 16515, 17183, 18859, 17027, 17781, 20075, 16394, 17744, 19101, 17081, 16941, 19868, 16376, 17600, 18791, 16833, 17071, 19926
  • Divisor 19: 16977, 16799, 17077, 17015, 16888, 16911, 16760, 16962, 16797, 16658, 16931, 16796, 16919, 16950, 17030, 16828, 17003, 17032, 16833
  • Divisor 20: 13508, 15409, 15441, 16659, 16387, 15574, 16283, 16486, 16988, 16401, 16211, 16021, 16621, 16689, 16271, 16616, 14204, 17124, 15915, 16358
  • Divisor 21: 10004, 11281, 15986, 14906, 16618, 15871, 15482, 10787, 17038, 16376, 16634, 16484, 16345, 16625, 17985, 10708, 16082, 16550, 16405, 16293, 16706
  • Divisor 22: 12416, 13154, 14568, 15400, 14793, 15410, 14704, 15062, 14880, 15302, 14885, 14082, 12543, 15394, 14712, 15187, 14801, 15328, 14838, 15470, 14689, 13548
  • Divisor 23: 13966, 14010, 14186, 13875, 13952, 14008, 14127, 13940, 13944, 14071, 13973, 13747, 13938, 13964, 14047, 13978, 13990, 13919, 13701, 13785, 14047, 13952, 14046
  • Divisor 24: 12141, 12800, 14316, 13106, 12621, 15468, 12638, 12970, 14222, 11965, 13229, 15114, 11974, 13533, 14333, 12950, 12870, 14511, 12532, 13224, 13880, 12920, 13073, 14776
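The tally itself is nothing fancy; roughly this, with `near_misses` standing in for the corpus of numbers at or above 90% niceness:

```python
from collections import Counter

def residue_tallies(near_misses, max_divisor: int = 24) -> dict:
    """For each divisor, count how many near misses land in each residue class."""
    return {divisor: Counter(n % divisor for n in near_misses)
            for divisor in range(2, max_divisor + 1)}
```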

Search data

Everything below this point is meta-information about the search itself.

For instance, this chart shows the progress we've made in searching the valid range of each base. Note that the statistics above (even the per-base ones) can include results from partially searched bases, so check here to see how much of a base has been covered before deciding whether a sample is representative.

These charts take the total searched range and divide it by the total processing time (from when a range was checked out to when it was returned) to get a per-search processing speed. We can also isolate this by user, or just sum the ranges they've completed to find their total contribution.
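In code, the per-search speed is just the following (field names are stand-ins for whatever the claim records actually hold):

```python
from datetime import datetime

def search_speed(range_start: int, range_end: int,
                 checked_out: datetime, returned: datetime) -> float:
    """Numbers checked per second for one completed range claim."""
    return (range_end - range_start) / (returned - checked_out).total_seconds()
```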