This page is outdated, please check out the most recent work at nicenumbers.net.
Do you know what's nice? I do! It's when both the square and the cube of a number have every digit between them exactly once. It just makes me feel all warm and fuzzy inside. Mostly because the only known example of such a number is 69. See, look, no repeats:
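69² = 4761 and 69³ = 328509; between them, the digits 0 through 9 each show up exactly once.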
Wait, 69 is the only known example? How can we let that stand? This property is so nice, we must find another! Only then will teenagers have a second number to giggle at!
Now that I've got that unfunny introduction out of the way, you should probably just go read Jacob's excellent post over at Chromatic Conflux. Go on ahead, I'll wait.
Back? Cool. So I read that post and was all like "oh, that's neat, it probably speaks more to how humans are willing to find patterns in any chaotic system than any intrinsic property of math, yadda yadda" until I got to this bit:
> But once we get to 148, b/(e^5) goes above 1, and then the expected value starts going to infinity.
Wait, you're telling me there are probably infinitely many nice numbers out there? And we only know about one of them? The rest are just out there, waiting in the dark for some fool to stumble upon them...
The first step in my journey was realizing that I'm bad at math.
Okay, I'm only kidding a little. I had a couple ideas that I thought might be helpful. I figured the most computationally expensive parts of the check would be:

- figuring out which base(s) a given number could even be nice in
- squaring and cubing
- converting the results to other bases
After a cursory search I was wrong on most counts. Jacob's original post (please go read it if you haven't already) has a truly beautiful algorithm for determining the range of valid numbers in each base. Even better, these ranges never overlap! That means each number will be valid in at most one base, and many numbers aren't valid in any base. This narrows our search field massively, and makes this item effectively free.
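I won't reproduce Jacob's derivation here (go read the post!), but the condition it boils down to is easy to state: a number can only be 100% nice in base b if its square and cube together have exactly b digits, since any fewer means a digit must be missing and any more means one must repeat. A brute-force version of that check might look like this (the names are mine, not Jacob's):

```python
def digit_count(n: int, base: int) -> int:
    """Number of digits of n when written in the given base."""
    count = 0
    while n:
        n //= base
        count += 1
    return count

def in_valid_range(n: int, base: int) -> bool:
    """n can only be 100% nice in `base` if its square and cube
    together have exactly `base` digits (pigeonhole)."""
    return digit_count(n * n, base) + digit_count(n**3, base) == base
```

Jacob's algorithm gets you the endpoints of each base's range directly, so in practice you never have to run this check per-number.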
Squaring and cubing are still decently expensive, though not as bad as I had feared at first glance. Since each number is only valid in one base I don't need to worry about saving or caching the square/cube results, and using a lookup table would probably be more trouble than it's worth. There is a handy little trick if you only care about 100% nice numbers, which is that you can square a number and check the square for repeats before trying the cube. I'm not doing that yet, but I'm sure there are many possible optimizations along these lines.
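For what it's worth, that square-first short-circuit might look something like this (a sketch, not the code I'm actually running):

```python
def add_digits(value: int, base: int, seen: set[int]) -> bool:
    """Add value's digits (in `base`) to seen; False on any repeat."""
    while value:
        value, digit = divmod(value, base)
        if digit in seen:
            return False
        seen.add(digit)
    return True

def is_nice(n: int, base: int) -> bool:
    """True if n's square and cube together use every digit of the
    base exactly once. The square is checked first, so if it already
    repeats a digit the cube is never even computed."""
    seen: set[int] = set()
    return (add_digits(n * n, base, seen)
            and add_digits(n**3, base, seen)  # skipped if the square failed
            and len(seen) == base)
```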
Converting to other bases is still pretty expensive, but I got bored before I tried optimizing it. Note that numpy's base_repr only goes up to base 36, so you could probably just use that for a lot of what we're doing here (for now...).
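Rolling your own conversion past base 36 is at least straightforward; the easiest route is to keep digits as a list of ints rather than inventing symbols past z. One way to do it:

```python
def to_digits(n: int, base: int) -> list[int]:
    """Digits of n in the given base, most significant first.
    Unlike numpy.base_repr, this works for any base >= 2."""
    digits = []
    while n:
        n, d = divmod(n, base)
        digits.append(d)
    return digits[::-1] or [0]
```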
First I ran it in Jupyter and it ate 60GB of RAM. Then I hooked up a database and it ate 60GB of disk space. Then I got smart about what data I wanted to save and it dropped to 100MB. Then I got smarter and it dropped to 10MB. Then I parallelized it by spinning up multiple Docker containers (it wasn't really parallelized). Then I actually parallelized it. Then I made a backend service that sits between the search nodes and the database with an API for delegation. Then I started running more search nodes on another machine. Then I added some validation tests on the backend. Then I realized I had made a terrible mistake with my database, wiped it, and started over. With the new database I added a couple fields I should have had the first time around and finally felt comfortable typing this up.
The last time I accidentally worked this late, I was playing Factorio.
Well, I'm not totally sure. My engineer brain says there's a tradeoff here: if our goal is to find more nice numbers, we need to be looking somewhere north of base 100. But even if those fields are more fertile, there's so much more field to search. We need to be smart about where we look. So that's why I'm proposing the data & ~~dakka~~ dredge method.
Data: I run these scripts for a while. I look at the numbers. Other people look at the numbers. We come up with some optimizations, some tricks for reducing the total range, or maybe some insight into "nice" numbers as a whole.
Dakka: Once I feel ready, I pick up my plow and start tilling the fertile fields of the high bases. (Does that make sense? I think my metaphor got away from me.) I'll find some rules and semi-randomly pick ranges or patterns to search in parallel. I'm being intentionally vague here because I really don't know what form these optimizations will take, if any.
Turns out I was wrong about the bases problem (entirely my fault). In the "low" bases, each individual number is more likely to be nice, but only in the "high" bases are there enough numbers for a find to be likely. So I'm replacing Dakka with Dredge.
Dredge: We just keep going. Our search will remain primarily sequential, with a little random search in the top end just ~~for funsies~~ for statistical analysis. Eventually we'll find something!
Woah, look at that seamless transition!
I've got about 40-50 processor cores at my disposal, but you've probably got some too. You can donate them to ~~mine NiceCoin~~ assist the search with this Rust binary or this Python script in a Docker image, or make your own script to grab a range from the API and submit your findings!
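If you go the roll-your-own route, the shape of a search node is roughly the following. Fair warning: the endpoint names and payloads below are made up for illustration; check the actual API docs for the real contract.

```python
import requests

API_URL = "https://nicenumbers.net/api"  # the paths below are hypothetical

def is_nice(n: int, base: int) -> bool:
    """Same check as the earlier sketch: every digit of the base
    appears exactly once across n^2 and n^3."""
    seen = set()
    for value in (n * n, n**3):
        while value:
            value, digit = divmod(value, base)
            if digit in seen:
                return False
            seen.add(digit)
    return len(seen) == base

# Check out a range to search (hypothetical request/response shape).
claim = requests.get(f"{API_URL}/claim").json()
hits = [n for n in range(claim["start"], claim["end"])
        if is_nice(n, claim["base"])]

# Report back what we found (again, a hypothetical payload).
requests.post(f"{API_URL}/submit", json={"claim_id": claim["id"], "nice": hits})
```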
If you're more into the math stuff, this page will auto-update with some visualizations of my data as I collect it. Scroll down for that juicy math meat.
I'm working in shades of gray here, so I expanded the definition of niceness slightly:
A number's niceness is equal to the proportion of unique digits in the square/cube concatenation to the total number of possible digits (i.e. the base). A number with niceness equal to 1 is "nice".
With this definition we can ask questions like, "what's the average niceness of all valid numbers in base 32?" or "are numbers with a high niceness disproportionately odd?"
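In code, the expanded definition is just a couple of lines. A sketch:

```python
def niceness(n: int, base: int) -> float:
    """Fraction of the base's digits that appear at least once
    across n^2 and n^3, written in that base."""
    seen = set()
    for value in (n * n, n**3):
        while value:
            value, digit = divmod(value, base)
            seen.add(digit)
    return len(seen) / base

print(niceness(69, 10))  # 1.0 -- nice!
print(niceness(68, 10))  # 0.5 -- 4624 and 314432 only cover 5 of the 10 digits
```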
When I process a range of numbers, I collect two main datasets: the individual numbers with a niceness of at least 0.9 (the near misses you'll see below), and aggregate niceness statistics (mean and standard deviation) for each base.
You can download the database and the source to this notebook here (note this may be cached and slightly out of date).
Base | Niceness Mean | StDev |
---|---|---|
10 | 65.283% | 12.22% |
12 | 59.505% | 11.53% |
13 | 63.799% | 9.65% |
14 | 62.553% | 9.09% |
15 | 63.061% | 8.86% |
17 | 63.314% | 8.25% |
18 | 62.779% | 8.0% |
19 | 63.55% | 7.5% |
20 | 63.222% | 7.36% |
22 | 63.228% | 6.93% |
23 | 63.703% | 6.71% |
24 | 62.873% | 6.68% |
25 | 63.499% | 6.42% |
27 | 63.331% | 6.22% |
28 | 63.465% | 6.03% |
29 | 63.694% | 5.9% |
30 | 63.344% | 5.82% |
32 | 63.284% | 5.64% |
33 | 63.531% | 5.51% |
34 | 63.532% | 5.43% |
35 | 63.549% | 5.34% |
37 | 63.489% | 5.19% |
38 | 63.545% | 5.12% |
39 | 63.479% | 5.05% |
40 | 63.404% | 5.0% |
42 | 63.31% | 4.88% |
43 | 63.546% | 4.8% |
44 | 63.396% | 4.75% |
45 | 63.427% | 4.69% |
This is a set of histograms for each base, representing the niceness of each number. They seem to be getting more concentrated around a midpoint near 0.63 (Yev points out that this is very close to 1-1/e, which implies the distribution is truly random).
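That checks out with a bit of napkin math: a valid number's square and cube together have exactly b digits, so if those digits behaved like uniform random draws, any particular digit would be absent with probability (1-1/b)^b ≈ 1/e. The expected fraction of digits that do show up, i.e. the expected niceness, is then about 1-1/e ≈ 0.632, which is right where these distributions are centering.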
Below is just a plot of discovered numbers with a niceness over 0.9. You can almost see arcs of the off-by-ones, off-by-twos, etc.
Base | Largest & Nicest | Niceness | Discovered By |
---|---|---|---|
10 | 69 | 100.0% | kalak |
15 | 2494 | 93.3% | kalak |
17 | 9278 | 94.1% | kalak |
20 | 149652 | 95.0% | kalak |
22 | 630635 | 90.9% | kalak |
23 | 1309504 | 91.3% | kalak |
24 | 1852215 | 95.8% | kalak |
25 | 9437536 | 92.0% | kalak |
27 | 42430867 | 92.6% | kalak |
28 | 90170775 | 92.9% | kalak |
29 | 176621599 | 93.1% | kalak |
30 | 665108092 | 93.3% | celebrimbor |
32 | 3046323211 | 93.8% | celebrimbor |
33 | 7217582173 | 93.9% | celebrimbor |
34 | 16013980478 | 94.1% | celebrimbor |
35 | 63317403597 | 94.3% | celebrimbor |
37 | 299818532826 | 94.6% | demoux |
38 | 693901014928 | 94.7% | celebrimbor |
39 | 1549592693349 | 94.9% | celebrimbor |
40 | 4134931983708 | 97.5% | demoux |
42 | 31365237868913 | 95.2% | demoux |
43 | 75486002752100 | 95.3% | celebrimbor |
44 | 118156001630850 | 95.5% | demoux |
45 | 469635119468949 | 95.6% | demoux |
The below table is a quick attempt to search for patterns in the near misses. For the divisor in each row, I took the entire corpus of near misses (niceness >= 90%) and tallied the modulo in each column. In the first row, we can see that the near misses are (roughly) equally even vs odd. Rows past that check the same idea with different divisors. (A sketch of how these tallies are computed follows the table.)
Divisor | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 157829 | 163337 | ||||||||||||||||||||||
3 | 100226 | 104320 | 116620 | |||||||||||||||||||||
4 | 77708 | 81197 | 80121 | 82140 | ||||||||||||||||||||
5 | 61909 | 61917 | 65672 | 66251 | 65417 | |||||||||||||||||||
6 | 49285 | 52527 | 56751 | 50941 | 51793 | 59869 | ||||||||||||||||||
7 | 38776 | 39027 | 48444 | 48090 | 49507 | 48509 | 48813 | |||||||||||||||||
8 | 39233 | 39276 | 40077 | 41444 | 38475 | 41921 | 40044 | 40696 | ||||||||||||||||
9 | 33596 | 34124 | 38727 | 33403 | 35381 | 38866 | 33227 | 34815 | 39027 | |||||||||||||||
10 | 29719 | 31430 | 32062 | 33348 | 32658 | 32190 | 30487 | 33610 | 32903 | 32759 | ||||||||||||||
11 | 26498 | 25697 | 29962 | 30112 | 29980 | 30211 | 30032 | 29900 | 30350 | 29991 | 28433 | |||||||||||||
12 | 24115 | 26333 | 28649 | 26056 | 25491 | 29979 | 25170 | 26194 | 28102 | 24885 | 26302 | 29890 | ||||||||||||
13 | 24833 | 24735 | 24487 | 24504 | 24906 | 24700 | 24335 | 24769 | 24816 | 24720 | 24711 | 24823 | 24827 | |||||||||||
14 | 18881 | 19801 | 22941 | 23380 | 24310 | 24997 | 24249 | 19895 | 19226 | 25503 | 24710 | 25197 | 23512 | 24564 | ||||||||||
15 | 19063 | 20104 | 23440 | 20783 | 21124 | 22778 | 18984 | 21417 | 23861 | 20581 | 20068 | 22829 | 20815 | 21607 | 23712 | |||||||||
16 | 19498 | 19729 | 19950 | 20553 | 19124 | 20907 | 20192 | 20577 | 19735 | 19547 | 20127 | 20891 | 19351 | 21014 | 19852 | 20119 | ||||||||
17 | 18698 | 18906 | 19016 | 18889 | 18823 | 19048 | 18771 | 18922 | 18686 | 18840 | 18765 | 19023 | 18992 | 18763 | 19097 | 18864 | 19063 | |||||||
18 | 16515 | 17183 | 18859 | 17027 | 17781 | 20075 | 16394 | 17744 | 19101 | 17081 | 16941 | 19868 | 16376 | 17600 | 18791 | 16833 | 17071 | 19926 | ||||||
19 | 16977 | 16799 | 17077 | 17015 | 16888 | 16911 | 16760 | 16962 | 16797 | 16658 | 16931 | 16796 | 16919 | 16950 | 17030 | 16828 | 17003 | 17032 | 16833 | |||||
20 | 13508 | 15409 | 15441 | 16659 | 16387 | 15574 | 16283 | 16486 | 16988 | 16401 | 16211 | 16021 | 16621 | 16689 | 16271 | 16616 | 14204 | 17124 | 15915 | 16358 | ||||
21 | 10004 | 11281 | 15986 | 14906 | 16618 | 15871 | 15482 | 10787 | 17038 | 16376 | 16634 | 16484 | 16345 | 16625 | 17985 | 10708 | 16082 | 16550 | 16405 | 16293 | 16706 | |||
22 | 12416 | 13154 | 14568 | 15400 | 14793 | 15410 | 14704 | 15062 | 14880 | 15302 | 14885 | 14082 | 12543 | 15394 | 14712 | 15187 | 14801 | 15328 | 14838 | 15470 | 14689 | 13548 | ||
23 | 13966 | 14010 | 14186 | 13875 | 13952 | 14008 | 14127 | 13940 | 13944 | 14071 | 13973 | 13747 | 13938 | 13964 | 14047 | 13978 | 13990 | 13919 | 13701 | 13785 | 14047 | 13952 | 14046 | |
24 | 12141 | 12800 | 14316 | 13106 | 12621 | 15468 | 12638 | 12970 | 14222 | 11965 | 13229 | 15114 | 11974 | 13533 | 14333 | 12950 | 12870 | 14511 | 12532 | 13224 | 13880 | 12920 | 13073 | 14776 |
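For the curious, the tallies above are nothing fancy; something like this produces each row (near_misses here is a stand-in for the real corpus, which is around 321k numbers):

```python
from collections import Counter

near_misses = [69, 2494, 9278]  # stand-in for every discovered number with niceness >= 0.9

for divisor in range(2, 25):
    counts = Counter(n % divisor for n in near_misses)
    print(divisor, [counts.get(r, 0) for r in range(divisor)])
```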
Everything below this point is meta-information about the search itself.
For instance, this chart shows the progress we've made in searching the valid range of each base. Note that the above statistics (even per-base statistics) can include results from incomplete bases, so check here to see how much of a base has been searched before deciding whether a sample is representative.
These charts take the total searched range and divide it by the total processing time (from when a range was checked out to when it was returned) to get a per-search processing speed. We can also isolate this by user, or just sum the ranges they've completed to find their total contribution.