Project Themis
These are my notes after finishing my most recent project, Brier.fyi. I won't be talking about the stuff on the site very much, just my mindset while making it.
Version 1.0
I had built simple calibration plots for prediction markets before, but I wanted to try out some new stuff and make a web app. In July 2023 I made a simple prototype in Python with Flask that ran everything in memory. It was terrible and slow (10+ second load time) but it was fun to make.
Version 2.0
In late 2023/early 2024, Manifund ran an experimental funding round to test impact certificates as a mechanism. There was real money on the line, so I submitted a few ideas and turned that cute little prototype into a full product that I thought was pretty cool, called Calibration City. The round was fun because there were a lot of people working on cool things, we got concrete feedback every month, and I got some money for a project that I would have done anyways.
Since I had to post updates by a deadline every month, I had a huge incentive to deploy cool new features as quickly as possible. I had never had deadlines like this so it was a little weird, but the feedback was very encouraging and I think it helped overall. It was certainly the most aggressive I had been about prioritizing new features and I think that was a good lesson to learn. By the end of the round I had made $3500 which was pretty great.
Overall I was pretty happy with how that project ended. I don’t normally work on projects for longer than 2-3 months, so this was a typical timeframe for me, but I got way more accomplished and made some money to boot. I didn’t do much with the project after that; some things behind the scenes started to break, but the site still had a couple dozen regular users.
Hiatus
My work was really busy around this time, so I didn’t have the time or energy to go back and try to fix things when they broke. And since I had shipped everything without much regard for maintenance, fixing anything would have taken a fair amount of time.
In September 2024 I was encouraged to apply for a community grant round. I had some ideas on what I could do next, but I knew I wouldn’t have much time for huge changes. I mainly pitched it as a way for people to donate for what I had done so far. I ended up with around $3800 in funding, with a lot of people excited about what I had planned but no concrete obligations. I sketched out how it would look and what I needed to do, but I didn’t have time to pick it up again until I got a new job in January.
Version 3.0
My new job is fully remote and far lower stress. After doing basically nothing for a few months, I picked up Themis again in February and tried to figure out how to actually do what I envisioned.
Aside from the cool new features I wanted to add, I had some big problems with the existing site I wanted to address:
- The biggest thing that broke behind the scenes was the extract-transform-load data pipeline. It took a long time to run and failed often, which led to me not refreshing the database for months at a time.
- Over time the site became slower and slower (I’m still not sure exactly why), plus it didn’t always load correctly on the first visit. Caching didn’t work quite right, so there was always a lot of load on the server.
- The old site was often linked without explanation or context, leaving visitors confused unless whoever shared it added one. The primary chart looked neat, but didn’t really mean anything unless you already knew about calibration. The Introduction page was supposed to be a remedy for that, but basically nobody read it.
And so I followed the siren song of a full rewrite.
- The data pipeline was already written in Rust, but it deserialized every response immediately. Since several of the APIs had already changed significantly, I rewrote it to save the raw responses and deserialize them afterward.
- The existing site had a lightweight Vue app that called an API that did all of the calculations for the plots. Since the data doesn’t change very often, I changed it to be a static site generated by Astro, with the charts rendered at build time (sketched below, after this list).
- With Astro I could add context and explanations in a way that was more intuitive, and with some more data I could answer frequently asked questions right up front.
- And with all of those improvements I was finally able to implement the cool new feature I had talked about six months prior.
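To make the build-time rendering a bit more concrete, here is a minimal sketch (hypothetical code, not the actual Brier.fyi source) of a plain TypeScript helper that an Astro page can call from its frontmatter, so the chart ships as inline SVG with no client-side charting library:

```ts
// chart.ts - hypothetical helper, not the real Brier.fyi code.
export interface CalibrationPoint {
  predicted: number; // stated probability, 0..1
  resolved: number;  // observed resolution rate for that bucket, 0..1
}

// Returns an SVG string. Called from Astro frontmatter on a static build,
// this runs once at build time and the markup is baked into the page.
export function calibrationSvg(points: CalibrationPoint[], size = 300): string {
  const x = (p: number) => (p * size).toFixed(1);
  const y = (p: number) => (size - p * size).toFixed(1); // SVG y-axis points down
  const dots = points
    .map((p) => `<circle cx="${x(p.predicted)}" cy="${y(p.resolved)}" r="3" />`)
    .join("\n  ");
  // Dashed diagonal marks perfect calibration (predicted == resolved).
  return `<svg viewBox="0 0 ${size} ${size}" width="${size}" height="${size}">
  <line x1="0" y1="${size}" x2="${size}" y2="0" stroke="gray" stroke-dasharray="4" />
  ${dots}
</svg>`;
}
```

An Astro page can import something like this in its frontmatter and drop the result into the template with set:html, which is how the static-site approach avoids shipping any charting JavaScript at all.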
You might read this and go “wow, that moron should have spent time up front making sure his shit was maintainable” and well, yeah, looking back you would be right. But for hobby projects where you’re only working on it for a few months and then maybe never coming back, I think prioritizing a cool feature over refactoring code is often a good choice.
You can see that project live at Brier.fyi.
This time around I gave myself another deadline. Not because I thought it would help me get stuff done faster (I think it actually burnt me out a bit) but because I wanted to give a talk about it at a mini-conference I would be attending. I ended up with a ton of great feedback and I really enjoyed giving the talk, so I considered that a good decision. I have decided, however, that my next project will not have a deadline.
Commit History
Lessons
I learn best by doing something wrong and then having to live with the consequences. What did I learn here?
Grog Add Feature
Features are good! They are, presumably, why people are visiting or using your project. In a hobby project, if I have the choice to ignore other components and focus solely on features, I often can and do.
I implemented a “no deck chairs” rule - don’t rearrange deck chairs on the Titanic. If there’s any chance the code could be rewritten, don’t spend time prettifying or refactoring it. Throw in some comments, apologize to your future self, and move on. The only time I was allowed to refactor was if it was required to add a new feature.
Extract, Transform, Load
I mentioned this above, but if you’re working with APIs that can change then separate out the (long-running, safe) download step from the (fast, but error-prone) extract step.
Storage space is cheap and you only need to keep the most recent run. If you have to wait on rate limits every time you change the extract schema, development stalls and you get frustrated. When you already have the raw response data, you can use a quick python script to explore the data and investigate quickly.
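A minimal sketch of that split, in TypeScript for illustration (the actual pipeline is written in Rust, and the URLs and field names here are made up):

```ts
import { mkdir, readFile, writeFile } from "node:fs/promises";

// Step 1: download. Slow and rate-limited, but the code rarely changes,
// because the raw response bodies are written to disk untouched.
async function download(urls: string[], dir = "raw"): Promise<void> {
  await mkdir(dir, { recursive: true });
  for (const [i, url] of urls.entries()) {
    const body = await (await fetch(url)).text();
    await writeFile(`${dir}/${i}.json`, body);
  }
}

// Step 2: extract. Fast, but it breaks whenever an upstream API changes shape,
// so it only reads the saved files and can be rerun as often as needed.
async function extract(file: string): Promise<{ id: string; prob: number }> {
  const raw = JSON.parse(await readFile(file, "utf8"));
  return { id: raw.id, prob: raw.probability }; // hypothetical field names
}
```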
Visualization
I have a few notes about data visualization.
First: when you are working with data, especially big data, visualize everything at every step. You will find issues or interesting things about your data and you will find them faster than through analytical methods. Just throw each stat into a quick histogram, look at individual outlier data points, see the overall shape of your data. If you are running transformations on the data, look at it before and after each step. Is it what you expected? Plot out the data for a random instance and see if it makes sense.
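Any throwaway tool works for this; as one purely illustrative example (not code from the project), even a few lines of TypeScript will show you the shape of a numeric column in the terminal:

```ts
// asciiHistogram.ts - quick-and-dirty look at a distribution.
export function asciiHistogram(values: number[], bins = 10): void {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / bins || 1; // avoid zero-width bins
  const counts = new Array(bins).fill(0);
  for (const v of values) {
    counts[Math.min(bins - 1, Math.floor((v - min) / width))]++;
  }
  counts.forEach((count, i) => {
    const lo = (min + i * width).toFixed(2).padStart(8);
    console.log(`${lo} | ${"#".repeat(count)} ${count}`);
  });
}

// e.g. asciiHistogram(markets.map((m) => m.probability)); // hypothetical data
```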
Second: I saw this HN comment on the ISBN visualization post:
“I feel like visualizations of large datasets which are viewer-directed (i.e. they want you to "explore" the data instead of trying to tell you something specific about it or communicate a narrative) are often "pretty" but never particularly enlightening. I feel like that holds true for these in particular.”
I think the comment is a bit cynical (as per usual for HN) but they hit a good point here, which is that most users will just look at a visualization for a few seconds and move on.
- “Deep” visualizations like that one require a lot of engagement — people who take the time to explore can learn a lot, but most people don’t. I’ve seen personally that most users don’t change settings or click around, so show them the most important thing first.
- If you can clearly communicate your thesis in the first 10 seconds, the average user will have more specific questions and be more interested in clicking around to learn more. Even if they leave immediately, you will still have communicated your main point before they go.
- As the person working with the data, you probably know more about it than a random visitor. Share your insights with them! Point out the most important details and explain why they’re important. Context is super valuable.
Third: my overall flow for designing a new visualization is actually backwards - I start from the finished chart and work back to the data.
- First, figure out what single question you want to answer. Sketch out a chart that would answer that question unambiguously. Try several variations.
- Create a mockup chart with dummy data, and put the dummy data in whatever format works best for the chart. Use a mock API or a static JSON file.
- Then draw the rest of the owl. Build your data pipeline however you see fit, but make sure you conform to the dummy data schema. The less transformation you have to do at later steps, the happier you will be.
When I started on this rewrite, I built the final site first with plain JSON served through nginx to simulate the API. Basically every chart and table was designed before I had any data. There ended up being a few changes to that schema, but it served as a good reference point for everything else. Then I went back and built the data layer, database schemas, and so on. Very highly recommend.
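For illustration, the dummy data behind those mockups can be as dumb as a hand-written module (this shape is hypothetical, not the real schema); the only rule is that the eventual pipeline has to produce exactly the same structure:

```ts
// mock-data.ts - hypothetical dummy dataset, shaped exactly how the chart wants it.
export interface PlatformStats {
  platform: string;
  brierScore: number; // lower is better
  marketCount: number;
}

export const dummyStats: PlatformStats[] = [
  { platform: "ExampleMarket", brierScore: 0.17, marketCount: 1200 },
  { platform: "OtherMarket", brierScore: 0.21, marketCount: 800 },
];
```

Once the real data layer exists, swapping this module (or the static JSON it mimics) for the real endpoint is the only change the charts should need.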
An Astro Aside
Astro has a feature called content collections, which is described like so:
Content collections are the best way to manage sets of content in any Astro project. Collections help to organize and query your documents, enable Intellisense and type checking in your editor, and provide automatic TypeScript type-safety for all of your content.
Astro encourages you to use this in many cases:
Define your data as a collection when: [...] You need to fetch (tens of) thousands of related pieces of data, and need a querying and caching method that handles at scale.
This sounded great for what I needed. I was downloading a small portion of the data I had saved, only around 5 gigs for now but soon to be around 10 gigs, and I wanted a way to keep that data in some sort of cache for quick reloads in preview mode. So I tried out the content collections.
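For reference, defining and querying a collection only takes a few lines; roughly like this with Astro 5's content layer (the collection name and fields here are hypothetical):

```ts
// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { file } from "astro/loaders";

const markets = defineCollection({
  loader: file("src/data/markets.json"), // one JSON file holding an array of entries
  schema: z.object({
    id: z.string(),
    title: z.string(),
    probability: z.number(),
  }),
});

export const collections = { markets };

// Elsewhere, in a page's frontmatter:
//   import { getCollection } from "astro:content";
//   const markets = await getCollection("markets");
```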
First off, the Node environment that Astro runs in has a memory limit (raised with Node’s --max-old-space-size flag), which you will probably need to increase. I don’t know why it has one, but alas.
My assumption was that to cache this data, Astro would save each “collection” to some lightweight database like SQLite. It does not. It serializes all collections into a string — not strings for each collection, but a single string for all registered collections. “Okay, this should probably be fine even if it’s not the most efficient,” you might think, if you didn’t know that Node has a maximum string length of ONE GIGABYTE.
So yes, Astro’s content collections are probably great, as long as you don’t have more than 1 GB of data. If you have more than that, check out this issue. I ended up writing my own very simple, very dumb caching layer which can actually handle data “at scale”.
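The replacement really is dumb; something in this spirit (hypothetical code, not my actual implementation) gives each dataset its own file on disk, so no single serialized string ever gets near the 1 GB wall:

```ts
// cache.ts - one JSON file per key instead of one giant serialized blob.
import { mkdir, readFile, writeFile } from "node:fs/promises";
import { createHash } from "node:crypto";

const DIR = ".cache";

const pathFor = (key: string) =>
  `${DIR}/${createHash("sha256").update(key).digest("hex")}.json`;

// Return the cached value for `key`, or compute, store, and return it.
export async function cached<T>(key: string, compute: () => Promise<T>): Promise<T> {
  await mkdir(DIR, { recursive: true });
  try {
    return JSON.parse(await readFile(pathFor(key), "utf8")) as T;
  } catch {
    const value = await compute();
    await writeFile(pathFor(key), JSON.stringify(value));
    return value;
  }
}
```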
AI
I used a significant amount of LLM help for this project. At first it was just asking ChatGPT for advice, then asking it to write out some examples. I thought of it as a much better Google, able to find information about specific tools and techniques much more quickly than I could have on my own.
I don’t know JavaScript very well, so a lot of the time I would write some code myself and then copy it over to an LLM to peer-review and look for potential improvements. It was also great as a rubber duck, or as a second pair of eyes to see the forest while I was down in the trees.
Eventually Zed (my editor of choice at the moment) came out with a new agent feature, so I took that for a spin. It was insane. I would copy over a section of my notes describing a planned feature and Claude would fully implement it better than I could. It would walk me through everything it did, write and run tests, and adapt if I had concerns. I use it all the time now. As of when I'm writing this (June 2025), I highly recommend the blog post My AI Skeptic Friends Are All Nuts by Thomas Ptacek as a thorough argument for "you should take this seriously".