[Edit 21Sep23: My articles are all finished now, and I’ve finalized this post.]
“Code golf” is a type of challenge to write the shortest program meeting some specification. During the height of Wordle popularity, I heard about Golf Horse: Wordle, a challenge to compress 12947 5-letter words in as little JavaScript as possible.
I decided to take on this challenge as a way to learn more JavaScript and explore compression techniques. Because this challenge involved so much math, algorithms, and programming tricks, the word “epic” describes it best. I had quite a bit of fun, learned a lot, and I’d encourage others to give it a try if it sounds interesting. After all, I’m giving away all the secrets you need to beat my entry!
Results
At the time I submitted, my entry was in the lead by a substantial margin:
(The name “zymic” comes from the last entry in the word list.)
[Edit 31Aug23: There’s a new second place entry, coming in at 12155 bytes, and they made an excellent YouTube video explaining it! Their approach is quite different from mine, so I think it makes a great complement to the articles I have here.]
As the raw data is \(5 \times 12947 = 64735\) bytes (not counting the newlines which were in the source file), I achieved a compression ratio of \(64735 / 10982 = 5.89\).
What accounted for my lead? Context modeling and context mixing were the biggest source of improvements while I worked on this. They accounted for over 1000 bytes of savings compared to my early attempts using “standard” techniques (e.g., order-N contexts with constant weights). I also saved about 150 bytes with my base-138 decoder compared to the base-128 decoder I saw used in other Golf Horse entries. See my articles below about how I optimized each aspect of the program.
Articles
- How much data can UTF-8 hold?
The entries to the competition must be in UTF-8. What’s the theoretical limit for packing data into UTF-8? - Packing bits into JavaScript strings
We try to pack as many bits into UTF-8 JavaScript as we can, given syntax constraints and code size tradeoffs. - Modeling 5-letter English words
We build a model to predict what sequences of letters might be valid words. The better we can predict, the better we can compress. - Algorithmic differentiation and gradient descent in JavaScript
Our context mixer has a bunch of free parameters. We build a basic gradient descent algorithm in JavaScript to optimize them. - Compressing JavaScript with JavaScript
We save a few bytes by compressing our own JavaScript code! - Wrap-up and retrospective
A sanitized version of my submission (parameters and compressed data replaced), a retrospective on things I tried that didn’t work, and thoughts about possible future improvements.
Acknowledgements
- Beeminder. It’s hard to finish high-effort personal projects. Beeminder provides the nagging you need to get stuff done.
- Hasegawa Sayuri’s Golf Horse notes. There’s a lot of brilliant information packed into these notes by one of the Golf Horse leaders.
- Data Compression Explained by Matt Mahoney. This was my go-to reference for data compression algorithms.
- Tips for golfing in JavaScript – Code Golf Stack Exchange. All the golfing tricks for JavaScript.