!lemmysilver
Thank you for voting. You can vote again in 24 hours. leaderboard
The real meat of the story is in the referenced blog post: https://blog.codingconfessions.com/p/how-unix-spell-ran-in-64kb-ram
TL;DR
If you’re short on time, here’s the key engineering story:
-
McIlroy’s first innovation was a clever linguistics-based stemming algorithm that reduced the dictionary to just 25,000 words while improving accuracy.
-
For fast lookups, he initially used a Bloom filter—perhaps one of its first production uses. Interestingly, Dennis Ritchie provided the implementation. They tuned it to have such a low false positive rate that they could skip actual dictionary lookups.
-
When the dictionary grew to 30,000 words, the Bloom filter approach became impractical, leading to innovative hash compression techniques.
-
They computed that 27-bit hash codes would keep collision probability acceptably low, but needed compression.
-
McIlroy’s solution was to store differences between sorted hash codes, after discovering these differences followed a geometric distribution.
-
Using Golomb’s code, a compression scheme designed for geometric distributions, he achieved 13.60 bits per word—remarkably close to the theoretical minimum of 13.57 bits.
-
Finally, he partitioned the compressed data to speed up lookups, trading a small memory increase (final size ~14 bits per word) for significantly faster performance.
For anyone struggling, lemmy web interface added the colon into the URL for the blog post link. Here’s a clickable version without the colon:
https://blog.codingconfessions.com/p/how-unix-spell-ran-in-64kb-ram
Thanks, and sorry about that! I removed the colon from near my URL now, just in case.
Thank you
-
If it aint broke, don’t fix it.