This app builds English haikus in a simple, phone-ready interface.
In English, a haiku is a simple poetic form: three lines of five, seven, and five syllables, respectively.
Site by Joe Cooper.
Crane watercolors by Vera Skorina.
To enforce these rules, the computer needs to know how many syllables each word has.
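Once each word's syllable count is known, checking the 5-7-5 rule is mechanical. Here is a minimal sketch, assuming a lookup table from lowercase word to syllable count already exists. SyllableTable, lineSyllables, and isHaiku are hypothetical names, and the toy table covers only the words in the example poem.

```typescript
// Hypothetical table type: lowercase word -> syllable count.
type SyllableTable = Map<string, number>;

// Syllable targets for the three lines of an English haiku.
const PATTERN: readonly number[] = [5, 7, 5];

// Split a line into lowercase words, keeping apostrophes ("don't").
function words(line: string): string[] {
  return line.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Total syllables in a line, or null if any word is missing from the table.
function lineSyllables(line: string, table: SyllableTable): number | null {
  let total = 0;
  for (const w of words(line)) {
    const n = table.get(w);
    if (n === undefined) return null;
    total += n;
  }
  return total;
}

// True when there are exactly three lines of 5, 7 and 5 syllables.
function isHaiku(lines: string[], table: SyllableTable): boolean {
  return (
    lines.length === PATTERN.length &&
    lines.every((line, i) => lineSyllables(line, table) === PATTERN[i])
  );
}

// Toy table covering only the words in the example poem.
const table: SyllableTable = new Map([
  ["an", 1], ["old", 1], ["silent", 2], ["pond", 1], ["a", 1], ["frog", 1],
  ["jumps", 1], ["into", 2], ["the", 1], ["splash", 1], ["silence", 2], ["again", 2],
]);
const poem = ["An old silent pond", "A frog jumps into the pond", "Splash! Silence again."];
console.log(isHaiku(poem, table)); // true
```

Returning null for an unknown word, rather than guessing, keeps the check honest: a line with a word outside the dictionary is simply not verifiable.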
The CMU Pronouncing Dictionary, from Carnegie Mellon University, gives pronunciations in its own ARPAbet-based phoneme notation. For example: darling D AA1 R L IH0 NG
The digits mark stress (0 for none, 1 for primary, 2 for secondary). We treat stress as a property of vowels, so each digit signals one syllable: "darling" has two.
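Counting syllables from a dictionary entry then amounts to counting stress digits. A minimal sketch, assuming one entry per line with the word followed by whitespace-separated phonemes; parseEntry and countSyllables are hypothetical helpers, not part of any CMUdict tooling.

```typescript
// Split one dictionary line into the word and its phonemes.
function parseEntry(line: string): { word: string; phonemes: string[] } {
  const [word, ...phonemes] = line.trim().split(/\s+/);
  return { word, phonemes };
}

// Each phoneme ending in a stress digit (0, 1 or 2) is a vowel, i.e. a syllable.
function countSyllables(phonemes: string[]): number {
  return phonemes.filter((p) => /[012]$/.test(p)).length;
}

const darling = parseEntry("darling D AA1 R L IH0 NG");
console.log(darling.word, countSyllables(darling.phonemes)); // darling 2
```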
Some English spellings have multiple pronunciations, e.g. "record" (noun vs. verb). I assumed most such spellings share a syllable count, so my ingest simply takes the first pronunciation. Is that always correct? Probably not.
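A sketch of the "take the first" ingest, assuming the dictionary file has already been read into a string. It assumes alternate pronunciations appear as separate entries with a numeric suffix such as "(1)" or "(2)", that ";;;" lines are comments, and that text after " #" is a trailing comment. buildSyllableTable is a hypothetical name, and the sample entries are made up to show a case where the first pronunciation wins even though a later one has fewer syllables.

```typescript
// Build a word -> syllable-count table from CMUdict-style text, keeping only
// the first pronunciation listed for each spelling.
function buildSyllableTable(dictText: string): Map<string, number> {
  const table = new Map<string, number>();
  for (const raw of dictText.split("\n")) {
    // Drop trailing " # ..." comments, then skip blank and ";;;" comment lines.
    const line = raw.replace(/\s#.*$/, "").trim();
    if (line === "" || line.startsWith(";;;")) continue;
    const [entry, ...phonemes] = line.split(/\s+/);
    // Alternate pronunciations carry a numeric suffix such as "(1)" or "(2)".
    const word = entry.replace(/\(\d+\)$/, "").toLowerCase();
    if (table.has(word)) continue; // first pronunciation wins
    table.set(word, phonemes.filter((p) => /[012]$/.test(p)).length);
  }
  return table;
}

// Made-up sample in CMUdict style; the second "naturally" variant differs.
const sample = [
  ";;; comment line",
  "RECORD  R EH1 K ER0 D",
  "RECORD(1)  R IH0 K AO1 R D",
  "naturally N AE1 CH ER0 AH0 L IY0",
  "naturally(2) N AE1 CH R AH0 L IY0",
].join("\n");

const table = buildSyllableTable(sample);
console.log(table.get("record"), table.get("naturally")); // 2 4
```

Both readings of "record" agree, so first-wins costs nothing there; the invented "naturally" pair shows the kind of spelling where it can pick the longer variant.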
Suggestions have a few requirements. First, they must conform to the rules above. Second, they should be plausible, literary word choices. Third, the model must run on my NVIDIA 3090.
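Conforming to the rules means each suggested word must fit in what is left of the current line. A minimal sketch, assuming the poem is built one word at a time and that a line ends exactly when its syllable target is reached; nextBudget and Budget are hypothetical names, and the input is just the syllable count of each word written so far.

```typescript
// Syllable targets for the three lines.
const TARGETS: readonly number[] = [5, 7, 5];

interface Budget {
  line: number;      // index of the line being written (0, 1 or 2)
  remaining: number; // syllables still available on that line
}

// Budget for the next word, or null if the poem is already complete
// or some word overflowed its line.
function nextBudget(wordSyllables: number[]): Budget | null {
  let line = 0;
  let used = 0;
  for (const n of wordSyllables) {
    if (line >= TARGETS.length) return null; // words after the final line
    used += n;
    if (used > TARGETS[line]) return null;   // word spilled past the line
    if (used === TARGETS[line]) {
      line += 1; // line is full; move to the next one
      used = 0;
    }
  }
  if (line >= TARGETS.length) return null;   // all three lines are full
  return { line, remaining: TARGETS[line] - used };
}

const next = nextBudget([1, 1, 2, 1, 1]); // { line: 1, remaining: 6 }
console.log(next);
```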
I decided to try a single forward pass of a tuned LLM. Because a forward pass emits a probability for every token in the vocabulary, I can simply filter that menu by our deterministic rules and sort what remains by log-probability.
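The filter-then-sort step might look like the sketch below. It assumes the inference service returns raw logits for the next position and that vocab[i] is the decoded text of token i; those names, the leading-space test for word-initial tokens, and the remaining syllable budget passed in by the caller are all assumptions for illustration, not the app's actual interface.

```typescript
interface Suggestion {
  word: string;
  logprob: number;
}

// Numerically stable log-softmax: log p_i = x_i - (m + log sum_j exp(x_j - m)).
function logSoftmax(logits: number[]): number[] {
  const m = logits.reduce((a, x) => Math.max(a, x), -Infinity);
  const logSum = m + Math.log(logits.reduce((s, x) => s + Math.exp(x - m), 0));
  return logits.map((x) => x - logSum);
}

// Filter the next-token menu by the deterministic rules, then sort by log-prob.
function suggest(
  logits: number[],
  vocab: string[],
  syllables: Map<string, number>,
  remaining: number,
  limit = 10,
): Suggestion[] {
  const logprobs = logSoftmax(logits);
  const best = new Map<string, number>();
  vocab.forEach((token, i) => {
    // Assumption: a token that starts a new word begins with a space.
    if (!token.startsWith(" ")) return;
    const word = token.trim().toLowerCase();
    const n = syllables.get(word);
    // The word must be in the dictionary and fit the current line.
    if (n === undefined || n > remaining) return;
    // " The" and " the" collapse to one word; keep the higher log-prob.
    const prev = best.get(word);
    if (prev === undefined || logprobs[i] > prev) best.set(word, logprobs[i]);
  });
  return Array.from(best, ([word, logprob]) => ({ word, logprob }))
    .sort((a, b) => b.logprob - a.logprob)
    .slice(0, limit);
}

// Toy menu: "ing" is a word fragment and " tranquility" is too long for 2 syllables.
const vocab = [" the", " pond", "ing", " silence", " The", " tranquility"];
const logits = [2, 1, 5, 0.5, 3, 4];
const syllables = new Map([["the", 1], ["pond", 1], ["silence", 2], ["tranquility", 4]]);
console.log(suggest(logits, vocab, syllables, 2).map((s) => s.word).join(", ")); // the, pond, silence
```

Because log-softmax subtracts the same constant from every logit, it does not change the ranking; it only turns the scores into true log-probabilities.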
I chose SmolLM2-1.7B.
A word that spans multiple tokens cannot be produced in a single forward pass, so only single-token words are candidates. The newer SmolLM3 is multilingual, so I reasoned that its broader pretraining and tokenizer would match our English-only menu less well.
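The single-token constraint can be checked up front by encoding every dictionary word and keeping those that come back as one token. A minimal sketch: encode is a stand-in for whatever tokenizer the model ships with, the leading space reflects an assumption that words mid-sentence are tokenized together with their preceding space, and the toy tokenizer exists only to make the example runnable.

```typescript
// Hypothetical tokenizer interface: text in, token ids out.
type Encode = (text: string) => number[];

// Keep only words the tokenizer encodes as exactly one token.
function singleTokenWords(words: Iterable<string>, encode: Encode): Set<string> {
  const keep = new Set<string>();
  for (const w of words) {
    if (encode(" " + w).length === 1) keep.add(w);
  }
  return keep;
}

// Toy tokenizer for illustration: whole-word lookup, else one id per character.
const toyVocab = new Map([[" pond", 0], [" frog", 1]]);
const toyEncode: Encode = (text) => {
  const id = toyVocab.get(text);
  return id !== undefined ? [id] : Array.from(text, (_, i) => 100 + i);
};

const kept = singleTokenWords(["pond", "frog", "chrysanthemum"], toyEncode);
console.log(Array.from(kept).join(", ")); // pond, frog
```

Running the same check over the whole dictionary with each candidate model's tokenizer would give a direct way to compare how much of the English menu each one can reach in a single pass.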
I fine-tuned the model on the synthetic Haiku 333K dataset. Since that data was generated by Claude 3.5 Sonnet, the tuned model is effectively a distillation of it.
I wrote the app in Preact on Bun, with a SQLite database. I like JSX and TypeScript, but lately I've been exploring server-side rendering: inspired by HTMX, I prefer sending HTML over the wire where possible.
Suggestions are hosted in a separate microservice running on a modest NVIDIA 3090.