There's a quiet mistake I see in AI-adjacent code everywhere: people hand cryptic, structured input to an LLM and hope it figures out the domain on its own. It often works. Right up until it doesn't — usually in front of a customer, always with confidence.

A lot of professional domains have published rulebooks. Aviation notices. Medical billing codes. Maritime manifests. Legal citation formats. Insurance policy forms. Each has a written specification that says what its codes and fields mean.

If a rulebook exists, the right move is almost always: parse it with code, not with a model.

Why models lose to lookup tables

I spent a few weeks watching a capable model hallucinate decodings of an industry format it had clearly seen in training but not deeply internalized. The answers read plausibly. They cited fields that didn't exist. They transposed codes. They confidently described a condition the notice didn't describe. None of the failures were caught by casual inspection — you had to know the format to know the answer was wrong.

Eventually I just wrote the lookup tables. A couple hundred lines of code, a pytest suite, and a public API around three functions. Now the model receives pre-decoded structured data instead of raw format strings. Its job shifted from "figure out the domain" to "summarize and reason about already-decoded facts" — which is what it's actually good at.

A translator is not a calculator

The easy metaphor: you wouldn't ask a human translator to also be your currency converter. Use the calculator for the deterministic part. Use the translator for the judgment part.

With LLMs, the same split works. If the mapping from input to output is a lookup, a regex, or a table in a specification — use code. The code will cost microseconds, get it right every time, and free up both tokens and attention for the parts of the task that actually need intelligence.
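Here's a minimal sketch of what "use code" can look like in Python. Everything specific in it is invented for illustration: the line layout, the status codes, and the name `decode_line` come from no real specification. A real decoder would take its table and its pattern straight from the published rulebook.

```python
import re

# Hypothetical code table, invented for this example.
# In practice this comes from the published specification.
STATUS_CODES = {
    "CLSD": "closed",
    "OPN": "open",
    "LTD": "limited",
}

# Hypothetical line layout: <FACILITY> <STATUS> <YYMMDDHHMM>
LINE_PATTERN = re.compile(
    r"^(?P<facility>[A-Z0-9]+) (?P<status>[A-Z]+) (?P<start>\d{10})$"
)


def decode_line(line: str) -> dict:
    """Decode one line into plain fields, or raise ValueError if it does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"line does not match the expected layout: {line!r}")
    status = match.group("status")
    if status not in STATUS_CODES:
        # Refuse to guess: an unknown code is an error, not a best effort.
        raise ValueError(f"unknown status code: {status!r}")
    return {
        "facility": match.group("facility"),
        "status": STATUS_CODES[status],
        "start": match.group("start"),
    }
```

The design choice that matters is the refusal to guess. When a code isn't in the table, the function raises instead of producing something plausible.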

The generalizable shape

Before you write another prompt that asks the model to "understand this format," ask a different question: has someone already written the understanding down? If the answer is yes, the design has a clear shape:

  1. Deterministic pre-processing. Parse the format, expand abbreviations, look up codes, normalize schemas. Output: structured data.
  2. Model-side synthesis. Feed the model the structured data. Ask it questions that require judgment, comparison, summary, or communication — things no lookup table can do.
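The two steps above can be sketched roughly like this. The names `build_prompt` and `toy_decode`, the `<ITEM>:<CODE>` layout, and the prompt wording are all assumptions made for the example. The actual model call is left out on purpose, since it depends on whichever client you use.

```python
import json
from typing import Callable, Iterable

# Hypothetical code table for a made-up "<ITEM>:<CODE>" layout.
TOY_CODES = {"A": "active", "X": "cancelled"}


def toy_decode(line: str) -> dict:
    """Stand-in for a real spec-driven decoder (illustration only)."""
    item, code = line.split(":")
    return {"item": item, "status": TOY_CODES[code]}


def build_prompt(
    raw_lines: Iterable[str],
    decode: Callable[[str], dict],
    question: str,
) -> str:
    # Step 1: deterministic pre-processing. Raw strings become structured data.
    records = [decode(line) for line in raw_lines]

    # Step 2: the model only ever sees the decoded facts, never the raw format.
    facts = json.dumps(records, indent=2, sort_keys=True)
    return (
        "The following records were decoded by a deterministic parser.\n"
        "Treat them as given and do not reinterpret the codes.\n\n"
        f"{facts}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(["P1:A", "P2:X"], toy_decode, "Which items are cancelled?")
```

The string in `prompt` is what goes to the model. The raw `P2:X` never appears in it, only the decoded `"status": "cancelled"`.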

Accuracy goes up. Token spend goes down. Hallucinations on the format itself go away, because the model never has to decode the raw format. And the failure mode, when it eventually happens, is detectable: the parser either accepted the input or it raised an error.
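One way to make that detectability concrete is to have the pre-processing step report what it could not decode instead of silently dropping it. This is a sketch under assumptions: `DecodeReport`, `decode_all`, and the toy `<ITEM>:<CODE>` layout are hypothetical names and formats, not part of any real system.

```python
from dataclasses import dataclass, field

# Hypothetical code table for a made-up "<ITEM>:<CODE>" layout.
CODES = {"A": "active", "X": "cancelled"}


def toy_decode(line: str) -> dict:
    """Decode one line, or raise ValueError if the layout or code is unknown."""
    item, sep, code = line.partition(":")
    if not sep or code not in CODES:
        raise ValueError(f"cannot decode {line!r}")
    return {"item": item, "status": CODES[code]}


@dataclass
class DecodeReport:
    decoded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (raw line, reason) pairs


def decode_all(lines, decode) -> DecodeReport:
    """Run the decoder over every line and keep failures visible."""
    report = DecodeReport()
    for line in lines:
        try:
            report.decoded.append(decode(line))
        except ValueError as exc:
            report.rejected.append((line, str(exc)))
    return report


report = decode_all(["P1:A", "P2:Q", "garbage"], toy_decode)
```

Here `report.rejected` holds the two lines the decoder refused, each with a reason. You can log them, alert on them, or block the request, none of which is possible when a model quietly guesses.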

When the rulebook doesn't exist

This advice inverts for fuzzy domains. Free-text clinical notes. Customer support transcripts. Code with informal conventions. Email that a human would scan for tone. In those cases the model's pattern matching genuinely beats any hand-coded decoder, because the "format" isn't stable enough to code against.

The judgment call is: does a specification exist that I could point to? If yes, pre-process. If no, let the model do the pattern matching — and then write evals, because you're now in territory where its answers can drift.

The uncomfortable part

Writing a parser for a dusty industry format is not glamorous work. Nobody's retweeting it. But it's usually the highest-leverage thing you can do in a domain-specific AI system — more than a better prompt, more than a bigger model, more than a fancier retrieval layer.

The LLM is a remarkable general-purpose reasoner. It shouldn't be your domain decoder too.