The Unknown Unknown: Structural Bugs That LLMs Cannot Find

Some bugs are “unknown unknowns”: you didn’t know to test for them because they’re structural. They live in the shape of the program, not in any single line. A state that has no way out. A branch that’s unreachable after a refactor. Two flags that can both be true. A resource that’s acquired but never released on one path. The code might run fine in the scenarios you thought of; the bug only appears when the right (wrong) combination of state and events happens. Traditional tests often miss these because they’re written for known behaviors and known paths.

LLMs are especially prone to introducing unknown unknowns. They add code that “looks right” (correct syntax, plausible logic), but they don’t have a global view of the system. They don’t know that the new branch they added never connects to the error handler, or that the flag they set is meant to be mutually exclusive with another flag used elsewhere. So they generate local correctness and global inconsistency. You only discover it when something breaks in production or when you do a structural review.

Finding unknown unknowns requires a different kind of check: not “does this test pass?” but “is the structure coherent?” That can mean: enumerate states and transitions and check every state has a path out; check that every branch is reachable; check that no two flags can be true together when they shouldn’t; check that every acquire has a release on all paths. Those are queries over the shape of the program, not over one execution.
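To make the first two checks concrete, here is a minimal sketch in Python. It assumes the states and transitions have already been extracted into a plain dictionary; the `ORDER_FLOW` workflow, its state names, and the `check_structure` helper are all hypothetical, invented for illustration. It reports states that can never be reached from the start and states that have no path to any terminal state.

```python
from collections import deque


def reachable(graph, starts):
    """Return every node reachable from any node in `starts` (breadth-first)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def check_structure(transitions, initial, terminals):
    """Report unreachable states and states with no path to a terminal."""
    states = set(transitions) | {t for ts in transitions.values() for t in ts}
    states |= {initial} | set(terminals)

    # Forward search: which states can the program ever get into?
    forward = reachable(transitions, [initial])

    # Reverse the edges so one search from the terminals finds
    # every state that is still able to finish.
    reverse = {}
    for src, targets in transitions.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    can_finish = reachable(reverse, list(terminals))

    return {
        "unreachable": sorted(states - forward),
        "trapped": sorted(states - can_finish),
    }


# Hypothetical order workflow with two structural bugs planted in it.
ORDER_FLOW = {
    "new": ["paid", "cancelled"],
    "paid": ["shipped", "on_hold"],
    "on_hold": [],                # no way out
    "shipped": ["delivered"],
    "refunding": ["cancelled"],   # nothing leads here any more
}

report = check_structure(ORDER_FLOW, "new", {"delivered", "cancelled"})
print(report)
# {'unreachable': ['refunding'], 'trapped': ['on_hold']}
```

Neither bug would show up in a test that walks the happy path from `new` to `delivered`. Both show up immediately when the question is asked of the whole graph.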

Tools that do this exist in various forms (static analysis, model checkers, custom oracles). The point is to run them after generation, not to assume the model got the structure right. The model is good at “what to write”; it’s bad at “what’s missing.”
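As a rough illustration of what a custom oracle can look like, the sketch below does a tiny explicit-state search, which is the same basic idea a model checker uses, only far less capable. Everything in it is an assumption made for the example: the state is a hypothetical pair of flags `(is_loading, has_error)`, the three events stand in for generated code, and `explore` is not taken from any real tool. It enumerates every reachable state and returns the shortest event sequence that makes both flags true.

```python
from collections import deque


def explore(initial, events, invariant):
    """Search all states reachable via `events`, breadth-first.

    Returns the shortest list of event names leading to a state that
    violates `invariant`, or None if no reachable state violates it.
    """
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace
        for name, step in events.items():
            nxt = step(state)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [name]))
    return None


# Hypothetical state: (is_loading, has_error).
EVENTS = {
    "start": lambda s: (True, s[1]),     # bug: does not clear has_error
    "succeed": lambda s: (False, False),
    "fail": lambda s: (False, True),
}


def never_both(state):
    """The two flags are meant to be mutually exclusive."""
    is_loading, has_error = state
    return not (is_loading and has_error)


trace = explore((False, False), EVENTS, never_both)
print(trace)
# ['fail', 'start']

# Same events, but "start" now resets the error flag.
FIXED_EVENTS = dict(EVENTS, start=lambda s: (True, False))
print(explore((False, False), FIXED_EVENTS, never_both))
# None
```

Each event handler looks fine on its own. The violation only exists in the combination “fail, then start”, which is exactly the kind of path nobody thinks to write a test for. A check like this can run in CI right after the generated code lands.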

Expect more integration of structural checks into dev and CI, and more patterns for “generate then verify shape.”

nJoy 😉
