Vibe Coding: Speed, Slop, and the 80% Problem

“Vibe coding” is the style of development where you iterate quickly with an AI assistant: you describe what you want, the model generates code, you run it and maybe fix a few things, and you ship. It’s fast and feels productive. The downside is “slop”: code that works in the narrow case you tried but is brittle, inconsistent, or wrong in structure. You get to 80% of the way in 20% of the time, but the last 20% (correctness, edge cases, structure) can take 80% of the effort, or never get done.

The 80% problem is that the model is optimised for “what looks right next,” not “what is right overall.” So you get duplicate logic, missing error paths, and design drift. Tests help, but only for what you think to test. The structural issues (a wrong state machine, flag conflicts, dead code) often don’t show up until production or a deep review. Vibe coding is great for prototypes and for learning; it’s risky for production unless you add discipline: review, structural checks, and clear specs.

Speed is real. The model can draft a whole feature in minutes. The trap is treating the draft as done. The fix is to treat vibe coding as a first pass: refactor afterwards, add tests, and check structure. Some teams use the model for implementation and keep specs and architecture human-owned. Others use the model only for boilerplate and keep business logic and control flow hand-written.

Progress in LLMs will improve the 80%: fewer obvious bugs, better adherence to patterns. But the gap between “looks right” and “is right” is fundamental. Design your process so that the last 20% is explicit: who reviews, what gets checked, and what the bar for “done” is.

Expect more tooling that helps close the gap: structural checks, spec-driven generation, and better integration of tests and review into the vibe-coding loop.

nJoy πŸ˜‰

Flag Conflicts, Stuck States, and Dead Branches: The AI Code Debt Catalog

Flag conflicts happen when two (or more) boolean flags are meant to be mutually exclusive but the code allows both to be true. For example, “is_pending” and “is_completed” might both be true after a buggy transition, or “lock_held” and “released” might get out of sync. The program ends up in an inconsistent state even though no single line of code “looks” wrong. Stuck states are states that have no valid transition out: you’re in “processing” but there’s no success, failure, or timeout path. Dead branches are code paths that are unreachable after some change: perhaps an earlier condition always takes another branch. All of these are structural defects: they’re about the shape of the state space, not a typo.

AI-generated code tends to introduce these because the model adds code incrementally. It adds a new flag for a new feature and doesn’t check that it’s exclusive with an existing one. It adds a new state and forgets to add the transition out. It adds a branch that’s never taken because another branch is always taken first. Tests that only cover happy paths and a few errors won’t catch them. You need either exhaustive testing (often impractical) or a structural view (states, transitions, flags) that you check explicitly.

A simple catalogue helps when reviewing: (1) For every flag pair that should be exclusive, is there a guard or an invariant? (2) For every state, is there at least one transition out (including error and timeout)? (3) For every branch, is it reachable under some input? You can do this manually or with tooling. The goal is to make the “AI code debt” (these structural issues) visible and then fix them.
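To make the checklist concrete, here is a minimal sketch of items (1) and (2) as executable checks. The flag names, state names, and terminal states are hypothetical examples, not a prescribed vocabulary:

```python
# A minimal sketch of the review catalogue as executable checks.
# Flag names, state names, and terminal states are hypothetical.

def exclusive_flag_violations(snapshot, exclusive_pairs):
    """Return the flag pairs that are both True in this state snapshot."""
    return [(a, b) for a, b in exclusive_pairs
            if snapshot.get(a) and snapshot.get(b)]

def stuck_states(transitions):
    """Return states with no outgoing transition (including error/timeout)."""
    states = set(transitions) | {dst for dsts in transitions.values() for dst in dsts}
    terminal = {"completed", "failed"}        # assumed legitimate end states
    return [s for s in states - terminal if not transitions.get(s)]

# Example: a buggy snapshot and a machine missing a way out of "processing".
snapshot = {"is_pending": True, "is_completed": True}
pairs = [("is_pending", "is_completed")]
transitions = {"pending": ["processing"], "processing": []}

print(exclusive_flag_violations(snapshot, pairs))  # [('is_pending', 'is_completed')]
print(stuck_states(transitions))                   # ['processing']
```

The same two functions work as a review aid (run them on a reconstructed model of the code) or as a design aid (run them on the spec before generating).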

Prevention is better than cleanup: if you have a spec (e.g. a state machine or a list of invariants), generate or write code against it and then verify the implementation matches. The model is good at filling in code; it’s bad at maintaining global consistency. So the catalogue is both a review checklist and a design checklist.

Expect more linters and checkers that target flag conflicts, stuck states, and dead branches in generated code.

nJoy πŸ˜‰

Formal Reasoning Meets LLMs: Why Logic Engines Still Matter

LLMs are probabilistic: they score continuations and sample. They don’t have a built-in notion of “therefore” or “for all”; they approximate logical consistency from training data. So they can contradict themselves, miss a case in a case analysis, or add a branch that breaks an invariant. Formal reasoning engines (theorem provers, logic engines, constraint solvers) are the opposite: they deduce from rules and facts, and they can exhaustively enumerate or check. They don’t “guess” the next step; they derive it. So there’s a natural division of labour: the LLM for “how do I implement this?” and the logic engine for “is this structure sound?” or “what’s missing?”

Combining them means the LLM produces a candidate (e.g. a state machine, a patch, or a set of facts), and the logic engine checks it: are all states reachable? Is there a deadlock? Is there a state with no error transition? The engine doesn’t need to understand the domain; it reasons over the shape. That’s why people experiment with LLM + Prolog, LLM + SMT solvers, or LLM + custom rule engines. The LLM does the creative, fuzzy part; the engine does the precise, exhaustive part.
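As a rough sketch of the division of labour, here is the “engine” side: exhaustive graph checks over a candidate state machine (hand-written here; in the pipeline it would come from the LLM). The state names and the choice of “failed” as the error state are assumptions for illustration:

```python
# Sketch: the "engine checks the shape" step. Given a candidate state
# machine, derive structural facts exhaustively. Names are hypothetical.

from collections import deque

def unreachable_states(transitions, start):
    """Breadth-first search from the start state; return what is never reached."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in transitions.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    all_states = set(transitions) | {d for ds in transitions.values() for d in ds}
    return sorted(all_states - seen)

def states_without_error_exit(transitions, error_states=frozenset({"failed"})):
    """Non-terminal states whose outgoing transitions never include an error state."""
    return sorted(s for s in transitions
                  if transitions[s] and not set(transitions[s]) & error_states)

machine = {
    "queued":   ["running"],
    "running":  ["done"],          # no path to "failed": flagged below
    "done":     [],
    "orphaned": ["done"],          # never reached from "queued"
}
print(unreachable_states(machine, "queued"))   # ['orphaned']
print(states_without_error_exit(machine))      # ['orphaned', 'queued', 'running']
```

The engine doesn’t know what “queued” means; it only reasons over the edges, which is exactly why it can be exhaustive.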

The challenge is translation: getting from code or natural language to a form the engine can reason about. That might be manual (you write the spec) or semi-automated (the LLM proposes a formalization and the engine checks it). Once you have a formal model, the engine can find the unknown unknowns that the LLM cannot see.

We’re not yet at “LLM writes the spec and the engine verifies the code” in one shot. But we’re at “use the LLM to draft, use the engine to check the draft or the structure.” That’s already valuable and will get more so as tooling improves.

Expect more research and products that pair LLMs with formal or logic-based back ends for verification and structural analysis.

nJoy πŸ˜‰

The Unknown Unknown: Structural Bugs That LLMs Cannot Find

Some bugs are “unknown unknowns”: you didn’t know to test for them because they’re structural, not in a single line. A state that has no way out. A branch that’s unreachable after a refactor. Two flags that can both be true. A resource that’s acquired but never released in one path. The code might run fine in the scenarios you thought of; the bug only appears when the right (wrong) combination of state and events happens. Traditional tests often miss these because they’re written for known behaviours and known paths.

LLMs are especially prone to introducing unknown unknowns. They add code that “looks right” (correct syntax, plausible logic), but they don’t have a global view of the system. They don’t know that the new branch they added never connects to the error handler, or that the flag they set is mutually exclusive with another flag used elsewhere. So they generate local correctness and global inconsistency. You only discover it when something breaks in production or when you do a structural review.

Finding unknown unknowns requires a different kind of check: not “does this test pass?” but “is the structure coherent?” That can mean: enumerate states and transitions and check every state has a path out; check that every branch is reachable; check that no two flags can be true together when they shouldn’t; check that every acquire has a release on all paths. Those are queries over the shape of the program, not over one execution.
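One of these shape queries, “no two flags can be true together when they shouldn’t,” can be sketched as a brute-force sweep of the whole flag space rather than a check of one execution. The flag names and invariants below are hypothetical:

```python
# Sketch: checking invariants over the whole flag space rather than one
# execution. Flag names and the invariants are hypothetical examples.

from itertools import product

FLAGS = ["is_pending", "is_completed", "lock_held", "released"]

def invariant(s):
    # Intended rules: pending/completed are exclusive; a released lock
    # cannot still be held.
    if s["is_pending"] and s["is_completed"]:
        return False
    if s["lock_held"] and s["released"]:
        return False
    return True

def reachable(s):
    # Stand-in for "can the code actually reach this combination?".
    # In a real check this would come from the transition logic; here we
    # assume every combination is reachable, the worst case.
    return True

bad = [s for values in product([False, True], repeat=len(FLAGS))
       for s in [dict(zip(FLAGS, values))]
       if reachable(s) and not invariant(s)]
print(len(bad))   # 7 of the 16 combinations violate the invariants
```

Sixteen combinations is trivial; with many flags the space explodes, which is where solvers and model checkers replace brute force.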

Tools that do this exist in various forms (static analysis, model checkers, custom oracles). The point is to run them after generation, not to assume the model got the structure right. The model is good at “what to write”; it’s bad at “what’s missing.”

Expect more integration of structural checks into dev and CI, and more patterns for “generate then verify shape.”

nJoy πŸ˜‰

Van der Aalst’s 43 Workflow Patterns and What They Mean for AI-Generated Code

Researchers in workflow and process mining (van der Aalst, Russell et al.) catalogued 43 control-flow patterns: ways that tasks can be sequenced, split, merged, cancelled, and looped. The basics are Sequence (A then B), Parallel Split (A then B and C in parallel), Synchronisation (wait for B and C then D), Exclusive Choice (A then B or C but not both), and Simple Merge (B or C then D). From there you get multi-choice, discriminator, deferred choice, multiple instances, cancellation (task, case, region), structured loops, and more. The full list is a reference for “what kind of flow am I building?”

For AI-generated code the relevance is this: the model often implements one or two of these patterns (e.g. a sequence or a simple branch) but misses the rest. It might add a parallel split and forget the synchronisation. It might add a retry loop but not the cancel path. It might create multiple instances without a join. So the generated code can look right locally but violate the pattern, and that’s when you get stuck states, lost work, or races. Knowing the 43 patterns gives you a checklist: after the model generates code, which pattern is it trying to implement? Is the full pattern there?
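As an illustration of “the full pattern or only part of it,” here is a sketch of Parallel Split followed by Synchronisation in plain Python. The task functions are hypothetical; the point is that the join (waiting for both results) is as much part of the pattern as the split:

```python
# Sketch of one complete pattern from the catalogue: Parallel Split
# followed by Synchronisation. Task functions are hypothetical.

from concurrent.futures import ThreadPoolExecutor

def task_b():
    return "b-done"

def task_c():
    return "c-done"

def run_pattern():
    with ThreadPoolExecutor() as pool:
        fb = pool.submit(task_b)   # Parallel Split: B and C start together
        fc = pool.submit(task_c)
        results = (fb.result(), fc.result())  # Synchronisation: wait for BOTH
    return results                 # only now may the next task proceed

print(run_pattern())  # ('b-done', 'c-done')
```

A generated version that submits the tasks but forgets one of the `.result()` calls is the “half a pattern” failure the review question is meant to catch.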

You don’t have to implement all 43. Many systems only need a few: sequence, choice, maybe a retry or a timeout. But having the vocabulary helps. When you prompt the model (“add a retry with exponential backoff”), you’re asking for a specific pattern. When you review, you can ask “did we get the full pattern or only part of it?”

Formal workflow languages (BPMN, etc.) encode these patterns explicitly. In code they’re implicit. The gap is where bugs hide. Making the pattern explicit (in a spec or a diagram) and then checking the code against it is one way to keep AI-generated code structurally sound.

Expect more tooling that maps code to these patterns and flags incomplete or inconsistent implementations.

nJoy πŸ˜‰

State Machines as Software DNA: The Hidden Architecture of Every System

Most non-trivial software has an implicit state machine: entities (orders, jobs, sessions) move through stages, and only certain transitions are valid. A payment can be pending, then completed or failed; a job can be queued, running, or done. We don’t always draw the machine (it’s buried in if/else and flags), but it’s there. That hidden structure is the “DNA” of the system: it determines what can happen, what can’t, and what we might forget (e.g. the path from “running” to “cancelled”).

Making the state machine explicit (states, transitions, guards) pays off. You can see dead ends, missing transitions, and inconsistent flags. You can generate tests that cover every transition or every state. You can document and review the behaviour in one place. Many bugs in production come from the code drifting away from the intended machine: a new state was added in one place but not another, or a transition was forgotten in an error path.

State machines don’t have to be fancy. A table (state Γ— event β†’ next state) or a small DSL is enough. The point is to have a single source of truth for “what states exist and what transitions are allowed.” Code then implements that; tests and tools can check the implementation against the spec. When an LLM generates code, it’s implementing (or extending) an implicit machine; if the machine were explicit, you could check the model’s output against it.
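A sketch of such a table, with hypothetical payment-style states and events:

```python
# A minimal sketch of "state x event -> next state" as a single source of
# truth. States and events are hypothetical payment-style examples.

TRANSITIONS = {
    ("pending", "authorise"): "completed",
    ("pending", "decline"):   "failed",
    ("pending", "cancel"):    "cancelled",
}

def step(state, event):
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {state} + {event}") from None

print(step("pending", "authorise"))   # completed
# step("completed", "authorise") raises ValueError: the table, not
# scattered if/else, decides what is allowed.
```

Everything else (tests per transition, diagrams, checks on generated code) can be derived from the one dictionary.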

In legacy code the machine is often undocumented. You can reverse-engineer it (manual or with tooling) and then maintain it. Going forward, designing the state machine first and then writing or generating code to match it is a way to keep structure stable even when the model is additive.

Expect more tooling that extracts or checks state machines from code, and more patterns for “spec the machine, then implement.”

nJoy πŸ˜‰

The Additive Trap: Why LLMs Build Up but Rarely Clean Up

LLMs are great at adding: new features, new branches, new error messages. They’re bad at removing or simplifying. When you ask for a change, they tend to append code or add another condition rather than delete dead paths or consolidate duplicates. That’s partly training (most edits in the wild are additive) and partly the nature of autoregressive generation: you’re always “continuing” the text, not rewriting it. So the codebase drifts: more branches, more flags, more special cases, and the implicit state machine (what states exist, what transitions are valid) slowly diverges from what you thought you had.

The additive trap shows up in control flow: you add a new state or transition and forget to add the corresponding cleanup, timeout, or error path. Or you add a new “success” path but the old “failure” path now leads nowhere. The model doesn’t reason over the full graph; it fills in the local request. So you get stuck states, unreachable code, or two flags that can both be true when they shouldn’t be. Tests that only cover the happy path won’t catch these; you need a view of the structure.

What would help: tools or disciplines that force a “structure pass.” After the model suggests a change, something checks: are all states covered? Are there new branches with no error handling? Are there conflicting flags? That could be a linter, a custom checker, or a formal spec that you diff against. The key is to treat “shape” as a first-class concern, not just “does it run in one scenario.”
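A “structure pass” of this kind can be as simple as diffing the implemented transition graph against the spec after each model-suggested change. Both machines below are hypothetical:

```python
# Sketch of a "structure pass": diff the implemented transitions against
# a spec after each model-suggested change. Both machines are hypothetical.

def structure_diff(spec, impl):
    spec_edges = {(s, d) for s, ds in spec.items() for d in ds}
    impl_edges = {(s, d) for s, ds in impl.items() for d in ds}
    return {
        "missing": sorted(spec_edges - impl_edges),  # spec'd but not implemented
        "extra":   sorted(impl_edges - spec_edges),  # added without a spec change
    }

spec = {"queued": ["running"], "running": ["done", "failed"]}
impl = {"queued": ["running"], "running": ["done"], "done": ["archived"]}

print(structure_diff(spec, impl))
# {'missing': [('running', 'failed')], 'extra': [('done', 'archived')]}
```

“Extra” edges aren’t always wrong, but each one is exactly the kind of silent addition the trap produces, so each should force either a spec update or a revert.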

Until we have that, the best mitigation is to use the model for small, localized edits and to do structural review yourself. When you add a state or a branch, explicitly ask: what’s the reverse path? What cleans up? What happens on failure? The model won’t ask for you.

Expect more research and tooling on “structural correctness” of generated code and on ways to make the additive trap visible and fixable.

nJoy πŸ˜‰

Hallucination in Code: Why AI Writes Bugs It Cannot See

When an LLM writes code, it can produce something that looks right and even runs in a narrow test but is wrong in structure: wrong API, wrong assumption about state, or a path that never gets tested. The model doesn’t “see” the full codebase or the spec; it predicts the next token. So it can add a happy path and forget the error path, or introduce two flags that can both be true in a bad combination, or leave a resource open. Those are structural bugs (bugs in the shape of the program), not simple typos. Tests that only cover the happy path won’t catch them.

Why the model writes bugs it cannot see: it has no formal model of the system. It doesn’t know “every state must have an error transition” or “this lock must be released.” It only knows statistical patterns from training code. So it tends to add and rarely to delete or refactor. It fills in the obvious next step and often misses the edge case or the cleanup. That’s the additive trap in code form.

Mitigations: use the model for drafts and then review. Run static analysis, linters, and tests that cover failure paths. In critical areas, keep the model on a short leash: generate small patches, run tests after each, and require human sign-off for structural changes. Some teams use formal specs or state-machine descriptions and then ask the model to implement against them: the spec is the source of truth, and the model is the implementer.

Hallucination in code is a special case of “confident and wrong”: the code compiles, maybe even runs once, but the design is broken. The fix is the same as for other hallucinations: don’t trust the output without verification. For code, verification means tests, review, and a clear model of what “correct” means.

Expect more tooling that checks generated code against specs or structural rules, and more patterns for “model proposes, system verifies.”

nJoy πŸ˜‰

Grounding Strategies: RAG, Structured Outputs, and Tool Use

Grounding means tying the model’s output to something external: retrieved documents, tool results, or a strict schema. RAG (retrieval-augmented generation) is the most common: you have a corpus (docs, code, KB), you run a query (user question or embedding), you retrieve the top-k chunks, and you put them in the prompt. The model is then “grounded” in those chunks: it’s supposed to answer from them rather than from memory. It can still hallucinate (e.g. mix chunks or add detail), but the ceiling is lower.
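The retrieval step can be sketched with a toy bag-of-words scorer; real systems use embeddings and vector stores, but the shape (score the corpus, take the top-k, put the chunks in the prompt) is the same. The corpus, document ids, and query are made up:

```python
# A minimal RAG sketch with a toy bag-of-words retriever. Real systems
# use embeddings; the shape (retrieve top-k, then prompt with the chunks)
# is the same. Corpus and query are hypothetical.

CORPUS = {
    "doc-1": "invoices are due within 30 days of issue",
    "doc-2": "refunds are processed within 5 business days",
    "doc-3": "the office cat is named Turing",
}

def retrieve(query, k=2):
    q = set(query.lower().split())
    scored = sorted(CORPUS.items(),
                    key=lambda kv: len(q & set(kv[1].split())),
                    reverse=True)
    return scored[:k]

chunks = retrieve("when are refunds processed")
prompt = "Answer ONLY from these sources:\n" + "\n".join(
    f"[{doc_id}] {text}" for doc_id, text in chunks)
print(chunks[0][0])   # doc-2
```

The instruction to answer only from the listed sources is what lowers the ceiling; the retriever is what decides whether the right material is even in reach.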

Structured outputs force the model to fill a schema (e.g. JSON with fields like “answer”, “confidence”, “sources”). That doesn’t guarantee truth, but it makes parsing and downstream checks possible. You can require a “sources” array and then validate that each source exists. You can run the answer through a checker (e.g. a query against a DB) before showing it to the user.
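A sketch of such a downstream check: require a “sources” array and validate each entry against the corpus before trusting the answer. The schema fields and source ids are hypothetical:

```python
# Sketch of a downstream check on a structured output: require a "sources"
# array and validate each entry against the corpus before trusting the
# answer. The schema and source ids are hypothetical.

import json

KNOWN_SOURCES = {"doc-17", "doc-42"}   # ids actually present in the corpus

def validate_answer(raw):
    data = json.loads(raw)
    for field in ("answer", "sources"):
        if field not in data:
            return False, f"missing field: {field}"
    unknown = [s for s in data["sources"] if s not in KNOWN_SOURCES]
    if unknown:
        return False, f"unknown sources: {unknown}"   # likely hallucinated
    return True, "ok"

good = '{"answer": "42", "sources": ["doc-17"]}'
bad  = '{"answer": "43", "sources": ["doc-99"]}'
print(validate_answer(good))   # (True, 'ok')
print(validate_answer(bad))    # (False, "unknown sources: ['doc-99']")
```

The schema doesn’t make the answer true, but it makes this kind of mechanical rejection possible, which free-form text does not.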

Tool use is another form of grounding: instead of the model “remembering” or inventing a fact, it calls a tool (search, API, DB) and you inject the result. The model reasons over the result but doesn’t invent the result itself. So grounding strategies are: (1) put real data in context (RAG), (2) constrain the form of the answer (structured output), (3) get data via tools and let the model interpret it. Often you combine them.

The tradeoff is cost and latency: RAG and tools add retrieval and API calls; structured output can require more tokens or multiple turns. But for any application where correctness matters, grounding is the only reliable path. Unconstrained generation is for draft and exploration; grounding is for production.

Expect more tooling around RAG quality (better retrieval, chunking, and attribution) and tighter integration of tools and structured output in APIs.

nJoy πŸ˜‰

Confident and Wrong: The Anatomy of an LLM Hallucination

A hallucination is often confident: the model states something wrong with no hedging, in the same tone it uses for correct answers. That’s because the surface form (grammar, style, “authoritative” phrasing) is what the model is optimised for; it doesn’t have a separate channel for “I’m unsure.” So you get “The capital of Mars is Olympus City” or a fake study citation that looks real. The anatomy of such an error: the model chose a high-probability continuation that fits the prompt and prior tokens, and that continuation happened to be false.

Confidence and wrongness can combine in dangerous ways. In code, the model might invent an API that doesn’t exist or a parameter that sounds right but isn’t. In medicine or law, a confident wrong answer can be worse than “I don’t know.” The user often can’t tell the difference until they verify, and many users don’t verify. So the harm is in the pairing: wrong + confident.

Some models are being tuned to hedge or say “I’m not sure” when they’re uncertain, but that’s a band-aid: the model still doesn’t have access to ground truth. The better approach is to not rely on the model’s self-assessment. Use retrieval, tools, and human checks for anything that must be correct. Treat confident-sounding output as “draft” until verified.

In UX you can nudge users: “Always verify facts and code.” In system design you can add guardrails: require citations, or run generated code in a sandbox and check the result. The goal is to make the cost of trusting a hallucination visible and to make verification easy.
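The sandbox guardrail can be sketched as “run the snippet in a subprocess with a timeout and compare the result” before anything leaves draft status. The generated snippet and expected value here are stand-ins for whatever your pipeline produces and checks:

```python
# Sketch of a "run it before you trust it" guardrail: execute a generated
# snippet in a subprocess with a timeout and check the result, instead of
# trusting confident-sounding output. Snippet and expectation are stand-ins.

import subprocess
import sys

generated = "print(sum(range(10)))"   # pretend this came from the model
expected = "45"

proc = subprocess.run([sys.executable, "-c", generated],
                      capture_output=True, text=True, timeout=5)
verified = proc.returncode == 0 and proc.stdout.strip() == expected
print(verified)   # True: only now does the output leave "draft" status
```

A real sandbox would also restrict filesystem and network access; the timeout and exit-code check are just the minimum that makes “verified” mean something.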

Expect more work on uncertainty signaling and citation, but the core lesson remains: confidence and correctness are not the same. Design for that.

nJoy πŸ˜‰