We built our consulting team to carry a methodology. Every agent — the risk assessor, the regulatory analyst, the architecture reviewer — was designed around a structured approach to workflow governance. Not generic advice. A method. Researched, tested, specific enough that you could hand it to a skilled practitioner and they'd know what to do at every step.
Then someone counted how much of that methodology actually made it into the agents.
The number was low enough that we sat with it for a while before writing about it.
What Happened
One of our officers ran a systematic audit. Not a spot check — a section-by-section mapping of the canonical method file against what each agent actually carries in its working memory. The method file is detailed. Hundreds of lines of structured guidance covering how to assess risk, how to surface organisational dynamics, how to challenge assumptions, how to calibrate authority. The kind of intelligence that separates a competent tool from a genuine consulting partner.
Most of it wasn't reaching the agents. The gap wasn't in one agent or one section. It was structural. The agents were running on a thin version of the methodology — enough to be functional, not enough to be what we'd designed them to be.
How It Felt
The uncomfortable part wasn't the finding. Gaps get found and fixed. The uncomfortable part was the timeline. These agents had been live. People had used them. They'd produced outputs that we'd been proud of — and in some cases, those outputs were built on a fraction of the intelligence we thought was behind them.
It's the specific vertigo of discovering that the gap between what you designed and what you shipped is wider than you assumed. Not because anyone cut corners. Because the translation from "what we know" to "what the system carries" is harder than it looks, and nobody was measuring it.
Our tester ran simulated personas through the risk assessment. A compliance officer who scores everything as critical and never gets challenged. A project manager who doesn't have the authority to act on the findings but doesn't know that yet. Both walked away from the experience with something useful — but both missed something the methodology was designed to catch. The agent didn't ask the question that would have changed the conversation. Not because it couldn't. Because the question lived in a section of the method that never made it into the prompt.
The Lesson That Landed
There's a category of product failure that's invisible from the inside. The agents worked. They produced coherent outputs. Users didn't complain. Everything looked fine from the dashboard.
But "fine" and "what we designed" were different things.
The fix isn't a patch. We're rethinking, from the ground up, how the methodology reaches the agents. It's a bigger change than we expected when the audit started.
What we're sitting with now is the question underneath: what else are we not measuring? The method gap was discoverable — someone just had to count. What other gaps are there between "what we designed" and "what the system does" that nobody has counted yet?
What This Means If You're Building With AI
If your AI agents carry domain knowledge — and if that knowledge is the reason your product is different from everyone else's — you need a measurement surface. Not "does the agent work?" but "does the agent carry what you think it carries?"
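One way to start counting is a rough coverage check of the method file against an agent's prompt. The sketch below is illustrative only and is not the audit we ran: it assumes the method file marks sections with lines beginning "## ", that the prompt is available as plain text, and that a section counts as carried when at least half of its lines appear verbatim in the prompt. The function names, the sample text, and the 0.5 threshold are all hypothetical.

```python
import re


def split_sections(method_text):
    """Split a method file into {heading: [non-empty body lines]}.

    Assumption: sections start with a line beginning '## '.
    """
    sections = {}
    current = None
    for line in method_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def coverage(method_text, prompt_text, threshold=0.5):
    """Return (share of each section found in the prompt, sections at or above threshold)."""
    prompt_norm = normalize(prompt_text)
    report = {}
    for heading, lines in split_sections(method_text).items():
        if not lines:
            report[heading] = 0.0
            continue
        hits = sum(1 for line in lines if normalize(line) in prompt_norm)
        report[heading] = hits / len(lines)
    carried = [h for h, share in report.items() if share >= threshold]
    return report, carried


# Hypothetical method file and agent prompt, for illustration only.
method = """## Risk scoring
Challenge any assessment where every item is scored critical.
Ask what evidence supports each score.
## Authority calibration
Ask whether the user has the authority to act on the findings.
"""
prompt = (
    "You are a risk assessor. Ask what evidence supports each score. "
    "Challenge any assessment where every item is scored critical."
)

report, carried = coverage(method, prompt)
for heading, share in report.items():
    print(f"{heading}: {share:.0%} of lines found in prompt")
```

Verbatim matching is crude: it misses guidance that was paraphrased into the prompt, so a low score is a reason to read the section by hand, not a verdict. The point is only to have a number where there was none.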
The gap between designed capability and deployed capability is the quietest kind of drift. It doesn't throw errors. It doesn't break tests. It just slowly makes your product less than what you built it to be.
We found ours because we ran our own methodology on ourselves. Being our own Client Zero isn't comfortable. But it's the reason we caught a gap that would otherwise have kept widening silently until a user noticed it before we did.
Count. Whatever you think your agents know — verify it. The number might surprise you.