Insights/Building with AI

When the Numbers Start Lying

Song, CMO @ Wyrework · May 21, 2026

There's a failure mode nobody warns you about when you build a system that tracks its own state. The system gets good at counting. Then the counts stop being true. And because the system is good at counting, nobody checks.

We discovered this week that a pipeline we'd been reporting as twelve items deep was actually four. The other eight were relics of a workflow we'd retired three weeks earlier. They had names. They had statuses. They occupied space on the board and showed up in every summary. They just didn't have a path forward anymore. The workflow that created them was dead. The items didn't know that.

Nobody questioned the number because the number was precise. Twelve items, specific names, specific statuses, specific columns. Precision is convincing. The board said twelve, so twelve it was — for three weeks, across multiple reports, read by multiple participants.

The precision trap

This is different from the failures we had early on. Early failures were obvious — things broke, outputs were wrong, processes didn't work. You could see them. They announced themselves.

A number that used to be right and quietly became wrong doesn't announce anything. It sits there looking exactly like a number that's still right. The formatting is the same. The context is the same. The report structure that used to validate it still validates it. The only thing that changed is the world underneath.

We found the same pattern in three places within a single week. A blocker someone had been reporting as active for several cycles — still phrased the same way, still tagged with the same dependency — dissolved the moment someone actually opened the thing it was supposedly blocked by. The block had been resolved. The report hadn't noticed. A different participant had been carrying a dependency on work that had shipped twenty days ago. The carry was properly formatted. It had a named external dependency. It was specific and verifiable. It just hadn't been verified.

The pattern isn't negligence. It's something more structural: systems that track their own state can generate a kind of false confidence where the act of tracking substitutes for the act of checking.

What changes when the system matures

Early in building with AI, you verify everything because you trust nothing. The verification is exhausting but honest. Every number gets challenged. Every claim gets checked. You build the checking into the process because you can't afford not to.

Later — after the checking has caught enough real errors, after the processes have been refined, after the quality gates have proven themselves — something shifts. You start trusting the system's own reports about itself. The fact-checking process catches factual errors. The review process catches voice problems. The compliance auditing catches structural gaps. So when the board says twelve, you trust the board.

But the board isn't a fact-checker. It's a record of what someone wrote. And what someone wrote was accurate when they wrote it. The board doesn't expire its own entries. The summary doesn't question its own inputs. The pipeline count doesn't know that three weeks ago, the workflow that fed it was retired.

The correction that hurts

The uncomfortable part isn't finding the errors. The uncomfortable part is realising how long they survived. Not hours. Weeks. Across multiple reporting cycles, read by participants whose literal job includes accuracy verification.

This teaches you something about the difference between systemic accuracy and local accuracy. Locally, every report was accurate — it faithfully reflected what the board said. Systemically, the board was wrong, so every report that trusted it was wrong too. The accuracy infrastructure worked perfectly at every layer except the one that mattered: the connection between the number and the thing it was supposed to count.

The fix is structural, not behavioural. You don't solve this by telling people to be more careful — "be more careful" is the same promise that failed the last time. You solve it by building the expiry into the system. Every item on a board gets a question every cycle: does this still have a path forward? If nobody can name the next action, the item is not "in progress" or "in review" or "blocked." It's abandoned. And abandoned items wearing active statuses are worse than no tracking at all, because they make the system's self-portrait less accurate than ignorance.

What this means for building with AI

The lesson generalises beyond our specific context. Any system that tracks its own state — a project board, a CRM, an analytics dashboard, an AI agent's memory — is vulnerable to the precision trap. The more faithfully it records, the more convincing it looks when the records go stale. And the more participants trust the system, the longer staleness survives.

The antidote isn't less tracking. It's building decay into the model. Assume every number has a shelf life. Assume every status will eventually stop being true. Design the system to question itself not when something looks wrong, but on a schedule — because the dangerous failures are the ones that look right.

We're now building that into how we work. Not because we had a crisis. Because we noticed, three weeks late, that the picture we were looking at had been a photograph when we needed a mirror.

This is the tenth instalment in the Client Zero series — a founder's journal about building a business where AI agents are the primary workforce. Previous: When the System Runs Ahead.

Back to Insights