The Meeting That Changed How I Think About AI Errors
2026-02-10 · 8 min read · Janaina Maia
Last year, a client called an emergency meeting because our AI had confidently classified a soil layer as dense gravel when it was actually poorly graded sand. The classification had already propagated into a foundation design report that was sitting in the client's inbox.
My stomach dropped.
Not because the AI was wrong — I'd designed for that possibility. But because the error had traveled further than our safety nets should have allowed. The confidence display said 87%. The engineer trusted it. The report went out.
What Went Wrong
The AI wasn't the failure. Our design was.
Here's what I realized in that meeting: we had designed the confidence display for the average case but not for the dangerous case. 87% confidence sounds high. But even if the model is well calibrated, 87% confidence means it is wrong about 13% of the time. Across 500 classifications, that's roughly 65 wrong answers presented with high confidence. Some of those wrong answers will be for critical decisions.
We'd built a system where a single number — 87% — carried the entire weight of the user's trust decision. That's a terrible interface for risk.
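To make that arithmetic concrete, here is a minimal Python sketch. It assumes the model is well calibrated (stated confidence equals real accuracy) and that all 500 classifications are reported at the same 87%. The function name is illustrative only.

```python
def expected_wrong(n_classifications: int, confidence: float) -> float:
    """Expected number of wrong answers, ASSUMING confidence equals true accuracy."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Each classification is wrong with probability (1 - confidence).
    return n_classifications * (1.0 - confidence)


print(round(expected_wrong(500, 0.87)))  # 65
```

A poorly calibrated model could do worse than this, which is one more reason not to let the number carry the whole decision.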
What We Changed
Confidence alone isn't enough. Add consequence.
We added a "consequence indicator" alongside confidence. High confidence on a routine classification? Green. High confidence on a classification that feeds into a structural calculation? Orange, regardless of the confidence number.
The question changed from "how confident is the AI?" to "how much does it matter if the AI is wrong here?"
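The rule above can be sketched in a few lines of Python. This is an illustration, not the production logic. The 0.80 threshold, the field names, and the "yellow" fallback for low-confidence routine items are all assumptions. The post only specifies the green and orange cases.

```python
from dataclasses import dataclass

# Hypothetical cutoff for "high confidence"; not a value from the real system.
HIGH_CONFIDENCE = 0.80


@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float
    feeds_structural_calc: bool  # assumed flag set by whatever tracks downstream use


def consequence_color(c: Classification) -> str:
    # Consequence is checked first: anything feeding a structural
    # calculation is orange no matter how confident the model is.
    if c.feeds_structural_calc:
        return "orange"
    if c.confidence >= HIGH_CONFIDENCE:
        return "green"
    # Assumption: the post does not say what low-confidence routine items show.
    return "yellow"


print(consequence_color(Classification("dense gravel", 0.87, True)))   # orange
print(consequence_color(Classification("dense gravel", 0.87, False)))  # green
```

Checking consequence before confidence is the point of the design: the confidence number never gets a chance to override the stakes.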
Downstream visibility
We added a simple line to every classification: "This classification is used in: Foundation Design Report (draft), Cross-Section Model A." Now when an engineer reviews a classification, they can see what breaks if it's wrong.
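A minimal sketch of how that line could be generated, assuming the system already knows which deliverables consume a classification. The function name and the wording for the empty case are hypothetical.

```python
def used_in_line(deliverables: list[str]) -> str:
    """Build the downstream-visibility line shown next to a classification."""
    if not deliverables:
        # Assumed wording; the post only shows the non-empty case.
        return "This classification is not used in any downstream deliverable."
    return "This classification is used in: " + ", ".join(deliverables) + "."


print(used_in_line(["Foundation Design Report (draft)", "Cross-Section Model A"]))
# This classification is used in: Foundation Design Report (draft), Cross-Section Model A.
```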
The "propagation pause"
Before any AI classification feeds into a downstream deliverable (report, model, calculation), the system now pauses and asks: "This classification will be included in [Report X]. Confirm or review."
Some engineers complained it slowed them down. But the ones who'd been in that emergency meeting? They got it immediately.
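The propagation pause can be sketched as a gate that refuses to write into a deliverable until someone confirms. All names here are hypothetical. The `confirm` callback stands in for whatever dialog the real interface shows.

```python
from typing import Callable


class PropagationPaused(Exception):
    """Raised when the engineer chooses to review instead of confirming."""


def attach_to_deliverable(
    label: str,
    deliverable: str,
    contents: list[str],
    confirm: Callable[[str], bool],
) -> None:
    # Ask before the classification leaves the tool and enters a deliverable.
    prompt = f"This classification will be included in {deliverable}. Confirm or review."
    if not confirm(prompt):
        raise PropagationPaused(f"'{label}' held for review before reaching {deliverable}")
    contents.append(label)


report: list[str] = []
attach_to_deliverable("dense gravel", "Foundation Design Report (draft)", report, lambda p: True)
print(report)  # ['dense gravel']
```

Raising an exception on "review" rather than returning silently means calling code cannot forget to handle the pause.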
The Lesson
Designing for AI errors isn't about handling errors when they happen. It's about designing a system where errors can't silently travel into places where they cause real harm.
Every AI output has a blast radius. The classification itself might be minor. But if it propagates into a client report, then into a construction plan, then into a physical structure — that blast radius just went from "oops" to "lawsuit."
I think about blast radius now on every AI feature I design. Not "what if it's wrong?" but "what if it's wrong and nobody catches it, and where does it end up?"
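One way to make blast radius concrete is to treat "feeds into" as a graph and list everything reachable from a single output. The sketch below uses a breadth-first walk. The graph contents mirror the chain described above, and the function name and data shape are assumptions made for illustration.

```python
from collections import deque


def blast_radius(start: str, feeds_into: dict[str, list[str]]) -> list[str]:
    """Everything reachable downstream of `start`, in breadth-first order."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in feeds_into.get(node, []):
            if nxt not in seen:  # guard against cycles and duplicates
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


chain = {
    "soil classification": ["client report"],
    "client report": ["construction plan"],
    "construction plan": ["physical structure"],
}
print(blast_radius("soil classification", chain))
# ['client report', 'construction plan', 'physical structure']
```

The length of that list is a rough answer to "where does it end up if nobody catches it?"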
That emergency meeting was the worst 45 minutes of my professional year. But I design better products because of it. And our error propagation rate dropped to near zero after the changes.
Sometimes the best design lessons come from the worst days.