Your AI Is Guessing. Are You Telling Users That?
2026-02-08 · 12 min read · Janaina Maia
Last Tuesday, a PM showed me a prototype where the AI confidence score was hidden behind three clicks. Three. Clicks. I almost screamed.
This was for a geological analysis tool. Engineers would use it to make decisions about where to dig, what foundation type to choose, how to manage risk. And the AI was presenting its best guess — sometimes a bad guess — like it was gospel truth.
A clean number. A definitive recommendation. No hedging. No nuance. No signal about how confident the system actually was.
I've been designing AI products for critical domains for years now. Engineering decisions. Geological models. Infrastructure stuff where getting it wrong isn't just annoying — people can get hurt. And here's what I keep seeing: teams treat confidence display like a nice-to-have. A footnote. An afterthought buried in settings.
It's not. It might be the most consequential design decision you'll make.
An 85% Accurate Model Can Be More Useful Than a 95% One
Sounds backwards, right? But hear me out.
A model that's 85% accurate but clearly communicates its uncertainty? Way more useful — and safer — than a 95% accurate model that presents everything as fact.
Why? Because users calibrate their behavior based on perceived certainty. When your AI says "this geological formation is limestone" with no qualifier, the engineer plans accordingly. Full stop. When it says "this formation is likely limestone (confidence: 72%, based on spectral analysis of 3 samples)" — the engineer still plans, but they also verify. They order additional core samples. They check with a colleague.
That parenthetical just prevented a potential disaster.
The Five Levels — Actually, Let Me Back Up
Before I get into the framework, I want to be clear about something. I didn't come up with this in a design sprint. This came from watching engineers use our tools and noticing the exact moments they'd squint at the screen and go, "Wait, how sure is it about that?"
Those moments are design gold.
OK, so. The framework. I call it the Confidence Spectrum, and it maps model confidence to specific UX patterns:
Level 1: High Confidence (90%+) — Show the result, cite your sources
The AI is very confident. Great. Display the result prominently. But always — ALWAYS — show the reasoning. Never let high confidence become invisible confidence.
- What it looks like: Bold result + subtle source attribution
- "Soil classification: Clay — based on 47 borehole samples within 500m"
- Solid fill, primary colors, full-weight typography
- Click to expand methodology and data sources
Level 2: Good Confidence (75-90%) — Assert with a caveat
Solid prediction, but acknowledge the margin. Honestly, this is where most enterprise AI products should live by default.
- Result + inline confidence indicator + what would increase confidence
- "Likely clay (82% confidence). 3 additional samples would improve certainty."
- Slightly muted presentation, confidence bar or badge
That last bit — telling the user what would improve confidence — is huge. It turns uncertainty into a call to action.
Level 3: Moderate Confidence (50-75%) — Show alternatives
The model has a best guess, but other answers are plausible. This is where most AI products completely fall apart. They pick the top result and hide the rest.
- Show "Clay (58%)" alongside "Silt (28%)" and "Sandy clay (14%)"
- Distribution chart, comparative layout
- Let users select any option and see downstream impact
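Here is a minimal Python sketch of the data step behind the Level 3 layout: rank every candidate class instead of keeping only the top one. The function name `ranked_alternatives` and the `max_options` cap are illustrative assumptions. The scores are assumed to be the model's per-class probabilities, and they are renormalized in case they don't sum exactly to 1.

```python
def ranked_alternatives(scores: dict[str, float], max_options: int = 3) -> list[str]:
    """Format candidate classes as "Label (NN%)", most probable first.

    `scores` is assumed to hold per-class probabilities from the model;
    they are renormalized so the displayed shares sum to roughly 100%.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must contain some positive probability mass")
    # Stable sort: ties keep their original order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{label} ({score / total:.0%})" for label, score in ranked[:max_options]]


# The soil example from the bullets above.
print(ranked_alternatives({"Silt": 0.28, "Clay": 0.58, "Sandy clay": 0.14}))
# ['Clay (58%)', 'Silt (28%)', 'Sandy clay (14%)']
```

The output keeps the runners-up visible, so the UI can render them side by side rather than silently dropping them.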
Level 4: Low Confidence (25-50%) — Defer to the human
The model is basically guessing. The UX should make this crystal clear and shift decision authority to the human.
- Warning state + human review required + data gap explanation
- "Insufficient data for reliable classification. Best estimate: Clay (38%). Manual review recommended."
- Amber/warning colors, dashed borders, reduced visual weight
- Block downstream automated actions. Require explicit human confirmation.
Level 5: Very Low Confidence (<25%) — Just say "I don't know"
This is the hardest one for product teams to accept. I've been in so many meetings where someone says, "But we can't show nothing!" Yes you can. Sometimes "I don't know" is the best answer.
- Explicit refusal + explanation + data collection guidance
- "Cannot classify this formation reliably. Recommend field verification."
- Grey/disabled state, no result shown, prominent CTA for human action
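Under the hood, the five levels reduce to a threshold lookup. Here is a minimal Python sketch. The names (`Level`, `SPECTRUM`, `confidence_level`) are hypothetical, and confidence is assumed to be a calibrated probability in [0, 1]. The ranges above share their endpoints (90%, 75%, 50%, 25%), so the sketch assumes a value exactly on a boundary belongs to the higher level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    number: int
    name: str
    pattern: str


# (lower bound, level), checked from highest to lowest. A confidence exactly
# on a boundary is assumed to belong to the higher level.
SPECTRUM = [
    (0.90, Level(1, "High", "Show the result, cite your sources")),
    (0.75, Level(2, "Good", "Assert with a caveat")),
    (0.50, Level(3, "Moderate", "Show alternatives")),
    (0.25, Level(4, "Low", "Defer to the human")),
    (0.00, Level(5, "Very low", 'Just say "I don\'t know"')),
]


def confidence_level(p: float) -> Level:
    """Map a confidence in [0, 1] to its Confidence Spectrum level."""
    if not 0.0 <= p <= 1.0:  # also rejects NaN
        raise ValueError(f"confidence must be in [0, 1], got {p}")
    for lower_bound, level in SPECTRUM:
        if p >= lower_bound:
            return level
    raise AssertionError("unreachable: the last lower bound is 0.0")


for p in (0.95, 0.82, 0.58, 0.38, 0.10):
    level = confidence_level(p)
    print(f"{p:.0%} -> Level {level.number} ({level.name}): {level.pattern}")
```

Keeping the thresholds in one table makes it easy to load them from per-project configuration instead of hard-coding them.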
The "Confidence Gutter" — My Favorite Pattern
OK, I geek out about this one. It's a persistent vertical strip alongside AI-generated content that uses a continuous color gradient to show confidence variation. Like a heatmap margin note.
In a geological cross-section, the gutter shows high confidence (green) where borehole data is dense, transitioning to low confidence (amber→red) in interpolated regions. The user never has to request this information — it's always visible, always ambient.
Engineers love it because it maps directly to how they think: "Where do I have real data, and where am I interpolating?"
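One way to drive the gutter is piecewise-linear interpolation between a few color stops. In this Python sketch, the red/amber/green RGB values and the stop positions (0, 0.5, 1) are placeholder assumptions for your own design tokens. The per-row confidence values are assumed to come from your model; how you compute them is out of scope here.

```python
# Assumed palette: (confidence stop, RGB). Swap in your design system's tokens.
GRADIENT_STOPS: list[tuple[float, tuple[int, int, int]]] = [
    (0.0, (220, 38, 38)),   # red: little or no supporting data
    (0.5, (245, 158, 11)),  # amber: interpolated region
    (1.0, (22, 163, 74)),   # green: dense borehole data
]


def gutter_color(p: float) -> str:
    """Return a hex color for confidence p, interpolated between stops."""
    p = min(max(p, 0.0), 1.0)  # clamp out-of-range inputs
    for (p0, c0), (p1, c1) in zip(GRADIENT_STOPS, GRADIENT_STOPS[1:]):
        if p <= p1:
            t = (p - p0) / (p1 - p0)  # position within this segment, 0..1
            r, g, b = (round(lo + (hi - lo) * t) for lo, hi in zip(c0, c1))
            return f"#{r:02x}{g:02x}{b:02x}"
    raise AssertionError("unreachable: the last stop is at 1.0")


# One color per row of a cross-section, top to bottom (illustrative values).
row_confidences = [0.95, 0.9, 0.7, 0.4, 0.15]
print([gutter_color(p) for p in row_confidences])
```

A continuous gradient, rather than a few hard bands, is what lets the gutter show confidence fading gradually as you move away from real data.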
Let Confidence Change What Users Can Do
Here's where most teams miss an opportunity. Don't just display confidence. Let it gate actions.
- High confidence → Enable automated workflows
- Moderate confidence → Enable actions but require confirmation
- Low confidence → Disable automated actions, surface manual workflow
This isn't about restricting users. It's about matching the interaction friction to the actual risk level. More certainty, less friction. Less certainty, more checkpoints.
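Here is a minimal Python sketch of confidence-gated actions. The three gates mirror the bullets above. The `GatePolicy` name and its default thresholds (0.90 to automate, 0.50 to allow with confirmation) are hypothetical choices that line up with the Confidence Spectrum. They are meant to be set per project, since the right bar depends on the stakes.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    AUTOMATE = "automate"  # run automated workflows without extra friction
    CONFIRM = "confirm"    # allow the action, but require explicit confirmation
    MANUAL = "manual"      # disable automation and surface the manual workflow


@dataclass(frozen=True)
class GatePolicy:
    """Per-project thresholds (hypothetical defaults); tune to risk tolerance."""
    automate_at: float = 0.90
    confirm_at: float = 0.50

    def __post_init__(self) -> None:
        if not 0.0 <= self.confirm_at <= self.automate_at <= 1.0:
            raise ValueError("expected 0 <= confirm_at <= automate_at <= 1")

    def gate(self, confidence: float) -> Gate:
        if confidence >= self.automate_at:
            return Gate.AUTOMATE
        if confidence >= self.confirm_at:
            return Gate.CONFIRM
        return Gate.MANUAL


default_policy = GatePolicy()
# Hypothetical stricter policy for a higher-risk decision such as foundation design.
strict_policy = GatePolicy(automate_at=0.98, confirm_at=0.75)

print(default_policy.gate(0.93), strict_policy.gate(0.93))  # Gate.AUTOMATE Gate.CONFIRM
```

The same 93% prediction passes straight through under one policy and picks up a confirmation step under the other, which is exactly the "matching friction to risk" idea.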
Things I'd Do Differently Next Time
If I were starting fresh on a confidence display system, here's what I'd change from my early attempts:
- Don't use percentages alone. "72% confidence" means nothing to most domain experts. Pair numbers with natural language: "Likely correct, based on moderate evidence."
- Test with wrong predictions. The real test isn't how your UI looks when the AI is right. It's how it behaves when it's wrong. Simulate low-confidence scenarios, plus a few confidently wrong ones, and watch what users do.
- Make confidence adjustable. Let teams set their own thresholds. Different projects have different risk tolerances.
- Log confidence-action pairs. Track which confidence levels lead to which user actions. This data is gold for calibrating both your model and your UX.
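The logging idea is easy to start small. This Python sketch records each (confidence, user action) pair and tallies actions per confidence band. The action names (`accepted`, `edited`, `rejected`) are assumptions, and so is reusing the Confidence Spectrum boundaries as bands. In production, these records would go to whatever analytics store you already use rather than an in-memory list.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Band edges assumed to match the Confidence Spectrum levels.
BAND_EDGES = [0.25, 0.50, 0.75, 0.90]
BAND_NAMES = ["<25%", "25-50%", "50-75%", "75-90%", "90%+"]


@dataclass
class ConfidenceActionLog:
    """In-memory log of (model confidence, user action) pairs."""
    records: list[tuple[float, str]] = field(default_factory=list)

    def record(self, confidence: float, action: str) -> None:
        self.records.append((confidence, action))

    def by_band(self) -> dict[str, Counter]:
        """Count user actions within each confidence band."""
        tallies: dict[str, Counter] = defaultdict(Counter)
        for confidence, action in self.records:
            # bisect_right puts a value equal to an edge in the higher band.
            band = BAND_NAMES[bisect_right(BAND_EDGES, confidence)]
            tallies[band][action] += 1
        return dict(tallies)


log = ConfidenceActionLog()
log.record(0.93, "accepted")
log.record(0.82, "accepted")
log.record(0.79, "rejected")
log.record(0.40, "edited")
print(log.by_band())
```

High rejection rates in a high band hint that the model is overconfident there. Near-universal acceptance in a low band hints that users are trusting it more than the UI intends.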
The Anti-Patterns (I See These Constantly)
- Binary confidence: Only "confident" or "not confident." Reality is a spectrum.
- Hidden confidence: Only surfacing uncertainty when the model is failing. Users should see confidence at ALL levels.
- Percentage theater: Showing "87.3% confident" when the calibration doesn't support that precision. Use ranges.
- Confidence without context: Saying "high confidence" without explaining what drove it.
- Static thresholds: Using the same thresholds everywhere. Recommending a lunch spot and classifying a rock formation require very different confidence bars.
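One alternative to percentage theater is to snap the raw score to a coarse range and pair it with a plain-language phrase, as the "Don't use percentages alone" advice suggests. In this minimal Python sketch, the 10-point bucket width and the phrase vocabulary are assumptions. The right granularity depends on how well calibrated your model actually is.

```python
# Assumed verbal scale, keyed by lower bound; adapt the wording to your domain.
PHRASES = [
    (0.90, "Very likely"),
    (0.75, "Likely"),
    (0.50, "Plausible"),
    (0.25, "Uncertain"),
    (0.00, "Insufficient evidence"),
]


def hedged_label(p: float, bucket_pct: int = 10) -> str:
    """Describe confidence p in [0, 1] as a phrase plus a coarse range."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {p}")
    phrase = next(text for lower, text in PHRASES if p >= lower)
    pct = round(p * 100, 6)  # absorb float noise such as 0.29 * 100 = 28.999...
    lo = min(int(pct // bucket_pct) * bucket_pct, 100 - bucket_pct)  # 100% joins the top bucket
    return f"{phrase} ({lo}-{lo + bucket_pct}%)"


print(hedged_label(0.873))  # Likely (80-90%), instead of "87.3% confident"
```

Widening `bucket_pct` is a cheap way to stop implying precision the calibration can't back up.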
The best AI products don't just make good predictions. They help users understand when to trust those predictions — and when to trust themselves instead. I'm still refining this framework with every project. But I know one thing: if you're hiding confidence behind three clicks, we need to talk.