Stop Defaulting to Autopilot. Most AI Features Should Be Copilots.
2026-01-31 · 14 min read · Janaina Maia
I was in a product review last week where the team proudly showed me their "fully autonomous" AI feature. It ingested data, classified it, generated reports, and emailed stakeholders. Zero human involvement.
"What's the accuracy?" I asked.
"About 78%."
Seventy-eight percent. For an engineering tool. Fully autonomous. No human review. I had to physically stop myself from closing my laptop.
The industry loves binary thinking about AI: either it's a tool (human drives) or it's autonomous (AI drives). But the most successful products I've worked on exist on a spectrum. And they're intentionally designed to operate at different points for different features, users, and contexts.
The Five Levels
Level 0: Manual with AI Data
The human does all the work, but AI enriches the data they work with. Users might not even recognize this as "AI." They just think their tools got smarter.
Risk level: Very low. The AI informs but doesn't decide.
Level 1: Copilot
AI makes suggestions; the human approves, modifies, or rejects every one, typically through "Accept / Modify / Reject" controls. The user is in the loop for every decision.
This is where most enterprise products should start. I'll die on this hill.
Level 2: Coach
AI takes routine actions autonomously and reports what it did. Human reviews after the fact. The AI handles the boring stuff; you focus on the interesting problems.
Level 3: Supervisor
AI operates autonomously within defined boundaries. Only escalates when it hits something outside its scope or confidence drops below a threshold.
Level 4: Autopilot
Fully autonomous. Human involvement is optional and strategic. Requires exceptional reliability, monitoring, and recovery mechanisms.
Here's where it gets interesting: the right level isn't the highest one you can achieve. It's the highest one your users can trust, your domain can tolerate, and your model can sustain.
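To make the levels concrete, here is a minimal Python sketch of how a product might encode them and decide who acts next on a single AI output. Everything in it is an assumption for illustration: the names `AutonomyLevel` and `next_step`, the returned strings, and the 0.9 escalation threshold are hypothetical, not taken from any particular product.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    """The five levels, ordered from least to most autonomous."""
    MANUAL_WITH_AI_DATA = 0
    COPILOT = 1
    COACH = 2
    SUPERVISOR = 3
    AUTOPILOT = 4


def next_step(level: AutonomyLevel, confidence: float, in_scope: bool,
              escalation_threshold: float = 0.9) -> str:
    """Return who handles one AI output (hypothetical routing rule).

    escalation_threshold is an assumed value; tune it per feature.
    """
    if level == AutonomyLevel.MANUAL_WITH_AI_DATA:
        return "human_decides"            # AI only enriches the data
    if level == AutonomyLevel.COPILOT:
        return "human_approves"           # every suggestion is reviewed first
    if level == AutonomyLevel.COACH:
        return "ai_acts_human_reviews_after"
    if level == AutonomyLevel.SUPERVISOR:
        # Escalate when out of scope or when confidence is too low.
        if not in_scope or confidence < escalation_threshold:
            return "escalate_to_human"
        return "ai_acts"
    return "ai_acts"                      # Autopilot: human input is optional
```

Using an ordered enum means "is this feature allowed to run above Copilot?" becomes a plain comparison such as `level > AutonomyLevel.COPILOT`.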
How to Choose
Five factors:
- Consequence of error: Low stakes → higher autonomy. Safety-critical → stay lower.
- Model reliability: 95%+ accuracy → Level 2-3. Below 90% → Level 0-1.
- Reversibility: Easy to undo → higher autonomy OK. Irreversible → more checkpoints.
- User expertise: Experts who spot errors quickly → higher autonomy. Non-experts → more guardrails.
- Regulatory environment: Regulated domains often cap at Level 1-2 by compliance requirement.
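One way to combine the five factors is to let each one set a ceiling and then take the lowest ceiling. The sketch below does that. Only the accuracy cut-offs (95%+ and below 90%) come from the list above; the other caps, the handling of the 90-95% band, and the names `FeatureContext` and `recommend_level` are my assumptions for illustration. The function never recommends Level 4, because none of the thresholds above define when Autopilot is justified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureContext:
    safety_critical: bool                 # consequence of error
    accuracy: float                       # model reliability, 0.0 to 1.0
    reversible: bool                      # can the action be undone easily?
    expert_users: bool                    # will users spot errors quickly?
    regulatory_cap: Optional[int] = None  # highest level compliance allows


def recommend_level(ctx: FeatureContext) -> int:
    """Return the highest autonomy level every factor tolerates."""
    caps = []

    # Model reliability: thresholds from the list above.
    if ctx.accuracy >= 0.95:
        caps.append(3)
    elif ctx.accuracy < 0.90:
        caps.append(1)
    else:
        caps.append(2)  # assumption: the 90-95% band is not specified

    if ctx.safety_critical:
        caps.append(0)  # assumption: human decides, AI enriches the data
    if not ctx.reversible:
        caps.append(1)  # assumption: irreversible actions need approval
    if not ctx.expert_users:
        caps.append(2)  # assumption: non-experts get after-the-fact review
    if ctx.regulatory_cap is not None:
        caps.append(ctx.regulatory_cap)

    return min(caps)


# The 78%-accurate feature from the opening story, in its best case:
print(recommend_level(FeatureContext(False, 0.78, True, True)))  # → 1
```

Taking the minimum rather than averaging is deliberate: one bad factor, such as an irreversible action, should not be outvoted by four good ones.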
The Mixed-Autonomy Product
The most mature AI products don't operate at a single level. They run different features at different levels simultaneously:
- Data formatting: Level 3 — AI handles it, escalates anomalies
- Standard classification: Level 2 — AI does it, human batch-reviews
- Complex interpretation: Level 1 — AI suggests, human decides
- Safety-critical decisions: Level 0 — Human decides with AI-enriched data
This is actually how human teams work. Junior people handle routine tasks. Complex decisions involve seniors. Critical decisions go to the most experienced person. Design your AI the same way.
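In code, a mixed-autonomy product can be as simple as a per-feature lookup. This is a rough sketch using the four features listed above; the key names, the `level_for` helper, and the choice of Level 1 as the fallback for unlisted features are assumptions for illustration.

```python
# Autonomy level per feature, mirroring the list above.
FEATURE_LEVELS = {
    "data_formatting": 3,           # AI handles it, escalates anomalies
    "standard_classification": 2,   # AI does it, human batch-reviews
    "complex_interpretation": 1,    # AI suggests, human decides
    "safety_critical_decision": 0,  # human decides with AI-enriched data
}

# Assumption: features nobody has classified yet start as Copilot.
DEFAULT_LEVEL = 1


def level_for(feature: str) -> int:
    """Look up the configured autonomy level for a feature."""
    return FEATURE_LEVELS.get(feature, DEFAULT_LEVEL)


print(level_for("data_formatting"))     # → 3
print(level_for("brand_new_feature"))   # → 1
```

Keeping the mapping in one place makes the autonomy policy reviewable: anyone can see at a glance which features act on their own and which wait for a human.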
Transitions Between Levels
Moving up should be data-driven (accuracy metrics support it, correction rates are declining), user-controlled (different users at different levels), and reversible (moving back down should be even easier than moving up).
If users feel trapped at a high autonomy level, they'll stop using the feature entirely. I've seen this happen. The team celebrated reaching "Level 4" and then watched engagement fall off a cliff because engineers didn't trust it and couldn't easily dial it back.
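Here is a minimal sketch of those three transition rules for a single user. The class name `UserAutonomy`, the 0.95 accuracy bar, and the definition of "declining" as strictly decreasing correction rates across recent periods are all assumptions for illustration. The asymmetry is the point: promotion needs evidence and user consent, while demotion needs nothing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserAutonomy:
    level: int = 1       # each user carries their own level
    max_level: int = 4

    def can_promote(self, accuracy: float, correction_rates: List[float],
                    min_accuracy: float = 0.95) -> bool:
        """Data-driven check: accuracy is high and corrections are falling."""
        declining = all(
            later < earlier
            for earlier, later in zip(correction_rates, correction_rates[1:])
        )
        return (self.level < self.max_level
                and accuracy >= min_accuracy
                and len(correction_rates) >= 2
                and declining)

    def promote(self, accuracy: float, correction_rates: List[float],
                user_opted_in: bool) -> int:
        """Move up one level only if the data supports it AND the user agrees."""
        if user_opted_in and self.can_promote(accuracy, correction_rates):
            self.level += 1
        return self.level

    def demote(self, to_level: Optional[int] = None) -> int:
        """Move down unconditionally; defaults to one level lower."""
        target = self.level - 1 if to_level is None else to_level
        # Clamp so demote can never raise the level or go below 0.
        self.level = max(0, min(target, self.level))
        return self.level


user = UserAutonomy(level=2)
user.promote(0.97, [0.20, 0.15, 0.10], user_opted_in=True)
print(user.level)   # → 3
user.demote(to_level=1)
print(user.level)   # → 1
```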
The goal isn't full autopilot. It's the right level of autonomy for each task, each user, and each context. That's what mature AI product design looks like. And yes, it's harder than just cranking everything to max autonomy. But it's the difference between an AI product people tolerate and one they rely on.