Blog
Agent Design Needs Test Benches, Not Just Demos
— Microsoft’s open-source agent review and testing tools point to a bigger design lesson: agentic products need rehearsals for failure before they earn autonomy.
AI Abundance Needs Curation, Not More Output
— Spotify’s AI expansion is a useful warning for every product team: when generation becomes cheap, the real product work moves to curation, intent, and user control.
Agent Memory Needs a Review Surface
— Enterprise agents will not become trustworthy just by remembering more. Their memory needs to be visible, correctable, and governed inside the workflow.
AI Design Tools Need Direction, Not Decoration
— Google Pics and Figma's new agent direction point to the same shift: AI can make more visual options, but product teams still need stronger intent, critique, and accountability.
Background Agents Need a Stop Button
— Google’s Gemini Spark points toward AI agents that keep working after we close the laptop. That is useful, but only if product teams design clear boundaries, review points, and ways to stop the work.
Trust Needs Visible Provenance
— Google’s move to bring AI content verification into Search and Chrome is a useful signal: trust cannot live in a policy page. It has to show up in the product surface.
Agents Need Better Connectors, Not More Magic
— Anthropic’s Stainless acquisition is a quiet signal: useful agents depend on well-designed connections to tools, data, permissions, and review surfaces.
AI Made It Easier. That Does Not Make It Worth Less.
— When AI reduces effort, value does not disappear. It moves to judgement, taste, review, and accountability.
AI Privacy Is Also a Self-Curation Problem
— As AI assistants gain memory, the real design challenge is balancing who users are with who they are trying to become.
Desktop Agents Need Permission Surfaces, Not Magic
— As AI agents move from chat windows into local files and workplace tools, the design challenge shifts from better answers to better permission, visibility, and recovery surfaces.
AI Partnerships Need Product Visibility, Not Just Distribution
— The reported tension between OpenAI and Apple is a useful reminder: an AI integration only creates value if users can understand when it is available, what it does, and why they should trust it.
Prompt-to-Production Needs a Review Surface
— AI builders should not jump from natural-language intent straight to live automation. The enterprise pattern worth copying is prompt to visible workflow to governed production.
Code-First Design Still Needs a Map
— AI-assisted code prototypes feel real fast, but they can hide the product journey. The strongest teams are learning to pair code-first prototyping with lightweight flow maps, guided walkthroughs, and clearer intent documentation.
When Your AI Agent Skips Tests: How We Enforced TDD with Claude Code Hooks
— I caught my AI coding agent claiming it was running tests when it wasn't. Three times. Here's how we solved it permanently — not with better prompts, but with automated hooks that make skipping tests impossible.
Domain Immersion: The Non-Negotiable First Skill in AI Product Building
— Before I wrote a single line of a system prompt for Urbix, I spent weeks inside planning documents, council meetings, and zoning codes I barely understood. That time wasn't wasted. It was the foundation.
Knowledge Curation: The Hard Part Isn't Getting Data In
— Six months into building Urbix, I had an AI that blended planning schemes from three states in a single answer. The data was all correct. The curation was a disaster. Here's what I learned.
Agent Specialisation: Generalists Demo Well, Specialists Work
— The first version of Urbix was a single agent that knew everything. It demoed beautifully. In production, it was a mess. Here's why I rebuilt from specialist-up, and what changed.
Failure-First Testing: Stop Testing What Goes Right
— I tested Urbix by giving it questions it should answer. It answered them. I felt good. Three weeks in production, a user found a confident wrong answer I had completely missed. That was the last time I tested happy-path-first.
Trust Architecture: Designing Honesty Into Every Answer
— The first time I showed Urbix to a senior town planner, he didn't ask about features. He asked: how confident is it about this specific answer, right now? I didn't have a good answer yet. That conversation changed everything.
Prompt Versioning: Your Prompt IS Your Product
— My first Urbix system prompt was 47 words. The current production version is v54. The distance between them is the story of every lesson I learned the hard way. None of it would be legible without version control.
Side-by-Side Proof: Don't Argue. Compare.
— Every executive meeting, someone says: any AI can do that. Arguing never works. Explaining the architecture never works. Opening both side by side and asking the same question works every single time.
Stakeholder Vocabulary: How You Say It Is How They Value It
— I presented Urbix to a board committee twice. Same product. Different vocabulary. First time, one person checked their phone. Second time, fifteen minutes of substantive questions. The language you use determines what they think you built.
Stop Calling Everything an 'AI Feature'
— Your product has autocomplete, a recommendation engine, a chatbot, an autonomous agent, and predictive analytics. Calling them all 'AI features' is like calling a bicycle and a Boeing 747 both 'vehicles.' Technically true. Completely unhelpful.
The Meeting That Changed How I Think About AI Errors
— A client called an emergency meeting because our AI had confidently classified something wrong. What happened next taught me more about error design than any framework ever could.
What I Learned Designing AI for Engineers Who Don't Trust AI
— The first thing the lead engineer said in our kickoff was 'I've been doing this for 25 years. I don't need a computer telling me what the soil looks like.' He wasn't wrong. But he wasn't entirely right either.
Your AI Is Guessing. Are You Telling Users That?
— I almost screamed when a PM showed me a prototype where the AI confidence score was hidden behind three clicks. In critical domains, showing predictions as facts isn't just bad UX — it's dangerous.
Most Agentic AI Experiences Are Terrible. Here's How to Fix Them.
— Every product roadmap has 'agentic AI' on it. But after reviewing dozens of implementations, the pattern is depressingly consistent: impressive demos, frustrating daily use. The problem isn't the AI — it's the interaction design.
Human-in-the-Loop Is Compliance Theater (Most of the Time)
— Every AI product claims human oversight. Most are lying. They added a confirmation dialog and called it governance. Here's what real human-in-the-loop looks like.
The Prompt Is the Interface (And Designers Should Own It)
— If you're designing AI products and not looking at the system prompts, you're designing the container while ignoring what goes inside it. The prompt shapes everything users experience.
Your CEO Saw a Demo. Now Everyone Wants 'AI Features.' Here's How to Prioritize.
— Your backlog is drowning in AI feature requests. The CEO is excited. The PM wants 'something with AI.' Here's the framework I use to cut through the noise.
Most AI Onboarding Is Garbage. There, I Said It.
— I onboarded onto seven AI products in one week. Six of them left me confused and undertrusting. The seventh did something radically different — it taught me how to THINK about the AI, not just how to use it.
Your AI Will Be Wrong. Design for It.
— Every product team I work with: 'What happens when the AI is wrong?' The answers range from silence to hand-waving. AI errors aren't bugs — they're a fundamental characteristic. Time to design for them.
I Don't Need an AI Ethics Lecture. I Need a Checklist.
— Most AI ethics frameworks are useless for practicing designers — abstract principles like 'be fair' that give zero guidance when you're in Figma at 3pm trying to decide how to display a risk score.
Stop Defaulting to Autopilot. Most AI Features Should Be Copilots.
— The industry loves binary thinking: either AI is a tool or it's autonomous. But the best products I've designed exist on a spectrum — and they operate at different points for different features, users, and contexts.
Your AI Makes Great Predictions. Nobody Trusts Them.
— I watched a room full of senior engineers reject a technically brilliant AI model. Not because it was wrong. Because they couldn't understand why it was right.
The Double Diamond Is Broken for AI Products. Here's What I Use Instead.
— The classic design framework assumes deterministic systems. AI products are probabilistic. Same input, different outputs. I kept trying to force-fit it until I built something better.