Agent Specialization: Generalists Demo Well, Specialists Work

2026-03-08 · 5 min read · Janaina Maia

The first version of the Urbix AI was a single agent that knew everything.

Or tried to. It had planning scheme knowledge, engineering standards, local council policies, and general urban development guidance all mixed together in one knowledge base, served by one AI with one system prompt.

It demoed beautifully. One question, one agent, one answer. Clean.

In production, it was a mess.

What Actually Happened

When planners used the single-agent version, they would ask about setback requirements for a residential development. The AI would answer with a blend of state planning scheme rules, council-specific provisions, and general principles that applied across multiple jurisdictions. All technically correct information. Completely useless for the actual project, which was in one specific council with specific rules that overrode the general ones.

The AI knew too much and couldn't distinguish between what was relevant and what was noise.

Generalist AI systems have this problem. They optimize for breadth. The training, the prompts, the knowledge base all pull toward covering more ground. That is great for a general-purpose assistant. It is not great for a professional tool where the user needs one specific answer, not an overview of the landscape.

Split by Expertise, Not Feature

The insight that changed Urbix was a simple one. We were thinking about specialization wrong.

We had been considering splitting agents by feature: one agent for search, one for summary, one for comparison. That is a technical split. It doesn't solve the knowledge problem.

The right split is by expertise. What would a human expert in this specific area know, and what would they deliberately not know?

A good town planner specializing in Queensland residential development knows Queensland's planning scheme. They know the specific council codes. They know how these interact with state legislation. They do not attempt to simultaneously advise on New South Wales planning law, infrastructure engineering standards, and heritage overlays in Victoria. That would not make them more useful. It would make them less reliable.

That is how we rebuilt Urbix. Each specialist agent has its own defined expertise, its own curated knowledge base, and its own system prompt that defines its scope explicitly.

What Each Specialist Gets

Every Urbix specialist agent gets three things.

Its own knowledge base. Not a filtered view of one giant database. A separate, curated collection of documents that belong to this specific expertise. A Queensland planning agent does not have access to Victorian planning documents. Full stop.

Its own boundary instructions. The system prompt explicitly defines what this agent is for and what it will not address. When a user asks a Queensland planning agent about engineering standards, it says that falls outside its expertise and directs them to the relevant specialist. It does not attempt to answer.

Its own test suite. Each specialist has a set of known-answer questions used to validate performance. The questions are drawn from real professional scenarios within that domain. A specialist that performs well on general questions but fails on domain-specific edge cases is not ready for production.
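To make the three pieces concrete, here is a minimal sketch in Python. It is not the Urbix implementation. The class, the field names, and the placeholder document text are all assumptions for illustration, and a plain topic lookup stands in for the model call a real agent would make.

```python
from dataclasses import dataclass, field


@dataclass
class Specialist:
    """Hypothetical specialist: its own documents, its own boundaries, its own tests."""
    name: str
    scope: str                  # one-sentence statement of what the agent covers
    documents: dict[str, str]   # separate curated knowledge base (topic -> text)
    # Out-of-scope topic -> name of the specialist to refer the user to.
    referrals: dict[str, str] = field(default_factory=dict)
    # Known-answer checks: (topic, phrase the answer must contain).
    known_answers: list[tuple[str, str]] = field(default_factory=list)

    def answer(self, topic: str) -> str:
        # Boundary instruction: decline and refer instead of attempting an answer.
        if topic in self.referrals:
            return f"Outside my expertise. Please ask the {self.referrals[topic]} specialist."
        if topic not in self.documents:
            return "Outside my expertise."
        # Stand-in for retrieval plus a model call over this agent's own documents.
        return self.documents[topic]

    def run_test_suite(self) -> list[str]:
        """Return the topics whose answer does not contain the expected phrase."""
        return [topic for topic, expected in self.known_answers
                if expected not in self.answer(topic)]


# Placeholder content only; no real planning rules are encoded here.
qld_planning = Specialist(
    name="qld-planning",
    scope="Queensland residential planning schemes and council codes only.",
    documents={"setbacks": "Placeholder: setback provisions from the curated council code."},
    referrals={"engineering standards": "engineering-standards"},
    known_answers=[
        ("setbacks", "setback provisions"),
        ("engineering standards", "Outside my expertise"),
    ],
)

failures = qld_planning.run_test_suite()
print(failures)  # an empty list means every known-answer check passed
```

Note that the refusals are part of the test suite. Declining an out-of-scope question is a behavior to validate, just like answering an in-scope one.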

The Routing Problem

Once you have specialist agents, you need a way to get users to the right one. This is the routing problem, and it is harder than it sounds.

A user asking about setback requirements might need the planning specialist, the engineering standards specialist, or both, depending on context. A user asking about stormwater management might think they need planning when they actually need civil engineering standards.

We handle this with a routing layer that reads the user's question, identifies the relevant domain or domains, and either routes to the right specialist or surfaces both when the question spans expertise areas. The routing layer knows the boundaries of each agent's expertise and matches questions to agents based on that knowledge. It doesn't answer questions. It directs traffic.
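A rough sketch of that idea in Python is below. It is a simplification under stated assumptions: the keyword map and the specialist names are hypothetical, and a production router would more likely use a classifier or a model call than keyword overlap. The shape is the point: the router returns agent names, never answers.

```python
import re

# Hypothetical map: specialist name -> terms that signal its domain.
# "setback" appears twice because such questions can span both areas.
DOMAIN_TERMS = {
    "planning": {"setback", "zoning", "overlay"},
    "engineering-standards": {"stormwater", "drainage", "setback"},
}


def route(question: str) -> list[str]:
    """Return every specialist whose domain terms appear in the question.

    One name means a direct route, several names mean the question spans
    expertise areas, and an empty list means no specialist covers it.
    """
    words = set(re.findall(r"[a-z]+", question.lower()))
    return sorted(name for name, terms in DOMAIN_TERMS.items() if words & terms)


print(route("How should stormwater be managed on this site?"))
print(route("What is the setback for a corner lot?"))
print(route("Who won the match last night?"))
```

With this map, the stormwater question goes to the engineering standards specialist alone, the setback question surfaces both specialists, and the unrelated question matches no one, so the router can say so instead of guessing.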

What Happened When We Made the Switch

Accuracy improved significantly when we split into specialists. Not because we changed the underlying model or rewrote the prompts substantially. Because each agent now operated on a focused, relevant knowledge base with clear scope.

The agents also got better at knowing when to say no. A generalist AI tends to attempt every question because it has at least some relevant training. A specialist with clear boundaries declines out-of-scope questions more reliably, which is exactly what you want. A wrong answer from a confident specialist is more dangerous than a referral to another specialist.

Users responded differently too. Planners who had found the generalist version frustrating started trusting the specialists. Partly because the answers were more accurate. Partly because the refusals felt professional rather than like failures. A specialist who says "this is outside my area" carries credibility. A generalist that answers confidently and is wrong does not.

When a Generalist Still Makes Sense

Not every AI product needs specialists. If your domain is narrow and well-defined, a single agent with a focused knowledge base might serve you well.

But if your product spans multiple professional disciplines, covers multiple jurisdictions, or needs to serve different user types with different expertise levels, specialization is probably the right architecture.

The test I use: can I write a one-sentence description of exactly what this agent knows and does not know? If I can't, the scope is probably too broad and the agent will underperform in production.

Generalists impress in demos. Specialists deliver in daily professional use. Build for the people who use the product every day, not for the people who see the demo once.