Not Every AI Agent Should Be a Coding Agent

One thing I’ve learned from building AI agents is this: not every AI agent should look like a coding agent.

Yes, coding agents are probably the most versatile kind of agent we have today. Give them a shell, filesystem access, web search, and enough autonomy, and they can do a surprising amount of work.

But versatility is not the same as effectiveness.

In practice, the best AI agents are not the most general ones. They are the ones designed around a specific objective — with the right tools, system prompt, skills, memory model, and operating environment.

That distinction becomes obvious when you build more than one kind of agent.

Two Agents, Two Different Worlds

Over the past year, I’ve been building two AI agents that couldn’t be more different.

Agent 1: Seeknal — A Data-Native Agent

Seeknal is an all-in-one CLI for data and AI/ML engineering. Its AI agent (seeknal ask) needs to answer questions about your data, build and validate pipelines, profile datasets, and generate reports.

For this agent, the “native language” is data operations, so we designed it accordingly:

sql_query is a native tool — the agent queries PostgreSQL, DuckDB, and Iceberg directly
Seeknal’s own operations (organize, expose, action) are native tools, not invoked through bash
The agent has full context of your pipeline schemas, lineage, and entity definitions before you ask anything
It understands the draft → dry-run → apply workflow natively

The result: when you ask “what’s our revenue trend by region this quarter?”, the agent doesn’t spawn a Python script, write SQL to a file, execute it via bash, then parse the output. It queries the database directly through its native tool and returns the answer in seconds.

No bash. No grep. No pip install. Just the right tool for the job.
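To make the contrast concrete, here is a minimal sketch of what a native sql_query tool could look like. It is not Seeknal's actual implementation: the function signature, the read-only guard, and the sample orders table are all assumptions made for illustration, and Python's built-in sqlite3 module stands in for PostgreSQL, DuckDB, or Iceberg.

```python
import sqlite3

def sql_query(conn: sqlite3.Connection, query: str, max_rows: int = 100) -> dict:
    """Hypothetical native tool: run one read-only query, return structured rows."""
    # Assumed guard: only accept statements that start with SELECT or WITH.
    if not query.lstrip().lower().startswith(("select", "with")):
        return {"ok": False, "error": "only read-only queries are allowed"}
    try:
        cur = conn.execute(query)
    except sqlite3.Error as exc:
        # Return the engine's error as data so the agent can correct its query.
        return {"ok": False, "error": str(exc)}
    columns = [col[0] for col in cur.description]
    return {"ok": True, "columns": columns, "rows": cur.fetchmany(max_rows)}

# Illustrative in-memory data; the table and its values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("APAC", "Q1", 120.0), ("APAC", "Q2", 150.0),
     ("EMEA", "Q1", 90.0), ("EMEA", "Q2", 80.0)],
)

result = sql_query(
    conn,
    "SELECT region, SUM(revenue) AS total FROM orders "
    "GROUP BY region ORDER BY region",
)
print(result["columns"], result["rows"])
```

The agent gets column names and rows back in a single call, with errors returned as structured data rather than as shell output it has to parse.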

Agent 2: A Personal Agent — Simple by Design

In a different setting, I also built a simpler personal agent for daily tasks. Its job: search for information, inspect files, summarize context, and help with general workflows.

For this agent, native tools like filesystem and web_search are enough. It doesn’t need SQL-native tools, pipeline operations, or ML model management. And it works better because it’s not distracted by capabilities it doesn’t need.

The Mistake I See Too Often

A lot of people start with the tools they already have — shell, browser, filesystem, search — then ask, “What can my agent do with this?”

I think that is backwards.

The better question is: what problem am I trying to solve, and what is the best environment for an agent to solve it in?

Because once you answer that, the rest becomes clearer: what tools should be native, what abstractions the agent should operate on, what skills should exist, and what kind of verification and recovery the harness should implement.

What Harness Engineering Taught Me

In our harness engineering research, we’ve been studying how to make AI agents reliable. One pattern keeps showing up: more tools don’t mean better performance. In fact, they often mean worse performance.

A recent study (GraSP, arXiv:2604.17870) found that giving agents flat lists of skills actually hurts reliability. When you compile those same skills into a structured graph with typed dependencies, performance jumps — up to +19 reward and 41% fewer wasted steps.

Why? Because every tool you give an agent is a decision it has to make. More tools = more decisions = more chances to pick the wrong one.
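One way to picture the difference between a flat skill list and a dependency graph is the small sketch below. It is not the method from the study cited above, only an illustration of the general idea: the skill names are hypothetical (loosely based on the draft, dry-run, apply workflow), and the dependencies are plain sets rather than typed edges.

```python
from graphlib import TopologicalSorter

# Hypothetical skills, each mapped to the skills that must complete first.
SKILL_DEPENDENCIES = {
    "load_schema": set(),
    "profile_dataset": {"load_schema"},
    "draft_pipeline": {"load_schema"},
    "dry_run": {"draft_pipeline"},
    "apply": {"dry_run"},
}

def available_skills(completed: set[str]) -> list[str]:
    """Return only the skills whose dependencies are already satisfied."""
    return sorted(
        skill
        for skill, deps in SKILL_DEPENDENCIES.items()
        if skill not in completed and deps <= completed
    )

# A flat list would offer all five skills at every step.
flat_choices = sorted(SKILL_DEPENDENCIES)

# The graph narrows the choice to what is valid right now.
first_step = available_skills(set())
second_step = available_skills({"load_schema"})

# One valid end-to-end ordering, derived from the same graph.
order = list(TopologicalSorter(SKILL_DEPENDENCIES).static_order())
print(len(flat_choices), first_step, second_step)
```

With the flat list the agent chooses among five skills at every step. With the graph it sees one option at the start and two after the schema is loaded, so there are fewer chances to pick a step that cannot succeed yet.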

This is why harness design matters so much:

The environment shapes the task
The task shapes the tools
The tools shape the agent’s behavior
And the harness determines whether that behavior stays useful or drifts into noise

A coding agent needs loop detection, patch verification, and file-level reasoning. A data-native agent needs schema grounding, query validation, and lineage awareness. A personal assistant needs lightweight context management and low-friction interaction patterns.

Same foundation. Different environment. Different harness.
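As a rough sketch of how two of these harness checks differ, the block below pairs a query validation check for a data-native agent with a loop detection check for a coding agent. Both function names, the repeat threshold, and the sample table are assumptions for illustration, and sqlite3 again stands in for a real warehouse.

```python
import sqlite3
from collections import Counter

def validate_query(conn: sqlite3.Connection, query: str):
    """Data-agent check: have the engine compile the query without running it.

    Returns None when the query compiles, otherwise the engine's error message.
    """
    try:
        conn.execute(f"EXPLAIN {query}")
        return None
    except sqlite3.Error as exc:
        return str(exc)

def detect_loop(tool_calls: list, threshold: int = 3) -> bool:
    """Coding-agent check: flag when an identical call occurs `threshold` times."""
    counts = Counter(tool_calls)
    return any(n >= threshold for n in counts.values())

# Illustrative schema; the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")

good = validate_query(conn, "SELECT region FROM orders")
bad = validate_query(conn, "SELECT nope FROM orders")

# Tool calls recorded as (tool name, argument) pairs.
stuck = detect_loop([("ls", "."), ("ls", "."), ("ls", ".")])
healthy = detect_loop([("ls", "."), ("cat", "README.md")])
print(good, bad, stuck, healthy)
```

Neither check is useful in the other agent's environment, which is the point: the harness follows the task.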

The Design Framework

When I think about building an AI agent now, I start with five questions:

What is the real objective? Not “use AI” — but what job should this agent do repeatedly and well?
What environment does that work actually live in? Shell? Database? Browser? Internal tools? Messaging layer?
What should be native tools vs. wrapped tools? Native tools shape agent fluency. If SQL is 80% of what the agent does, make it native — not something it scripts through bash.
What failure modes matter most? Wrong SQL is different from wrong code. A wrong personal summary is different from a wrong production patch.
What harness makes this agent reliable in that environment? Detection, routing, guardrails, and recovery should be designed around the real task.
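One possible way to keep yourself honest about these five questions is to write the answers down as a small spec before building anything. The AgentSpec class and its field names below are hypothetical, not part of Seeknal or any framework, and the example values are paraphrased from the two agents described earlier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical record of the answers to the five design questions."""
    objective: str
    environment: str
    native_tools: tuple[str, ...]
    wrapped_tools: tuple[str, ...] = ()
    failure_modes: tuple[str, ...] = ()
    harness_checks: tuple[str, ...] = ()

    def missing_answers(self) -> list[str]:
        """List the questions that still have no answer."""
        required = ("objective", "environment", "native_tools",
                    "failure_modes", "harness_checks")
        return [name for name in required if not getattr(self, name)]

data_agent = AgentSpec(
    objective="answer questions about data and build validated pipelines",
    environment="database",
    native_tools=("sql_query", "organize", "expose", "action"),
    failure_modes=("wrong SQL",),
    harness_checks=("schema grounding", "query validation", "lineage awareness"),
)

personal_agent = AgentSpec(
    objective="search, inspect files, and summarize context",
    environment="filesystem and web",
    native_tools=("filesystem", "web_search"),
    failure_modes=("wrong personal summary",),
    harness_checks=("lightweight context management",),
)

# A spec that starts from the toolchain leaves most questions unanswered.
vague = AgentSpec(objective="use AI", environment="", native_tools=())
print(data_agent.missing_answers(), vague.missing_answers())
```

The spec does nothing at runtime. Its value is that an empty field is visible before any tool gets wired in.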

When to Use What

I’m not saying coding agents are bad. They’re essential for general-purpose tasks. But they shouldn’t be your default.

Use a coding agent when:

The task is unpredictable and requires exploration
You need to prototype something quickly
The environment is unknown or constantly changing

Use a domain-specific agent when:

The task is well-defined and repetitive
You have clear success criteria
The environment is structured and knowable
Reliability matters more than flexibility

The Uncomfortable Tradeoff

Building a domain-specific agent is more work upfront than building a general-purpose one. It’s easier to give an agent bash access and say “figure it out.”

But that agent will usually be slower, less reliable, and more expensive to run. Seeknal’s data agent can answer an analytics question in one tool call. A coding agent would typically need 5 to 10 tool calls, a temporary Python script, error handling, and output parsing to do the same thing.

The best agent is not the most general one. It is the one designed for the work you actually need done.

Start from the problem, not from the toolchain. Because sometimes bash, grep, and ls are enough. And sometimes they are exactly the wrong abstraction.