What an AI agent actually is (and what's a "chatbot wearing a hat")
An AI agent, in the strict technical sense, is a system that combines a language model with three additional capabilities: the ability to decide which tool to use based on context, persistent memory across interactions, and mechanisms to measure its own work (evals). A system missing any of the three is not an agent; it's a chatbot. The distinction isn't semantic, it's operational: chatbots respond, agents execute.
The reason the market is flooded with "agents" that are chatbots wearing a hat is commercial: the word "agent" sells better. But the difference shows up immediately in production: a well-built chatbot is useful; a badly built agent is a bomb. Knowing the difference before you pay is the best investment you can make right now.
The three types of agents that matter in the enterprise
Not all agents are the same: three categories cover 95% of useful enterprise cases. Confusing them leads you to hire the wrong thing.
Commercial agent (AI SDR, lead routing)
Built for commercial tasks: automated prospecting, lead qualification, opportunity routing. The most measurable (ROI in meetings generated) and the most adopted by companies with long sales cycles. Typical stack: domains + warm-up + scoring + sequences + CRM integration.
Operational agent (workforce, internal assistant)
Built for repetitive internal tasks: triaging incoming requests, generating drafts, processing documents. Success is measured in hours saved per month. Requires deep integration with internal systems (ERP, ITSM, etc.).
Customer-facing agent (support, light commercial)
Built for direct interaction with the customer: support, FAQs, booking meetings. The most visible, and therefore the one with the highest reputational risk. Requires rigorous supervision, a solid human escalation path and CSAT metrics from day one.
The minimum architecture: LLM, memory, tools, evals
A serious agent has four layers. If your vendor can't sketch them on a napkin, they aren't building an agent; they're selling you a label.
- Base LLM: GPT-4 / Claude / Llama, depending on privacy, latency and cost.
- Memory: vector store (Pinecone, Weaviate, pgvector) to keep context across interactions + short-term memory inside the current conversation.
- Tools: APIs, functions, web search, connections to your systems. The agent decides which one to use.
- Evals: a set of representative inputs with expected outputs that runs automatically after every change.
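To make the four layers concrete, here is a minimal sketch of how they fit together. Everything in it is an illustrative assumption: `stub_llm` stands in for a real model call, the memory is a plain list rather than a vector store, and the two tools (`lookup_order`, `book_meeting`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Tools layer: hypothetical functions the agent is allowed to call.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"

def book_meeting(day: str) -> str:
    return f"Meeting booked for {day}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "book_meeting": book_meeting,
}

# LLM layer: a keyword-based stand-in (assumption). A real model would
# choose the tool and its argument from the prompt and the memory.
def stub_llm(message: str, memory: List[str]) -> Tuple[str, str]:
    words = message.lower().split()
    last_word = message.split()[-1]
    if "order" in words:
        return "lookup_order", last_word
    if "meeting" in words:
        return "book_meeting", last_word
    return "none", ""

@dataclass
class Agent:
    # Memory layer: a list of past turns (a vector store in a real system).
    memory: List[str] = field(default_factory=list)

    def run(self, message: str) -> str:
        tool, arg = stub_llm(message, self.memory)
        result = TOOLS[tool](arg) if tool in TOOLS else "I can't help with that."
        self.memory.append(f"{message} -> {result}")
        return result

# Evals layer: representative inputs with expected outputs.
EVAL_CASES = [
    ("Where is order A17", "Order A17: shipped"),
    ("Book a meeting on Friday", "Meeting booked for Friday"),
    ("Tell me a joke", "I can't help with that."),
]

def run_evals() -> float:
    """Return the share of eval cases the agent answers as expected."""
    agent = Agent()
    passed = sum(agent.run(q) == expected for q, expected in EVAL_CASES)
    return passed / len(EVAL_CASES)
```

Running `run_evals()` after every change to the prompt, the tools or the model is the habit that separates the fourth layer from a one-off demo.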
How an agent gets "trained" (no, it isn't uploading a PDF)
The word "training" has gotten muddy. In 2026, "training an agent" rarely means fine-tuning the model; it means building a solid RAG system, writing a serious system prompt and configuring the tools correctly.
- Real prompt engineering. The system prompt defines who the agent is, what it can do, what it can't, what tone it uses and when it escalates. It's the most underrated piece, with the best impact-to-cost ratio.
- RAG over your knowledge base. Structure your KB into short blocks, generate embeddings, configure retrieval with re-ranking.
- Configure the tools. For each tool, define when to use it, which parameters to accept and how to handle errors.
- Fine-tuning (only if needed). Almost never is. Only when you need a very specific tone or style the prompt can't hit, or when cost/latency constraints justify a smaller specialized model.
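The RAG step (short blocks, embeddings, retrieval with re-ranking) can be sketched in a few lines. This is a toy under stated assumptions: `embed` uses word counts instead of a real embedding model, the re-ranker scores query-term coverage instead of using a cross-encoder, and the knowledge-base blocks are invented for illustration.

```python
import math
from collections import Counter
from typing import List

# Hypothetical knowledge base, already split into short blocks.
KB_BLOCKS = [
    "Refunds are processed within 5 business days.",
    "To reset your password, open Settings and choose Security.",
    "Refund requests need the order number and the purchase date.",
    "Our support team is available on weekdays.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system calls a model)."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 3) -> List[str]:
    """First pass: top-k blocks by cosine similarity to the query."""
    q = embed(query)
    return sorted(KB_BLOCKS, key=lambda b: cosine(q, embed(b)), reverse=True)[:k]

def rerank(query: str, candidates: List[str], top_n: int = 1) -> List[str]:
    """Second pass: reorder candidates by the share of query terms they cover."""
    q_terms = set(embed(query))

    def coverage(block: str) -> float:
        return len(q_terms & set(embed(block))) / max(len(q_terms), 1)

    return sorted(candidates, key=coverage, reverse=True)[:top_n]

query = "What do I need for a refund request"
best = rerank(query, retrieve(query))
```

The two-stage shape is the point: a cheap first pass narrows the candidates, and a more careful second pass decides what actually reaches the prompt.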
Human-in-the-loop: why it isn't optional
Human-in-the-loop (HITL) means a human steps in at specific checkpoints in the agent's cycle. It is NOT "a human supervising everything all the time"; it's a human at the points where the cost of an error exceeds the cost of the pause. Without HITL at those points, the agent is an experiment; with HITL, it's a system. The checkpoints that usually qualify:
- High-impact decisions (irreversible, binding actions with economic or legal consequences).
- Low-confidence cases from the model (when the model's confidence in its answer falls below a set threshold).
- Detected exceptions (input outside the expected patterns).
- Periodic sampled review of the "routine" work to catch systemic drift.
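The four checkpoints above translate naturally into a routing function. The sketch below is one possible shape, not a prescribed design: the `Action` fields, the threshold values, the set of known action kinds and the 5% sampling rate are all assumptions to be tuned per use case.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    kind: str             # what the agent wants to do
    confidence: float     # model confidence in [0, 1]
    irreversible: bool    # binding, cannot be undone
    amount_usd: float = 0.0

# Illustrative values (assumptions), not recommendations.
KNOWN_KINDS = {"reply", "refund", "update_crm"}
CONFIDENCE_THRESHOLD = 0.8
AMOUNT_LIMIT_USD = 500.0
SAMPLE_RATE = 0.05

def needs_human(
    action: Action, rng: Callable[[], float] = random.random
) -> Optional[str]:
    """Return the reason a human must review the action, or None."""
    # 1. High-impact decisions: irreversible or above the money limit.
    if action.irreversible or action.amount_usd > AMOUNT_LIMIT_USD:
        return "high_impact"
    # 2. Low-confidence cases.
    if action.confidence < CONFIDENCE_THRESHOLD:
        return "low_confidence"
    # 3. Exceptions: anything outside the expected patterns.
    if action.kind not in KNOWN_KINDS:
        return "exception"
    # 4. Periodic sampled review of routine work.
    if rng() < SAMPLE_RATE:
        return "sampled_review"
    return None
```

Passing `rng` as a parameter keeps the sampling testable: in an eval run it can be replaced with a fixed function so the routing is deterministic.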
What it really costs to build and run an agent
| Cost | SMB | Mid-market | Enterprise |
|---|---|---|---|
| Technical setup | $3,000-15,000 | $15,000-75,000 | $75,000-250,000+ |
| Monthly operation (LLM + infra) | $50-500 | $500-2,500 | $2,500-15,000 |
| Human supervision (person-hours) | 5-15 h/mo | 20-50 h/mo | 50-200 h/mo |
| Monthly iteration | Included in $200-500/mo retainer | $1,500-5,000/mo | $5,000-25,000/mo |
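To turn the table into a first-year budget range, the rows can be added up as in the sketch below. Two assumptions are baked in: the SMB retainer is treated as the monthly iteration cost, and supervision hours only count if you pass an hourly rate (the table gives hours, not a price). The enterprise upper bound is open-ended in the table ("250,000+"), so its high figure is a floor rather than a cap.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # (low, high)

# Figures copied from the table above; money in USD, supervision in hours/month.
COSTS: Dict[str, Dict[str, Range]] = {
    "SMB": {
        "setup": (3_000, 15_000),
        "operation": (50, 500),
        "iteration": (200, 500),
        "hours": (5, 15),
    },
    "Mid-market": {
        "setup": (15_000, 75_000),
        "operation": (500, 2_500),
        "iteration": (1_500, 5_000),
        "hours": (20, 50),
    },
    "Enterprise": {
        "setup": (75_000, 250_000),
        "operation": (2_500, 15_000),
        "iteration": (5_000, 25_000),
        "hours": (50, 200),
    },
}

def first_year_cost(segment: str, hourly_rate_usd: float = 0.0) -> Range:
    """Setup plus 12 months of operation, iteration and supervision time."""
    c = COSTS[segment]

    def total(i: int) -> float:
        monthly = (
            c["operation"][i]
            + c["iteration"][i]
            + c["hours"][i] * hourly_rate_usd
        )
        return c["setup"][i] + 12 * monthly

    return total(0), total(1)
```

With no hourly rate, `first_year_cost("SMB")` works out to a range of $6,000 to $27,000; supervision time comes on top of that and is often the line people forget to budget.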
When to ship an agent and when NOT to
| YES, build an agent | NO, skip it |
|---|---|
| High, repetitive volume | Low or erratic volume |
| Variable input, simple decision | Simple input, complex decision |
| Low or reversible cost of error | High, irreversible cost of error |
| Human available to supervise | No one to own the system |
| Process stable over 12+ months | Process that's shifting |
Free material · PDF
Governance checklist for AI agents in production
The 4 mandatory layers (HITL, logs, rollback, evals) explained step by step. Without these, an agent isn't a system; it's a bomb.
What you get
- Checklist of the 4 critical pieces
- Use-policy template by agent type
- Quality metrics by task (with thresholds)