Implementa.


Building an AI agent: the guide you should have read before the first prototype

Most of what's sold as "AI agent" is a chatbot wearing a hat. A real agent does three things a chatbot doesn't: decides which tool to use, keeps memory across interactions and measures its own work. This guide is here so you know the difference before paying for the first prototype.

What an AI agent actually is (and what's a "chatbot wearing a hat")

An AI agent, in the strict technical sense, is a system that combines a language model with three additional capabilities: the ability to decide which tool to use based on context, persistent memory across interactions, and mechanisms to measure its own work (evals). Any system missing one of the three is not an agent; it's a chatbot. The distinction isn't semantic, it's operational: chatbots respond, agents execute.

The reason the market is flooded with "agents" that are chatbots wearing a hat is commercial: the word "agent" sells better. But the difference shows up immediately in production: a well-built chatbot is useful, a badly built agent is a bomb. Knowing the difference before you pay is the best investment you can make right now.

The three types of agents that matter in the enterprise

Not all agents are the same, but three categories cover 95% of useful enterprise cases. Confuse them and you end up hiring for the wrong thing.

Commercial agent (AI SDR, lead routing)

Built for commercial tasks: automated prospecting, lead qualification, opportunity routing. The most measurable (ROI in meetings generated) and the most adopted by companies with long sales cycles. Typical stack: sending domains + inbox warm-up + lead scoring + outreach sequences + CRM integration.

Operational agent (workforce, internal assistant)

Built for repetitive internal tasks: triaging incoming requests, generating drafts, processing documents. Success is measured in hours saved per month. Requires deep integration with internal systems (ERP, ITSM, etc.).

Customer-facing agent (support, light commercial)

Built for direct interaction with the customer: support, FAQs, booking meetings. The most visible, and therefore the one with the highest reputational risk. Requires rigorous supervision, a solid human escalation path and CSAT metrics from day one.

The minimum architecture: LLM, memory, tools, evals

A serious agent has four layers. If your vendor can't sketch them on a napkin, they aren't building an agent; they're selling you a label.

  1. Base LLM: GPT-4 / Claude / Llama, depending on privacy, latency and cost.
  2. Memory: vector store (Pinecone, Weaviate, pgvector) to keep context across interactions, plus short-term memory inside the current conversation.
  3. Tools: APIs, functions, web search, connections to your systems. The agent decides which one to use.
  4. Evals: a set of representative inputs with expected outputs that runs automatically after every change.
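The four layers fit in a few dozen lines. What follows is a minimal sketch, not a production design: the model is replaced by a keyword stub (`fake_llm`) so the code runs offline, a Python list stands in for the vector store, and the two tools (`lookup_order`, `open_ticket`) and the eval cases are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Layer 3: tools. Plain functions the agent may call (both hypothetical).
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped"


def open_ticket(summary: str) -> str:
    return f"Ticket opened: {summary}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "open_ticket": open_ticket,
}


# Layer 1: base LLM. Stubbed with a keyword rule so the sketch runs offline.
# A real agent would call a model API here and parse the tool it chooses.
def fake_llm(message: str, history: List[str]) -> Tuple[str, str]:
    words = message.split()
    if "order" in message.lower().split():
        return "lookup_order", words[-1]  # assume the order id is the last word
    return "open_ticket", message


@dataclass
class Agent:
    # Layer 2: memory. A list stands in for the vector store.
    memory: List[str] = field(default_factory=list)

    def handle(self, message: str) -> str:
        tool_name, argument = fake_llm(message, self.memory)
        result = TOOLS[tool_name](argument)
        self.memory.append(f"{message} -> {tool_name}")
        return result


# Layer 4: evals. Representative inputs paired with expected outputs.
EVAL_CASES = [
    ("where is order A17", "Order A17: shipped"),
    ("my invoice is wrong", "Ticket opened: my invoice is wrong"),
]


def run_evals() -> float:
    """Return the share of eval cases the agent passes."""
    passed = sum(Agent().handle(q) == expected for q, expected in EVAL_CASES)
    return passed / len(EVAL_CASES)
```

Calling `run_evals()` after every change to the prompt, the tools or the model is what turns the fourth layer from a slide into a safeguard.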

How an agent gets "trained" (no, it isn't uploading a PDF)

The word "training" has gotten muddy. In 2026, "training an agent" rarely means fine-tuning the model; it means building a solid RAG (retrieval-augmented generation) system, writing a serious system prompt and configuring the tools correctly.

  1. Real prompt engineering. The system prompt defines who the agent is, what it can do, what it can't, what tone it uses and when it escalates. It's the most underrated piece, with the best impact-to-cost ratio.
  2. RAG over your knowledge base. Structure your KB into short blocks, generate embeddings, configure retrieval with re-ranking.
  3. Configure the tools. For each tool, define when to use it, which parameters to accept and how to handle errors.
  4. Fine-tuning (only if needed). It almost never is. Consider it only when you need a very specific tone or style the prompt can't hit, or when cost/latency constraints justify a smaller specialized model.
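Step 2 above (short blocks, embeddings, retrieval with re-ranking) can be sketched without any external service. This is a toy under loud assumptions: `embed` uses word counts instead of a real embedding model, `rerank` orders by query-term coverage instead of a trained re-ranker, and the `KB` sentences are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def split_into_blocks(text: str, max_words: int = 40) -> List[str]:
    """Split a knowledge-base document into blocks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, blocks: List[str], k: int = 3) -> List[str]:
    """First pass: the k blocks most similar to the query."""
    q = embed(query)
    return sorted(blocks, key=lambda b: cosine(q, embed(b)), reverse=True)[:k]


def rerank(query: str, candidates: List[str]) -> List[str]:
    """Second pass (stand-in re-ranker): order by share of query terms covered."""
    terms = set(embed(query))

    def coverage(block: str) -> float:
        return len(terms & set(embed(block))) / len(terms) if terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)


# Hypothetical knowledge base, already cut into short blocks.
KB = [
    "Refunds are issued within 14 days of the return being received.",
    "Shipping takes 3 to 5 business days inside the country.",
    "Escalate to a human when the customer mentions a legal claim.",
]

query = "how long do refunds take"
top = rerank(query, retrieve(query, KB, k=2))
```

The two-pass shape is the point: a cheap first pass narrows the field, and a more careful second pass decides what actually reaches the prompt.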

Human-in-the-loop: why it isn't optional

Human-in-the-loop (HITL) means a human steps in at specific checkpoints in the agent's cycle. It is NOT "a human supervising everything all the time"; it's a human at the points where the cost of an error exceeds the cost of the pause. Without HITL at those points, the agent is an experiment; with HITL, it's a system.

  • High-impact decisions (irreversible, binding actions with economic or legal consequences).
  • Low-confidence cases from the model (when the model's confidence score for its answer falls below a set threshold).
  • Detected exceptions (input outside the expected patterns).
  • Periodic sampled review of the "routine" work to catch systemic drift.
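The four checkpoints above translate into a small routing function. A minimal sketch, assuming the agent can attach a confidence score and two flags to each action it proposes; `ProposedAction`, the 0.8 threshold and the 5% sample rate are hypothetical values chosen for illustration, not recommendations.

```python
import random
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # hypothetical; tune per task
SAMPLE_RATE = 0.05          # hypothetical; share of routine work sent to review


@dataclass
class ProposedAction:
    name: str
    confidence: float            # model confidence score, 0.0 to 1.0
    irreversible: bool           # binding, with economic or legal consequences
    matches_known_pattern: bool  # False when input falls outside expected patterns


def review_reason(
    action: ProposedAction,
    rng: random.Random,
    sample_rate: float = SAMPLE_RATE,
) -> Optional[str]:
    """Return why a human must review this action, or None to let it run."""
    if action.irreversible:
        return "high_impact"
    if action.confidence < CONFIDENCE_THRESHOLD:
        return "low_confidence"
    if not action.matches_known_pattern:
        return "exception"
    # Routine work: pull a random sample to catch systemic drift.
    if rng.random() < sample_rate:
        return "sampled_review"
    return None
```

Anything that returns a reason goes to a review queue with that reason attached; everything else executes. The order of the checks is a design choice: the irreversible case is tested first so that a confident model can never talk its way past it.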

What it really costs to build and run an agent

Cost                             | SMB                              | Mid-market      | Enterprise
Technical setup                  | $3,000-15,000                    | $15,000-75,000  | $75,000-250,000+
Monthly operation (LLM + infra)  | $50-500                          | $500-2,500      | $2,500-15,000
Human supervision (person-hours) | 5-15 h/mo                        | 20-50 h/mo      | 50-200 h/mo
Monthly iteration                | Included in $200-500/mo retainer | $1,500-5,000/mo | $5,000-25,000/mo
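To turn those ranges into a first-year budget, add the setup to twelve months of running costs. A rough sketch using the SMB column: the table gives supervision in hours, not dollars, so the $40 hourly rate is purely an assumption, as is pairing the low end of every range with the low end of the others.

```python
def first_year_cost(
    setup: float,
    monthly_operation: float,
    monthly_iteration: float,
    supervision_hours: float,
    hourly_rate: float,
) -> float:
    """Setup plus 12 months of operation, iteration and human supervision."""
    monthly = monthly_operation + monthly_iteration + supervision_hours * hourly_rate
    return setup + 12 * monthly


HOURLY_RATE = 40  # assumed cost of a supervisor's hour; not from the table

# SMB column, low end and high end of each range.
low = first_year_cost(3_000, 50, 200, 5, HOURLY_RATE)       # 8400
high = first_year_cost(15_000, 500, 500, 15, HOURLY_RATE)   # 34200
```

Under these assumptions, supervision accounts for $2,400 of the low estimate and $7,200 of the high one, which is why it belongs in the budget even though it never shows up on a vendor invoice.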

When to ship an agent and when NOT to

YES, build an agent             | NO, skip it
High, repetitive volume         | Low or erratic volume
Variable input, simple decision | Simple input, complex decision
Low or reversible cost of error | High, irreversible cost of error
Human available to supervise    | No one to own the system
Process stable over 12+ months  | Process that's shifting

Free material · PDF

Governance checklist for AI agents in production

The 4 mandatory layers (HITL, logs, rollback, evals) explained step by step. Without these, an agent isn't a system; it's a bomb.

What you get

  • Checklist of the 4 critical pieces
  • Use-policy template by agent type
  • Quality metrics by task (with thresholds)

Frequently asked questions

Do I need to know how to code to build an agent?

For a basic one, no. Platforms like n8n, Make AI, Voiceflow or OpenAI Assistants let you ship functional agents without code. For serious agents (the ones that integrate with your real systems, handle errors and get monitored), yes, you need someone technical, though not necessarily you. The line between "pretty demo agent" and "agent that works" is crossed with code.

How long does it take to build an agent?

A basic agent (an assistant over your KB, a simple support chatbot) can be live in a week. A serious agent, with CRM integrations, tools, human escalation and evals, takes between 6 and 12 weeks. Any vendor promising "enterprise agent in 3 days" is selling a demo with a tie on.

Which platform should I use?

For prototyping: OpenAI Assistants (fast, simple). For no-code/low-code production: n8n or Make AI. For complex orchestration: LangChain or LangGraph (you need a dev). The pick depends less on the platform and more on who's going to run the agent day-to-day. Start with the simplest thing that covers your case; you can always level up.

What governance does an agent need in production?

Four pieces: (1) a clear usage policy stating what it can and can't do; (2) auditable logs of every decision and action; (3) human-in-the-loop for critical decisions; (4) periodic behavior review (automated evals + human review). Without those four, the agent is an experiment, not a production system.

What happens when the agent gets it wrong?

It depends on the system around it. Well-designed: the agent detects uncertainty, escalates to a human and the situation resolves with little friction. Badly designed: the agent executes the wrong action with full confidence and you find out when a customer complains. An agent's quality isn't measured when it succeeds; it's measured when it hesitates.

Read it, or want it shipped?

This guide covers the thinking part. Implementing it, and making it measurable, is what we charge for.
