The four levels of "training": prompt, RAG, fine-tuning, agent training
| Level | What it is | When |
|---|---|---|
| Prompt engineering | Writing good instructions (system prompt) | Always: the foundation of everything |
| RAG | Connecting the model to your knowledge base | When the agent must use your specific info |
| Fine-tuning | Adjusting the model with your examples | Rarely: only in very specific cases |
| Agent training | Iterating on the full agent with evals | Always: a continuous cycle |
Which one applies to your case
- 90% of cases: prompt + RAG + agent training. No fine-tuning.
- You need a very specific tone or style that prompting alone can't reach: consider fine-tuning a small model.
- Latency or cost constraints: fine-tune Llama or a similar model so it runs faster and cheaper.
- Highly specialized data (medical, legal): combine strong RAG with selective fine-tuning.
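
The rules of thumb above can be written down as a tiny decision helper. This is only a sketch: the function name `recommend_levels` and its three flags are hypothetical labels for the situations listed, not part of any library.

```python
def recommend_levels(
    needs_unreachable_style: bool = False,
    latency_or_cost_bound: bool = False,
    specialized_domain: bool = False,
) -> list[str]:
    """Return the levels suggested by the rules of thumb in the list above."""
    # Baseline for most cases: prompt + RAG + agent training, no fine-tuning.
    levels = ["prompt", "RAG", "agent training"]
    # Any of the three special situations adds fine-tuning as a candidate.
    if needs_unreachable_style or latency_or_cost_bound or specialized_domain:
        levels.append("fine-tuning")
    return levels


print(recommend_levels())
print(recommend_levels(latency_or_cost_bound=True))
```

The point of the sketch is the default: fine-tuning only enters the list when one of the specific conditions holds.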
How to build evals (the part almost nobody does)
Evals are what separate a serious agent from a pretty demo. And almost nobody builds them. The process:
- Gather 50-200 inputs that are representative of the real cases the agent will handle.
- Define the expected output for each, or the range of acceptable outputs.
- Define automated evaluation criteria: measurable metrics (factual correctness, format, absence of hallucinations).
- Run the evals after every change to the agent (prompt, RAG config, model). If the score drops, the change doesn't ship.
- Iterate on the set: add the edge cases you spot in production.
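
The process above can be sketched as a small harness. Everything here is an assumption made for illustration: `EvalCase`, `score_case`, `run_evals`, `can_ship`, and the stand-in `toy_agent` are hypothetical names, and substring checks are a deliberately crude proxy for factual correctness and hallucination detection.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: list[str]  # facts an acceptable answer has to mention
    must_not_contain: list[str] = field(default_factory=list)  # known-wrong claims


def score_case(answer: str, case: EvalCase) -> bool:
    """A case passes if every required fact appears and no forbidden claim does."""
    text = answer.lower()
    has_facts = all(s.lower() in text for s in case.must_contain)
    has_bad_claim = any(s.lower() in text for s in case.must_not_contain)
    return has_facts and not has_bad_claim


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the pass rate (0.0 to 1.0) of the agent over the eval set."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(score_case(agent(c.prompt), c) for c in cases)
    return passed / len(cases)


def can_ship(new_score: float, baseline: float) -> bool:
    """Gate: a change ships only if the score did not drop below the baseline."""
    return new_score >= baseline


# Stand-in for the real agent call (assumption: the agent is a str -> str function).
def toy_agent(prompt: str) -> str:
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days of purchase."
    return "I don't know."


# Made-up cases; a real set would hold 50-200 inputs drawn from production.
CASES = [
    EvalCase("What is the refund window?", ["30 days"], ["60 days"]),
    EvalCase("Do you ship to Canada?", ["yes"]),
]

score = run_evals(toy_agent, CASES)
print(f"pass rate: {score:.2f}, ship: {can_ship(score, baseline=0.5)}")
```

In practice the scoring function is where most of the work goes; exact-match or substring checks can later be swapped for format validators or a grader model without changing the gate.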
The continuous-improvement loop
1. Production captures real interactions with feedback (CSAT, detected errors).
2. Weekly human review: identify error patterns.
3. Update the KB / prompt / config based on what you found.
4. Run evals to check there are no regressions.
5. Deploy the change.
6. Back to step 1.
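
To make the weekly review step concrete, here is a minimal sketch that groups low-rated interactions by error tag so the most frequent pattern surfaces first. The record shape (`csat`, `error_tag`), the threshold, and the sample data are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical production records: a CSAT score from 1 to 5 and an optional
# error tag assigned during review or by an automated detector.
interactions = [
    {"id": 1, "csat": 5, "error_tag": None},
    {"id": 2, "csat": 2, "error_tag": "outdated_kb"},
    {"id": 3, "csat": 1, "error_tag": "wrong_format"},
    {"id": 4, "csat": 2, "error_tag": "outdated_kb"},
    {"id": 5, "csat": 4, "error_tag": None},
]


def error_patterns(records, csat_threshold=3):
    """Count error tags among interactions rated below the CSAT threshold."""
    tags = Counter(
        r["error_tag"]
        for r in records
        if r["error_tag"] is not None and r["csat"] < csat_threshold
    )
    # most_common() sorts by frequency, highest first.
    return tags.most_common()


patterns = error_patterns(interactions)
print(patterns)
```

With this sample, the outdated knowledge base shows up as the top pattern, which points to a KB update rather than a prompt change.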
Governance and sensitive data
- Business-plan APIs. The OpenAI API and the Anthropic API on business plans don't train on your data by default. Confirm it in your DPA (data processing agreement).
- Anonymize when possible. Patterns help the model; names don't.
- Encrypted logs. If you store conversations, encrypt at least the ones containing personal data.
- Minimum retention. Don't keep what you don't need. Clear deletion policy.
- Regular audits. Review what data enters the model and where it comes from.
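
Two of the points above, anonymization and minimum retention, can be sketched in a few lines. This is an illustration under stated assumptions: the regular expressions are simplistic and will miss many formats, the `[EMAIL]` and `[PHONE]` placeholders and the 30-day window are arbitrary choices, and `redact` and `purge_expired` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplistic patterns for illustration only; real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def purge_expired(logs: list[dict], now: datetime, max_age_days: int = 30) -> list[dict]:
    """Keep only log entries newer than the retention window."""
    cutoff = now - timedelta(days=max_age_days)
    return [entry for entry in logs if entry["created_at"] >= cutoff]


message = "Contact ana@example.com or +34 600 123 456 about order 42."
print(redact(message))

# Made-up log entries to show the retention filter.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
logs = [
    {"id": "a", "created_at": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"id": "b", "created_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
kept = purge_expired(logs, now)
print([entry["id"] for entry in kept])
```

Redacting before the text is stored or sent to the model keeps the pattern ("customer asks about an order") while dropping the identifiers, which is the distinction the list above draws.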