A badly built RAG (retrieval-augmented generation) system is the fastest way to leak data that should never get out. The classic pattern: you index "all the documentation", a user asks something innocent, and the system hands back a chunk of a document that person should never have been able to see. It isn't an exotic bug; it's the default architecture of almost every demo RAG.
Permissions before embeddings
Access control isn't a filter you bolt on at the end: it lives in the retrieval layer. Before the model sees anything, the system has already restricted which chunks this specific user can retrieve. Identity decides the candidate set; embedding similarity only ranks what is left.
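To make the ordering concrete, here is a minimal sketch of "filter by identity, then rank by similarity". The `Chunk` type, the `allowed_groups` field, the group names and the two-dimensional embeddings are all hypothetical, chosen only for illustration; a real system would push the same filter into its vector store rather than loop in Python.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_groups: frozenset[str]  # hypothetical permission metadata


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, user_groups, chunks, k=3):
    # Step 1: identity decides the candidate set.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    # Step 2: similarity ranks only what survived the permission check.
    ranked = sorted(
        visible,
        key=lambda c: cosine(query_embedding, c.embedding),
        reverse=True,
    )
    return ranked[:k]


# Illustrative data only.
chunks = [
    Chunk("handbook", "Vacation policy text", (1.0, 0.0), frozenset({"all-staff"})),
    Chunk("salaries", "Salary bands text", (0.9, 0.1), frozenset({"hr"})),
]
results = retrieve((1.0, 0.0), frozenset({"all-staff"}), chunks)
print([c.doc_id for c in results])  # ['handbook']
```

The `salaries` chunk is a close semantic match for the query, but it never enters the ranking because the caller is not in `hr`. The model cannot leak what retrieval never hands it.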
Partition the index by permission
We don't dump everything into a single index and pray. We segment by access level and attach permission metadata to every chunk, so a query can only touch what belongs to whoever's asking.
- Identity filtering at retrieval, not at the answer.
- Indexes or partitions segmented by access level.
- Permission metadata on every indexed chunk.
- Query audit: who asked what and what got returned.
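The four points above can be sketched together in a few lines. This is an assumption-heavy toy, not a reference design: `PartitionedIndex`, the access-level names and the audit record fields are hypothetical, and a plain keyword match stands in for vector search so the example stays self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PartitionedIndex:
    # One partition per access level; level names are hypothetical.
    partitions: dict[str, list[dict]] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def add(self, level, doc_id, text):
        # Permission metadata travels with the chunk, not only with the partition.
        self.partitions.setdefault(level, []).append(
            {"doc_id": doc_id, "text": text, "access_level": level}
        )

    def query(self, user_id, user_levels, term):
        hits = []
        # Only partitions the caller is cleared for are ever opened.
        for level in sorted(user_levels):
            for chunk in self.partitions.get(level, []):
                if term.lower() in chunk["text"].lower():
                    hits.append(chunk)
        # Audit: who asked what, and what got returned.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user_id,
            "query": term,
            "returned": [c["doc_id"] for c in hits],
        })
        return hits


# Illustrative data only.
index = PartitionedIndex()
index.add("public", "handbook", "Expense policy for all employees")
index.add("finance", "q3-forecast", "Expense forecast for Q3")

hits = index.query("ana", {"public"}, "expense")
print([c["doc_id"] for c in hits])      # ['handbook']
print(index.audit_log[-1]["returned"])  # ['handbook']
```

Keeping the access level both on the partition and on each chunk is deliberate redundancy: if a chunk ever ends up in the wrong partition, the metadata still lets a second check catch it.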
Citation or it's useless
Every answer links its source. No citation, no trust and no traceability: the user can't verify, and you can't audit. In enterprise, an answer without a source is an answer you can't defend.
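One way to enforce this is to make the citation part of the answer type, so an uncited answer cannot be emitted by accident. The sketch below assumes hypothetical `Source` and `Answer` types and a hypothetical `finalize` step that sits between the model's draft and the user; the fallback wording and sample data are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    doc_id: str
    locator: str  # hypothetical: section, page, or URL fragment


@dataclass(frozen=True)
class Answer:
    text: str
    sources: tuple[Source, ...]


def finalize(draft_text, retrieved_sources):
    # Refuse to emit a draft that cannot be traced to a retrieved chunk.
    if not retrieved_sources:
        return Answer("I could not find a source I can cite for this.", ())
    return Answer(draft_text, tuple(retrieved_sources))


def render(answer):
    # Append the source list so the reader can verify the claim.
    if not answer.sources:
        return answer.text
    refs = "; ".join(f"{s.doc_id} ({s.locator})" for s in answer.sources)
    return f"{answer.text}\nSources: {refs}"


# Illustrative data only.
cited = finalize("Travel is reimbursed within 30 days.", [Source("handbook", "section 4.2")])
uncited = finalize("Probably 30 days?", [])
print(render(cited))
print(render(uncited))
```

The second call drops the model's draft entirely: with nothing retrieved there is nothing to cite, so the system says so instead of guessing.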
What we measure
- Retrieval precision (does it bring back what matters?).
- Access leaks: the target is zero, and it gets audited.
- Citation coverage: % of answers with a verifiable source.
- Pipeline uptime and latency.
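The first three metrics can be computed directly from a query log. The sketch below assumes each log record carries hypothetical `returned`, `relevant`, `permitted` and `sources` fields; the `relevant` set in particular implies a labeled evaluation set, which has to be built by hand. The records are made up for illustration.

```python
from __future__ import annotations


def retrieval_precision(returned, relevant):
    """Share of returned chunks that were actually relevant."""
    if not returned:
        return 0.0
    return sum(1 for d in returned if d in relevant) / len(returned)


def access_leaks(returned, permitted):
    """Chunks that were returned although the user was not permitted to see them."""
    return [d for d in returned if d not in permitted]


def citation_coverage(answers):
    """Share of answers that carry at least one source."""
    if not answers:
        return 0.0
    return sum(1 for a in answers if a["sources"]) / len(answers)


# Illustrative log records only.
log = [
    {"returned": ["a", "b"], "relevant": {"a"}, "permitted": {"a", "b"}, "sources": ["a"]},
    {"returned": ["c"], "relevant": {"c"}, "permitted": {"c"}, "sources": []},
]

precisions = [retrieval_precision(r["returned"], r["relevant"]) for r in log]
leaks = [d for r in log for d in access_leaks(r["returned"], r["permitted"])]

print(sum(precisions) / len(precisions))  # 0.75
print(len(leaks))                         # 0
print(citation_coverage(log))             # 0.5
```

A leak count above zero is the one number here that should page someone; the others are quality signals to track over time.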
This is what a Senior AI Infrastructure Implementer does: not "hook up a RAG", but ship a knowledge system that scales, cites and respects permissions. In enterprise, a leaked data point isn't a technical incident; it's a crisis.