A source link doesn't mean the answer is true
Why cited documents still produce wrong answers — and the runtime checks that stop teams from trusting a link they cannot defend.
Source links make AI answers feel trustworthy. They do not automatically make them trustworthy. A RAG system can cite a real document and still misrepresent what it says, cite a document it did not rely on, or attach a source after generating an answer from model memory.
That is why teams building document intelligence need to separate citation presence from citation faithfulness.
The first failure is fabricated citation IDs. The model invents a chunk reference that was never retrieved. This is easy to catch: force citations into a strict format and validate every cited ID against the retrieved evidence set before returning the answer.
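As a minimal sketch of that check, assume citations are emitted as `[chunk:<id>]` markers. The marker format, the function names, and the sample IDs are illustrative assumptions, not something a particular framework prescribes.

```python
import re

# Assumed citation format: [chunk:<id>]. Adjust the pattern to your own convention.
CITATION_PATTERN = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")


def extract_cited_ids(answer: str) -> list[str]:
    """Return cited chunk IDs in order of appearance."""
    return CITATION_PATTERN.findall(answer)


def find_fabricated_ids(answer: str, retrieved_ids: set[str]) -> list[str]:
    """Return cited IDs that are not in the retrieved evidence set."""
    return [cid for cid in extract_cited_ids(answer) if cid not in retrieved_ids]


# Hypothetical answer and evidence set, for illustration only.
answer = (
    "Refunds are allowed within 30 days [chunk:policy-12]. "
    "Fees are waived [chunk:policy-99]."
)
retrieved = {"policy-12", "policy-14"}
fabricated = find_fabricated_ids(answer, retrieved)
if fabricated:
    print("Reject or retry, fabricated IDs:", fabricated)
```

A non-empty result is enough to block the answer. No model call is needed for this check, which is why it is the cheapest of the three failures to catch.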
The second failure is weak attribution. The cited chunk exists but does not support the claim. This needs faithfulness evaluation: compare answer claims against the retrieved context, not just against whether a citation exists.
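One way to structure that comparison is to split the answer into claims, gather the text of the chunks each claim cites, and ask a support checker whether the evidence backs the claim. The sketch below assumes `[chunk:<id>]` citation markers and uses a crude token-overlap function as a stand-in checker. That stand-in is only a placeholder. A real deployment would more likely plug in an entailment model or an LLM judge through the same `supports` parameter.

```python
import re
from typing import Callable

# Assumed citation format: [chunk:<id>].
CITATION = re.compile(r"\[chunk:([A-Za-z0-9_-]+)\]")


def split_claims(answer: str) -> list[tuple[str, list[str]]]:
    """Split an answer into (claim_text, cited_ids) pairs, one per sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    claims = []
    for sentence in sentences:
        text = CITATION.sub("", sentence)
        text = re.sub(r"\s+([.!?,;])", r"\1", text).strip()  # tidy leftover spacing
        claims.append((text, CITATION.findall(sentence)))
    return claims


def token_overlap_support(claim: str, evidence: str, threshold: float = 0.6) -> bool:
    """Placeholder checker: share of claim tokens that also appear in the evidence."""
    claim_tokens = set(re.findall(r"[a-z0-9]+", claim.lower()))
    evidence_tokens = set(re.findall(r"[a-z0-9]+", evidence.lower()))
    if not claim_tokens:
        return True
    return len(claim_tokens & evidence_tokens) / len(claim_tokens) >= threshold


def unsupported_claims(
    answer: str,
    chunks: dict[str, str],
    supports: Callable[[str, str], bool] = token_overlap_support,
) -> list[str]:
    """Return claims that cite nothing or whose cited chunks do not support them."""
    failures = []
    for claim, cited in split_claims(answer):
        evidence = " ".join(chunks.get(cid, "") for cid in cited)
        if not cited or not supports(claim, evidence):
            failures.append(claim)
    return failures


# Hypothetical chunks and answer, for illustration only.
chunks = {
    "c1": "Customers may request a refund within 30 days of purchase.",
    "c2": "Shipping takes five business days.",
}
answer = (
    "Customers may request a refund within 30 days [chunk:c1]. "
    "Refunds are paid in cash within one hour [chunk:c2]."
)
print(unsupported_claims(answer, chunks))
```

Both citations in this example would pass an ID validity check, since `c1` and `c2` were retrieved. Only the claim-level comparison shows that the second sentence is not backed by the chunk it cites.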
The third failure is stale authority. The source once supported the claim but has been superseded by a newer policy, contract amendment, regulation, or procedure. This is why document versioning and index expiration matter.
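A rough sketch of how versioning and expiration could be enforced at retrieval time follows. It assumes each chunk carries a document ID, a version number, and an optional expiry date, and that a registry of the latest version per document is available. The `Chunk` fields and the `latest_versions` registry are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    version: int
    expires_on: Optional[date] = None  # None means no expiry has been set


def drop_stale(
    chunks: list[Chunk],
    latest_versions: dict[str, int],
    today: date,
) -> list[Chunk]:
    """Keep chunks from the latest known document version that have not expired."""
    fresh = []
    for chunk in chunks:
        # Documents missing from the registry are treated as current.
        latest = latest_versions.get(chunk.doc_id, chunk.version)
        superseded = chunk.version < latest
        expired = chunk.expires_on is not None and chunk.expires_on <= today
        if not superseded and not expired:
            fresh.append(chunk)
    return fresh


# Hypothetical retrieved set: an old policy version, its replacement,
# and a contract chunk whose expiry date has passed.
retrieved = [
    Chunk("a1", "policy", version=1),
    Chunk("a2", "policy", version=2),
    Chunk("b1", "contract", version=1, expires_on=date(2024, 1, 1)),
]
fresh = drop_stale(retrieved, {"policy": 2, "contract": 1}, today=date(2024, 6, 1))
print([c.chunk_id for c in fresh])
```

Filtering before generation means the model never sees superseded text, so it cannot cite it. The same metadata can also be checked after generation as an audit step.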
At runtime, constrain the model to cite only retrieved evidence IDs. Refuse or retry if the answer cites an ID outside the retrieved set. Require every material claim to map to one or more chunks. Add an explicit “not enough evidence” path for questions the retrieved context cannot answer. Treat over-citation as a defect, because citing every document for every sentence makes citations meaningless.
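These rules can be combined into a single gate that runs before an answer is returned. The sketch below assumes the generator already yields structured claims with their cited IDs. The `Claim` type, the `generate` callable, the retry count, and the per-claim citation cap are all illustrative assumptions rather than a fixed design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    ACCEPT = "accept"
    NOT_ENOUGH_EVIDENCE = "not_enough_evidence"


@dataclass
class Claim:
    text: str
    cited_ids: list[str]


def check_claims(
    claims: list[Claim],
    retrieved_ids: set[str],
    max_citations_per_claim: int = 3,  # assumed cap, tune per use case
) -> list[str]:
    """Return a list of problems; an empty list means the answer passes."""
    problems = []
    for claim in claims:
        unknown = [cid for cid in claim.cited_ids if cid not in retrieved_ids]
        if not claim.cited_ids:
            problems.append(f"uncited claim: {claim.text}")
        elif unknown:
            problems.append(f"IDs outside retrieved set {unknown}: {claim.text}")
        elif len(set(claim.cited_ids)) > max_citations_per_claim:
            problems.append(f"over-citation: {claim.text}")
    return problems


def answer_with_gate(
    generate: Callable[[], list[Claim]],
    retrieved_ids: set[str],
    max_attempts: int = 2,
) -> tuple[Verdict, list[Claim]]:
    """Retry generation until the checks pass, else take the refusal path."""
    if not retrieved_ids:
        return Verdict.NOT_ENOUGH_EVIDENCE, []
    for _ in range(max_attempts):
        claims = generate()
        if not check_claims(claims, retrieved_ids):
            return Verdict.ACCEPT, claims
    return Verdict.NOT_ENOUGH_EVIDENCE, []


# Hypothetical generator that always cites an ID that was never retrieved.
verdict, _ = answer_with_gate(
    lambda: [Claim("Fees are waived.", ["c9"])], retrieved_ids={"c1", "c2"}
)
print(verdict)
```

Returning an explicit verdict, rather than a possibly flawed answer, lets the calling application decide how to present a refusal to the user.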
Evaluate context precision, answer relevancy, citation validity, and faithfulness on production traces. If cost is a concern, start with offline batch scoring rather than scoring every request inline. The goal is not perfect academic scoring; it is a stable signal that tells you when retrieval, chunking, or model behavior has drifted.
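To illustrate what a stable drift signal might look like, the sketch below assumes per-trace scores for the four metrics have already been produced by whatever evaluator is in use. It then compares batch averages against a stored baseline. The metric keys, the baseline values, and the 0.05 tolerance are assumptions chosen for the example.

```python
from statistics import mean

METRICS = ("context_precision", "answer_relevancy", "citation_validity", "faithfulness")


def batch_means(trace_scores: list[dict[str, float]]) -> dict[str, float]:
    """Average each metric over a batch of scored traces."""
    return {m: mean(t[m] for t in trace_scores) for m in METRICS}


def drifted_metrics(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.05,  # assumed threshold, tune to your own noise level
) -> list[str]:
    """Return metrics whose batch mean fell more than `tolerance` below baseline."""
    return [m for m in METRICS if baseline[m] - current[m] > tolerance]


# Hypothetical baseline and a small batch of already-scored traces.
baseline = {m: 0.9 for m in METRICS}
batch = [
    {"context_precision": 0.9, "answer_relevancy": 0.9,
     "citation_validity": 1.0, "faithfulness": 0.7},
    {"context_precision": 0.9, "answer_relevancy": 0.9,
     "citation_validity": 1.0, "faithfulness": 0.8},
]
print(drifted_metrics(baseline, batch_means(batch)))
```

Tracking which metric moved helps narrow the cause. For example, a drop in context precision points toward retrieval or chunking, while a drop in faithfulness with steady context precision points toward model behavior.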
Useful sources: "Correctness is not Faithfulness in Retrieval Augmented Generation Attributions"; "Enterprise RAG with citation tracking and audit trails"; and "RAG evaluation with Ragas".