When large language models (LLMs) are used for intelligence purposes, how can their compliance with Intelligence Community Directive (ICD) 203's analytic tradecraft standards of objectivity, independence from political consideration, and traceability to underlying sources be verified? Can the trustworthiness and reliability of LLM-generated intelligence summaries be ensured?
An LLM using Retrieval-Augmented Generation (RAG) is exceptionally good at quickly producing overviews of lengthy documents or complex topics. Because of this capability, LLMs may become an essential tool for all-source analysts: the ability to rapidly summarize all IC knowledge on a specified topic is too powerful to ignore. Unfortunately, LLMs are prone to hallucinations, and it is difficult to understand how an LLM arrives at its results. Identical inputs can produce different outputs, and cited sources may be wholly fabricated by the model. A human analyst who behaved this way would, at best, not be trusted. Is there a way to use LLMs that complies with analytic standards?
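As a concrete illustration of the traceability concern, the sketch below shows one way a RAG pipeline can keep generated summaries tied to identifiable sources: retrieved passages are numbered and passed to the model with an instruction to cite only those passages, and the document identifiers are retained alongside the output for review. This is a minimal sketch under stated assumptions, not an implementation of any actual IC system; the corpus, the keyword-overlap retriever, and the generate_summary() placeholder are hypothetical stand-ins for a real document store, vector search, and LLM call.

```python
# Minimal sketch of a RAG pipeline that preserves traceability to underlying
# sources. The corpus, retriever, and generate_summary() stub are hypothetical
# stand-ins, not a real IC system or a specific LLM API.

from dataclasses import dataclass

@dataclass
class SourceDocument:
    doc_id: str   # stable identifier an analyst can trace back to
    title: str
    text: str

# Hypothetical in-memory corpus standing in for a document store.
CORPUS = [
    SourceDocument("RPT-001", "Example report A", "Text of report A ..."),
    SourceDocument("RPT-002", "Example report B", "Text of report B ..."),
]

def retrieve(query: str, corpus: list[SourceDocument], k: int = 2) -> list[SourceDocument]:
    """Toy keyword-overlap retriever; a real system would use vector search."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(doc.text.lower().split())), doc) for doc in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query: str, docs: list[SourceDocument]) -> str:
    """Number each retrieved passage so the model can only cite provided sources."""
    passages = "\n".join(
        f"[{i}] ({doc.doc_id}) {doc.title}: {doc.text}" for i, doc in enumerate(docs, 1)
    )
    return (
        "Summarize the passages below in response to the question. "
        "Support every claim with a bracketed citation such as [1]; "
        "do not introduce information that is not in the passages.\n\n"
        f"Passages:\n{passages}\n\nQuestion: {query}\n"
    )

def generate_summary(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., with temperature set to 0 to reduce
    run-to-run variation); returns a stub string so the sketch is runnable."""
    return f"[LLM output for prompt of {len(prompt)} characters]"

if __name__ == "__main__":
    question = "What does report A say?"
    docs = retrieve(question, CORPUS)
    summary = generate_summary(build_prompt(question, docs))
    # Retaining the doc_ids alongside the summary gives reviewers a concrete
    # trail from each generated claim back to the underlying source documents.
    print(summary)
    print("Sources used:", [d.doc_id for d in docs])
```

Keeping the retrieved document identifiers alongside the generated summary does not by itself prevent hallucination, but it gives a reviewer a checkable trail: every claim in the output can, in principle, be compared against the numbered passages that were actually supplied to the model.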