Disclosing and Citing Generative Artificial Intelligence in Air Force Writing: A Proposed Framework

Published Dec. 9, 2024
By Maj. Jeremy DeLaCerda
Wild Blue Yonder -- Maxwell AFB

Rapid and responsible integration of artificial intelligence (AI) into Air Force operations is a national security imperative. As the US’s first AI-related National Security Memorandum, published in October 2024, affirmed: “The United States must lead the world in the responsible application of AI to appropriate national security functions.”[1] While AI assimilation will be a long-term undertaking, military leaders should look for AI tools that are ready to employ now. Large language models (LLMs) that use generative AI, such as ChatGPT, are already in broad use and are ready for adoption. And insights from the fields of law and academia can shed light on how to incorporate them responsibly. An effective Air Force LLM framework will include disclosing generative AI use, certifying the accuracy of generated material, and citing each piece of generated information.[2]

Generative AI Background

Generative AI is marked by its ability to create (i.e., generate) new material.[3] These models are first trained on large amounts of data, such as text, images, or music, and they encode the relationships among the data through neural networks.[4] Using multiple network layers enables mapping various facets of these relationships in a process known as “deep learning.”[5] Generative AI models then use these relationships to predict what information is statistically likely to come next based on user prompts.[6] Through this process of prediction, LLMs can convert simple prompts into longer, more complex text that provides ideas and words users may not have had before.[7]

Accordingly, LLMs can provide powerful research and writing assistance and are gaining broad acceptance. A May 2024 Impact Research survey found that nearly half of teachers, K-12 students, and undergraduates used AI chatbots at least once a week, and it is easy to imagine these numbers increasing as AI is integrated into more areas of everyday life.[8] In June 2024, the Air Force introduced NIPRGPT, its own LLM, which “allows users to have human-like conversations to complete various tasks” and “can answer questions and assist with tasks such as correspondence, background papers and code, all within a secure computing environment.”[9] By September 2024, over 80,000 Air Force and Space Force personnel had used it, indicating a strong desire among Air Force members to harness the power of generative AI for research and writing.[10]

Reliability and Traceability Concerns of Generative AI

While LLMs can provide meaningful writing assistance, users should appreciate their limitations, including “hallucinations,” where models create text that appears factual but has no basis in reality.[11] Philosopher Shannon Vallor notes that this is not a sign that the LLMs are malfunctioning; rather, “these fabrications are exactly what ChatGPT is designed to do—produce outputs that are statistically plausible given the patterns of the input.”[12] The more data LLMs are trained with, the more human-like their words become, and the more difficult inaccurate outputs are to identify.[13] This creates a serious concern about the reliability of AI-generated text.
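To illustrate why such fabrications are a built-in feature of statistical text generation rather than a malfunction, consider the following deliberately simplified sketch. It is a hypothetical toy model, not how ChatGPT or NIPRGPT is actually built; real LLMs use deep neural networks trained on vastly more data. But it shows the same core mechanism: the generator learns which words tend to follow which and then samples a statistically plausible continuation, with nothing in the process checking the result against facts.

    import random
    from collections import defaultdict

    # Toy illustration: "train" a next-word predictor on a tiny corpus,
    # then generate text by sampling statistically likely continuations.
    # The corpus and behavior here are hypothetical examples only.
    corpus = (
        "the court cited the case . "
        "the court cited the statute . "
        "the judge cited the case . "
    )

    # Count which word follows which (a simple bigram table).
    follows = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current].append(nxt)

    def generate(start: str, length: int = 6) -> str:
        """Generate text by repeatedly picking a statistically likely next word."""
        output = [start]
        for _ in range(length):
            candidates = follows.get(output[-1])
            if not candidates:
                break
            output.append(random.choice(candidates))
        return " ".join(output)

    # Output is plausible given the training patterns, but never verified against facts.
    print(generate("the"))

In this toy example, the generator can produce “the judge cited the statute,” a sentence that never appears in its training text but is statistically plausible given its patterns. The same dynamic, at far greater scale and fluency, is what yields convincing but non-existent citations and facts in LLM output.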
Because factually accurate statements are central to the practice of law, the legal profession has been at the forefront of grappling with hallucinations.[14] In April 2024, the New York State Bar Association’s Task Force on Artificial Intelligence identified several instances throughout the country where attorneys filed documents citing non-existent cases that generative AI had created out of thin air.[15] A January 2024 Stanford study of hallucinations in legal writing found that four popular LLMs hallucinated over half the time when asked legal questions, and a subsequent June 2024 Stanford study investigated two legal-specific AI platforms designed to avoid hallucinations, which still hallucinated in 17% to 33% of cases.[16]

In most LLMs, unreliability is compounded by a lack of traceability: generated text is not accompanied by sources or an explanation of how the information was created.[17] Some of this “black box” problem is caused by AI companies’ lack of transparency about training data.[18] But the broader problem is that LLMs’ complex, multi-layered networks simply prevent tracing generated text back to its sources.[19] This opaqueness prevents the traceability of ideas that is vital to the integrity and trustworthiness of written products. In this regard, LLMs contrast starkly with how traditional sources are used in writing, where clear citations allow authors to support their claims transparently and readers to check their work easily.

Reliability and traceability are not only important aspects of authorship; they are also two of the five Department of Defense (DoD) AI ethical principles.[20] If left unchecked, generative AI use could undermine these principles and decrease the trustworthiness of Air Force writing.

Proposed Framework

The Air Force can empower responsible AI use in official writing with a framework requiring disclosing generative AI use, certifying the accuracy of generated information, and citing each individual piece of generated text. First, any document created with generative AI assistance should include a prominent disclosure stating that generative AI was used and certifying the accuracy of all generated information. For example:

Generative artificial intelligence was used in the creation of this document. Any text generated by artificial intelligence is accompanied by an explanatory citation. Unless otherwise noted, all text generated by artificial intelligence has been verified as accurate by a human.[21]

This disclosure requirement draws from lessons in the legal field, where some courts have begun requiring parties to make similar disclosures.
For instance, a federal judge in Texas requires “a certificate attesting either that no portion of any filing will be drafted by generative artificial intelligence…or that any language drafted by generative artificial intelligence will be checked for accuracy, using print reporters or traditional legal data bases, by a human being.”[22] And a federal judge in Pennsylvania requires attorneys to, “in a clear and plain factual statement, disclose that AI has been used in any way in the preparation of the filing, and CERTIFY, that each and every citation to the law or the record in the paper, has been verified as accurate.”[23]

AI disclosures are also gaining prominence in academic writing, with organizations such as the Committee on Publication Ethics and the World Association of Medical Editors recommending them.[24] In medical writing, a 2023 study in the Journal of Nursing Scholarship found that 37.6% of nursing studies journals required AI disclosures in manuscripts submitted for publication, and 14.5% of general and internal medicine journals included AI statements.[25] In military education, the Naval Postgraduate School (NPS) uses a three-part disclosure statement that “identifies the generative AI tool you used, explains why you used it, and describes how you used it to support writing and manuscript development.”[26]

The US Government has also signaled that disclosing generative AI use is prudent. The Office of Personnel Management advises federal employees to “Follow your agency’s policy on when and how to disclose uses of GenAI (for example, labels, watermarks, or disclaimers on AI-generated material)” and “Review all AI-generated materials, including sources cited, to check that they are valid, accurate, complete, and safe.”[27] By requiring a disclosure statement, the Air Force can ensure its personnel meet these prudent policy goals.

In the framework’s next step, the general disclosure is accompanied by specific citations each time information generated by AI is referenced in a document. Several writing manuals, including The Chicago Manual of Style and APA Style, contain citation formats for generative AI.[28] Because the Air Force already borrows heavily from The Chicago Manual of Style, it is reasonable for the Air Force to adopt its guidance for AI as well.[29] The manual recommends that the use of generative AI be explained either in the text or in a note, in sufficient detail for the reader to understand what tool was used and what information it generated.[30] This citation format will provide robust transparency and allow readers to scrutinize each piece of generated information.

AI is ubiquitous and appears in various forms, and this framework should not apply to every AI-enabled tool. For instance, tools like search engines and spelling and grammar checkers may use AI, but they have human judgment and decision-making built in. Therefore, they do not raise the “hallucination” and “black box” problems discussed above. The line for determining when disclosure is needed may not always be clear, but a workable policy line can be drawn and should be linked to these two concerns.
NPS policy provides a good starting point for line drawing, stating that disclosure is not required “for uses of generative AI to support (but not produce) one’s final product, much as one might use search engines, library databases, grammar checkers, online dictionaries and thesauruses, or task planners.”[31] In other words, if the tool is just a step that leads to a separate verifiable source or prompts an independent human decision by the writer, then citation is not needed.

While this framework will not, strictly speaking, eliminate the underlying problems built into LLMs, it will sufficiently mitigate their negative effects. It addresses the reliability concern by having the author verify the accuracy of the generated material, correct any hallucinatory data, and certify that all the information is accurate. Although the framework will not allow readers to see into AI’s “black box,” clearly identifying and verifying generated information introduces traceability back into the process in two ways: first, citations allow authors to explain the AI tool they used and how they used it; and second, citations can note additional external sources authors found that corroborate the information produced. This will allow readers to check authors’ work as they would with traditional citations.[32]

Implementation

The Air Force could implement this framework, or other general AI writing policy, through publications such as Air Force Manual 33-326, Preparing Official Communications, and Department of the Air Force Handbook 33-337, The Tongue and Quill. The policy should set baseline standards but allow room for tailoring to specific subsets of Air Force writers, like those in the legal and academic fields. For example, attorneys might have additional requirements that could change as the American Bar Association and state bars continue studying generative AI’s effects on the legal profession and recommending best practices. Supplemental legal-specific guidance could be incorporated into rules like the Air Force Rules of Professional Conduct and Uniform Rules of Practice Before Department of the Air Force Courts-Martial.[33] Similarly, Air University and other academic units or programs may want to impose more stringent guidance, including banning AI completely for specific courses or assignments, depending on specific learning objectives.[34] Disclosure and citation requirements could be detailed in the Air University Style and Author Guide, but because the guide is not directive in nature, the requirements would also need to be articulated in Air University or other applicable policy.[35] Commanders might also want to limit or prohibit use of generative AI due to mission requirements, operational security (OPSEC) concerns, or other reasons, and they should have the ability to do so.[36] While some communities might need to modify the general framework, it would set minimum requirements to ensure generative AI is implemented responsibly.[37]

If the Air Force does not implement these or similar guardrails, it is foreseeable that personnel will use generative AI without disclosure and that false, hallucinatory information will seep into Air Force documents. This could create negative operational effects and erode confidence in the reliability of Air Force documents within the military and among civilian leaders and the public.

Possible Alternatives

A brief look at possible alternatives confirms the necessity and workability of the proposed framework.
First, generative AI could be allowed in Air Force writing without disclosure. This option is unacceptable as it introduces the unnecessary risks of unreliability and non-traceability discussed earlier. This option would also lead to wasted time and effort since AI-generated text can be hard to detect.[38] The New York State Bar Association noted, “We cannot underestimate the additional cost in terms of court resources to research, verify and challenge incorrect AI-generated legal opinions and arguments.”[39] This is equally applicable in military writing, as a unit cannot function if every piece of writing must be scrupulously inspected by each reader for possible uses of generative AI. Thus, for reasons of accuracy, transparency, and economy, this option should be rejected.

Another alternative is requiring a disclosure on every Air Force document stating that generative AI was or was not used.[40] While this would provide the same protection as the proposed framework, it requires more than is necessary and would be out of step with general Air Force writing practices. For comparison, the Air Force does not allow plagiarism and requires citations when referencing another author’s work.[41] However, authors are not required to state on every document that it contains no plagiarism. Integrity is expected of all authors, and requiring certification that a document contains no plagiarism is unwarranted. Likewise, requiring a negative certification that generative AI was not used is unnecessary, and LLMs’ challenges can be met by disclosing and citing only in documents where generative AI was used.[42]

Conclusion

A framework requiring disclosure and citation of generative AI use and certification of accuracy can help ensure the Air Force integrates LLMs transparently and responsibly. It also advances US Government AI policy and DoD AI ethical principles. At this consequential time, when the DoD and Air Force are energetically embracing responsible AI use, leaders should look for AI tools that are ready to employ, and these early applications will be key to creating a culture of prudent AI use within the military. The proposed framework provides a step in that direction.

Major Jeremy DeLaCerda received his Juris Doctor from the University of Illinois College of Law and is an attorney in the Air Force Judge Advocate General’s Corps. The views expressed are those of the author and do not necessarily reflect the official policy or position of the Department of the Air Force, the Department of Defense, or the U.S. government.

[1] “Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence,” The White House, October 24, 2024. See also “FACT SHEET: Biden-Harris Administration Outlines Coordinated Approach to Harness Power of AI for U.S. National Security,” The White House, October 24, 2024.

[2] This framework only addresses disclosing generative AI use; it does not discuss other important policy issues such as what generative AI models should be allowed for official duties or what information can be entered into them.

[3] For general explanations of generative AI, see Bernard Marr, “The Difference Between Generative AI and Traditional AI: An Easy Explanation for Anyone,” Forbes, updated August 23, 2023; Adam Zewe, “Explained: Generative AI,” MIT News, November 9, 2023.

[4] Henry A. Kissinger, Eric Schmidt, and Daniel Huttenlocher, The Age of AI and Our Human Future (Back Bay Books, 2023), 64; Ray Kurzweil, The Singularity is Nearer: When We Merge with AI (Viking, 2024), 43-44.

[5] Kissinger, Schmidt, and Huttenlocher, Age of AI, 63-64; Kurzweil, Singularity, 40-44.

[6] Kissinger, Schmidt, and Huttenlocher, Age of AI, 63, 72; Kurzweil, Singularity, 46-47.

[7] Kissinger, Schmidt, and Huttenlocher, Age of AI, 11; Kurzweil, Singularity, 46-49, 64.

[8] Brian Stryker and Oren Savir to Interested Parties, memorandum, subject: Nationwide Poll Findings, June 3, 2024. See also “The Value of AI in Today’s Classrooms,” Walton Family Foundation, June 11, 2024.

[9] “Department of the Air Force launches NIPRGPT,” Air Force, June 10, 2024.

[10] Courtney Albon, “Air Force’s ChatGPT-Like AI Pilot Draws 80K Users in Initial Months,” Defense News, September 16, 2024.

[11] Kissinger, Schmidt, and Huttenlocher, Age of AI, 72-73; Kurzweil, Singularity, 65.

[12] Shannon Vallor, The AI Mirror: How to Reclaim Our Humanity in an Age of Machine Thinking (Oxford University Press, 2024), 25.

[13] Kurzweil, Singularity, 46-49.

[14] The obligations of accuracy and truthfulness are built into lawyers’ professional duties. For example, the American Bar Association’s Model Rules of Professional Conduct forbid lawyers from “mak[ing] a false statement of material fact or law to a third person” (Rule 4.1) and require truthfulness in statements to courts (Rule 3.3). “Model Rules of Professional Conduct,” American Bar Association, 2024.

[15] New York State Bar Association, Report and Recommendations of the New York State Bar Association Task Force on Artificial Intelligence (April 2024), 38, 50-51.

[16] Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E. Ho, “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models,” Journal of Legal Analysis (forthcoming), 6; Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho, “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools” (preprint, under review), 1. For summaries of these studies, see Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E. Ho, “Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive,” Stanford University Human-Centered Artificial Intelligence, January 11, 2024; Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho, “AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries,” Stanford University Human-Centered Artificial Intelligence, May 23, 2024.

[17] Kurzweil, Singularity, 18.

[18] Jack Hardinges, Elena Simperl, and Nigel Shadbolt, “We Must Fix the Lack of Transparency Around the Data Used to Train Foundation Models,” Harvard Data Science Review, May 31, 2024; Kurzweil, Singularity, 18.

[19] Will Knight, “The Dark Secret at the Heart of AI,” MIT Technology Review, April 11, 2017.

[20] Deputy Secretary of Defense to Senior Pentagon Leadership, memorandum, subject: Implementing Responsible Artificial Intelligence in the Department of Defense, May 26, 2021.

[21] The “unless otherwise noted” language recognizes that authors may want to intentionally include incorrect AI-generated information for the purpose of exposing its inaccuracy and ensures that if any such intentionally incorrect information is included, it will be explicitly noted in the body of the document. Requiring certification “by a human” rather than “by the author” recognizes standard military practice, where documents typically are researched and drafted by an action officer other than the person who signs them. For example, see Headquarters Operating Instruction 33-3, Correspondence Preparation, Control, and Tracking, August 26, 2022, paragraph 2.8. The proposed language gives flexibility for the signer, action officer, or other staff member to verify the information.

[22] New York State Bar Association, Report and Recommendations, 51.

[23] Ibid., 51-52.

[24] “Authorship and AI Tools,” COPE, February 13, 2023; “Chatbots, Generative AI, and Scholarly Manuscripts,” WAME, revised May 31, 2023.

[25] Arthur Tang, Kin-Kit Li, Kin On Kwok, Liujiao Cao, Stanley Luong, and Wilson Tam, “The Importance of Transparency: Declaring the Use of Generative Artificial Intelligence (AI) in Academic Writing,” Journal of Nursing Scholarship (2024): 314.

[26] “NPS Guidance on Disclosing Generative AI Use in Academic Work,” Naval Postgraduate School, June 18, 2024. The NPS also recommends that authors “explain how you have permission to use the AI tools you used and consider risk and risk mitigation of your use.” Ibid.

[27] “Responsible Use of Generative Artificial Intelligence for the Federal Workforce,” U.S. Office of Personnel Management, accessed November 10, 2024.

[28] “Citation, Documentation of Sources,” The Chicago Manual of Style Online, accessed November 10, 2024; Timothy McAdoo, “How to Cite ChatGPT,” APA Style, updated February 23, 2024.

[29] Department of the Air Force Handbook 33-337, The Tongue and Quill, November 19, 2015, 287.

[30] “Citation, Documentation of Sources.”

[31] Provost and Chief Academic Officer to Naval Postgraduate School Faculty, Students, and Staff, memorandum, subject: NPS Interim Guiding Principles for Use of Generative Artificial Intelligence (AI) Tools, March 15, 2023, 1.

[32] If authors are unable to locate relevant external sources, then they cannot verify the accuracy of the generated information and should therefore not include the generated material.

[33] Air Force Instruction 51-110, Professional Responsibility Program, December 11, 2018, Attachment 2; Department of the Air Force Instruction 51-201, Administration of Military Justice, January 24, 2024, paragraph 19.1.

[34] For a discussion of integrating generative AI into professional military education, see Patrick Kelly and Hannah Smith, “How to Think About Integrating Generative AI in Professional Military Education,” Military Review, May 2024.

[35] Air University Style and Author Guide (Air University Press, April 2015), ii.

[36] For example, NPS guidance states, “For security reasons, students, faculty, and staff shall not input Controlled Unclassified Information (CUI), personally identifiable information (PII), classified information, or any otherwise restricted information into generative AI tools.” Provost, memorandum, subject: NPS Interim Guiding Principles, 2.

[37] The framework could potentially include the ability for commanders and directors to waive disclosure and citation requirements. However, since this waiver would introduce significant risks to the trustworthiness of written Air Force products, any waiver authority should be held at an appropriately high level of leadership.
[38] For a study “show[ing] that humans cannot distinguish between tweets generated by GPT-3 and written by real Twitter users,” see Giovanni Spitale, Nikola Biller-Andorno, and Federico Germani, “AI Model GPT-3 (Dis)informs Us Better Than Humans,” Science Advances 9, no. 26 (2023).

[39] New York State Bar Association, Report and Recommendations, 41.

[40] For an example of what a negative disclaimer might look like, see Thomas S. Hong, “Generative Pre-Trained Transformers and the Department of Defense’s Own Generative Artificial Intelligence Large Language Model,” The Army Lawyer, no. 1 (2024).

[41] Tongue and Quill, 39.

[42] The need for generative AI disclosures on every document could change depending on the development of broader societal norms. If a general culture develops of using generative AI without attribution, then a disclosure statement on all documents would become more appropriate. But as noted, a practice of disclosing and citing generative AI appears to be developing in broader society, which decreases the need for mandatory negative disclosures.