As large language models (LLMs) evolve into strategic instruments, this Walker Paper proposes a pragmatic framework for evaluating when and how they can be trusted in military decision making. Adapting human trust models to the algorithmic domain, the paper advances a “Trust Triad”—Character, Competence, and Control—and surveys emerging benchmarks covering ethics, fairness, safety, truthfulness, robustness, and privacy to compare current models for military decision support. It finds that no model is perfect but that some are more “mission fit” than others, especially when assessed with weighted metrics emphasizing factual reliability, robustness under pressure, and ethical alignment. The study also identifies gaps in how transparency and accountability are evaluated and recommends developing standardized measures such as a Transparency Evaluation Score and an Attribution Traceability Score. The bottom line: LLMs should augment—not replace—human judgment, and trust must be earned through measurable performance.
Author(s) • Lt Col Michael S. Perry, USAF
Year • 2026
Pages • 52
ISSN • 1555-7871
AU Press Code • WP-21