Manik V. Suri*

Expressing certainty matters

In the post-9/11 era, growing use of intelligence as evidence and mounting pressure for “actionable intelligence” have increased the demand for precise expression of certainty in the analyses that inform national security decisions. A series of controversial actions taken by the Bush Administration during the Iraq War buildup, for instance, relied heavily on widely cited assessments in the 2002 Iraq NIE that later proved inaccurate. Subsequent political backlash underscored the importance of carefully stating the degree of certainty in the underlying intelligence as well as analysis based on it. Over the past decade, this realization has contributed to two trends in national security policymaking: (1) the increasing adoption of numbers and probabilities, and (2) the rising use of estimative language, estimates of likelihood, and confidence levels. While both developments aim to improve decision-making by enhancing the expression of analytic certainty – and should be praised for doing so – each suffers from serious shortcomings that still need to be addressed.

Quantification remains ad hoc

First, consider the rising use of numbers and probabilities, a trend not limited to intelligence. In recent years, American officials have employed quantitative metrics to design, implement, and evaluate controversial programs including prolonged detention and targeted killing. Indeed, policymakers often justify such measures with utilitarian arguments that rely on quantitative reasoning. Former Bush Administration official Philip Zelikow, for example, has argued that enhanced interrogation demands “objective appraisal” based on data-driven analysis of the “scientific knowledge on ways to elicit information from captives,” so that its costs and benefits are “better and more professionally understood.”

Notwithstanding compelling moral objections, the invocation of quantifiable “data” to defend such policies suffers from a more prosaic shortfall: numeric and probabilistic analyses have not been systematized in national security decision-making, and their use remains largely ad hoc. As commentators note, long-standing resistance to “quantitative rigor” within institutions like the CIA and FBI is driven by a combination of “custom, culture and practice.” Some detractors contend that quantification will create a false sense of precision; others fear it will subvert human judgment. These claims have some merit. Widespread use of numeric and probabilistic reasoning is no silver bullet – as the financial industry’s recent travails make clear. But quantitative analysis can nonetheless reduce uncertainty by encouraging more precise and explicit measurement of risk.

Efforts to promote quantification in national security policymaking are likely to get a needed boost from new technologies that facilitate the expression of certainty. Consider, for example, IBM’s latest supercomputer, “Watson,” which analysts predict could eventually have national security applications. Watson’s analytical process is inherently quantitative: every output it generates contains a probabilistic estimate of its decision confidence. As policymakers rely increasingly on computational tools like Watson – and others including crime-mapping software and biometric scanners – numbers and probabilities should become more widespread in national security decision-making. Such devices cannot resolve value-laden questions about competing risks that require human judgment to determine what standards of proof and evidence are appropriate in a given context. Yet by promoting more systematic use of numbers and probabilities, emerging technologies will help ensure that public policy debate over these “risk-calculus” questions rests upon accurate and accountable analysis.

Estimative language is imprecise

A second trend to improve the expression of analytic certainty has been the increasingly robust use of estimative language to state likelihood and confidence levels. Here, too, existing efforts need improvement. Consider a widely publicized example: the 2007 National Intelligence Estimate (NIE) on Iran’s Nuclear Intentions and Capabilities. In updating its May 2005 assessment of this hot-button issue, the U.S. intelligence community was keen to avoid the pitfalls of its widely criticized 2002 Iraq NIE and acutely aware that this document would receive intense public scrutiny. Yet the 2007 Iran NIE’s cautious attempt to use estimative language in expressing the likelihood of developments or events remains flawed in several respects.

The NIE unhelpfully explains that “terms such as probably, likely, very likely, or almost certainly indicate a greater than even chance,” while “the terms unlikely and remote indicate a less than even chance that an event will occur.” Aggregating these terms obscures the distinctions between them, collapsing a highly gradated scale into a few buckets. The text accompanying the NIE’s intentionally imprecise “estimative language” chart acknowledges that it merely provides “a rough idea of the relationship of some of these terms to each other.” However, this candid admission is of little help to decision-makers and ought to have been replaced with specific numeric ranges corresponding to each term.

Furthermore, the NIE employs phrases with accepted legal meanings, such as “probable,” “sufficient,” and “we judge,” but fails to define them precisely or provide a numeric benchmark to which they can be correlated. Consequently, legal policymakers who read these terms with prior background assumptions in mind are likely to ascribe to them different meanings than non-legal readers might, thereby obfuscating analysis. Finally, the NIE’s use of qualitative terms to express two dimensions of certainty (estimate of likelihood and confidence level) at times risks incoherence. The first sentence of Section B states, for instance: “We continue to assess with low confidence that Iran probably has imported at least some weapons-usable fissile material….” These countervailing qualifications offer the illusion of precision while providing no meaningful conclusion upon which to act.

Adopting an integrated legal-scientific scale of analytic confidence

To address these shortcomings, national security policymakers should consider adopting a new scale of analytic confidence that integrates legal and scientific standards of proof. One compelling model advanced by Georgetown scholar Charles Weiss consists of “an 11-point scale of scientific certainty ranging from ‘beyond all doubt’ (scale value of 10) at one extreme, to ‘impossible’ (scale value of 0) on the other,” with nine intervening scale values based on legal standards of proof applied in different situations under U.S. law. Crucially, this integrated scale relates corresponding standards of proof understood by three key constituencies: scientific experts, lawyers, and laypersons. Imported into the national security context, it would provide a lingua franca for analysts, policymakers, and the broader public to engage in more accurate (and constructive) discourse. This common estimative framework is especially needed today, given the increasingly inter-disciplinary approach toward threat prevention – with TSA “Behavior Detection Officers” and LAPD “predictive policing” cops combining insights from anthropology, psychology, and biology to achieve their objectives. A standard lexicon for estimation and standards of proof is essential for sharing and interpreting analysis across such initiatives.

This type of integrated scale does have drawbacks. As with any scale, it retains a fundamental element of subjectivity because it depends upon an individual to “characterize his or her judgments regarding uncertainty in a particular situation.” And even “standardized” estimative terms remain susceptible to systematic bias. For example, policymakers could consistently over-weight numbers in both directions: an analytic expression of “8” might be treated as “completely certain,” while a “2” might be viewed as “impossible.” This systematic bias could be addressed by developing general interpretive guidelines such as, “a confidence level of 8 or higher can be relied upon, 4-7 requires additional data, and 1-3 cannot be relied on.”
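To make the illustrative guideline concrete, it can be written as a simple lookup. The following is only a sketch: the function name `interpret_confidence` is hypothetical, and because the quoted guideline is silent on a scale value of 0 (“impossible”), grouping 0 with the “cannot be relied on” band is an assumption made here, not part of the guideline itself.

```python
def interpret_confidence(level: int) -> str:
    """Map a 0-10 confidence level to the illustrative guideline's advice.

    Assumption: level 0 is grouped with 1-3, since the quoted
    guideline does not address it explicitly.
    """
    if not 0 <= level <= 10:
        raise ValueError("confidence level must be between 0 and 10")
    if level >= 8:
        return "can be relied upon"
    if level >= 4:
        return "requires additional data"
    return "cannot be relied on"


if __name__ == "__main__":
    # Print the advice attached to a few sample levels.
    for sample in (9, 5, 2):
        print(sample, "->", interpret_confidence(sample))
```

Any real guideline would need to set its cutoffs deliberately, since the choice of thresholds is itself one of the value-laden judgments discussed above.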

Of course, analysts may be prone (or pressured) to fudge their confidence levels to meet arbitrary cutoffs that support a desired outcome – but the use of numbers will actually promote accountability. By providing an explicit metric to track intelligence estimates against actual outcomes, quantification will allow managers to gauge analysts’ accuracy (say, by measuring the average difference between their stated confidence levels and actual outcomes over time). Supported by careful implementation, an integrated numeric scale of analytic confidence would thus mark a significant improvement on existing imprecise estimative language and enhance the quality of national security decision-making.
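The accuracy measure mentioned in the parenthetical above can be sketched in a few lines. This is one possible reading, not an established method from the essay: it assumes that a confidence level on the 0-10 scale can be read as a probability by dividing by 10, and that each outcome is recorded simply as whether the predicted event occurred. The function name `mean_confidence_gap` is hypothetical.

```python
from typing import Sequence


def mean_confidence_gap(levels: Sequence[int], outcomes: Sequence[bool]) -> float:
    """Average absolute gap between stated confidence and actual outcomes.

    Assumption: a 0-10 confidence level is treated as a probability
    (level / 10), and an outcome counts as 1.0 if the event occurred,
    0.0 otherwise. Lower results mean the analyst was closer to reality.
    """
    if len(levels) != len(outcomes):
        raise ValueError("levels and outcomes must have the same length")
    if not levels:
        raise ValueError("at least one estimate is required")

    total = 0.0
    for level, occurred in zip(levels, outcomes):
        if not 0 <= level <= 10:
            raise ValueError("confidence level must be between 0 and 10")
        stated = level / 10
        actual = 1.0 if occurred else 0.0
        total += abs(stated - actual)
    return total / len(levels)


if __name__ == "__main__":
    # Hypothetical track record: three estimates and whether each event occurred.
    print(mean_confidence_gap([8, 2, 6], [True, False, False]))
```

A manager comparing analysts would want many estimates per analyst before drawing conclusions, since a handful of outcomes says little about the quality of the underlying judgments.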

“Non-attributable” threats: a growing need for intelligence as evidence

Amidst ongoing efforts to improve how certainty is expressed, the pressure to do so is rising. The United States today faces a new class of “non-attributable” threats that do not fit neatly within the conventional international law framework, which requires that “an armed attack must be attributable to the State against which self-defence is exercised.” Threats of this kind include terrorism by non-state actors and various forms of “nonexplosive warfare,” such as biochemical and cyber attacks. Their non-attributable nature raises difficult questions of international law. Moreover, it requires policymakers to rely increasingly on intelligence as evidence, in order to develop deterrents and prepare countermeasures against uncertain sources. This will heighten the demand for precise and accountable intelligence, underscoring the need to systematize quantification and standardize estimative language.

Take cyber security, for example. Policymakers are aware that the threat posed by cyber-attacks is rising in frequency and severity; some even call it the “new vulnerability.” Yet such attacks are not easily attributable (witness ongoing efforts to pinpoint the source of the Stuxnet virus that infected Iran’s industrial control systems). Furthermore, as the Google-China controversy revealed, cyber-warfare raises a serious possibility of “false flag” attacks that could unintentionally escalate conflict.

Policymakers may need to adjust their risk-calculus to respond to the unique dangers posed by such non-attributable threats. Doing so will involve making value-based judgments that warrant extensive public debate concerning, amongst other things, our nation’s tolerance for false positives and false negatives over time. Ultimately, an effective and legitimate response must be grounded in accurate intelligence about the risks we face – which demands that we redouble our efforts to improve the expression of analytic certainty underpinning national security decision-making.



*A.B. Government, Harvard College, 2005; M.Phil. International Relations, Cambridge University, 2006; J.D. Candidate, Harvard Law School, 2013 (expected).
