Turning Expert Assessments into High Quality Empirical Data

Published May 11, 2020
By Dr. Donald Gooch & Dr. Adam Lowther

Wild Blue Yonder / Maxwell AFB, AL --

Background and Challenge

Major Commands (MAJCOM) across the Army, Navy, Air Force, and Marine Corps annually examine the future risk to their operational portfolio as part of their planning, programming, and budgeting system. This is done to inform the MAJCOM commander’s strategic planning/decision-making and internal assessments of the risk to major programs. The process often relies heavily on the inputs of a small number of subject-matter experts (SME), who provide inputs to planners during what may take the form of an annual “planning conference,” or something similar. A small number of SMEs come together to both individually and collectively offer expert opinion on future risk to the command’s programs. Planners then develop an assessment of risk for each program. They often rely on the Chairman of the Joint Chiefs of Staff Manual 3105.01: Joint Risk Analysis, which lays out the Joint Staff’s approach to joint risk analysis and is widely used across the Department of Defense.^¹

Based on assessments approved by the commander, priorities and risk assessments are then provided to the service “headquarters’ staff,” where all the various commands must compete for limited resources. Through this planning process, the services develop their long-range budget objectives, comparing the requirements’ submissions of each MAJCOM, evaluating the strengths and weaknesses of the evidence provided. Absent compelling data supporting a MAJCOM’s assessment of risk, the command’s programs could be ranked lower on the service’s priority list, diminishing the resources allocated to the command’s mission set.

One challenge often faced by a command relying heavily on CJCSM 3105.01 and SME inputs is that the risk-evaluation methodology lacks empirical rigor. This then fails to provide MAJCOM commanders with sufficient confidence in the risk assessment to make high-value and high-risk assessments of future requirements. The reliance on a small number of SMEs for high-stakes risk assessments is due in large part to the limited availability of expertise, which then generates limited data—creating a replicability problem.^²

Two factors explain this challenge. First, the future-oriented nature of military risk evaluation makes it difficult to validate analysis through replication. This is a common challenge with future-oriented assessments of the strategic environment that rely on “fuzzy” data, where past performance is a poor predictor of future performance or requirements. Second, with a small number of cases to evaluate and a small number of SMEs from whom to elicit expert opinion, developing quantitative data is challenging—regardless of the quality of data provided. This challenge is in many ways similar to the fundamental challenge of effectively measuring deterrence, for example, where deterrence succeeds when an adversary takes no action—requiring the researcher to understand why an action did not occur.

Given the challenges and limitations of the current risk-assessment process, there is a more effective, rigorous, useful alternative. This article offers organizations that rely on small numbers of SMEs a set of tools and methodologies that can increase the rigor of assessments and provide more reliable, empirical data. It also seeks to provide those at the service level responsible for programmatic decisions a wider array of measurable options. In so doing, service leaders can make better decisions about risk and priorities.

Understanding Risk

Risk is defined as “a measure of the probability and consequence of uncertain future events.”^³ Economist Charles Yoe further explains, “It is the chance of an undesirable outcome.”^⁴ Risk is often described in an equation as: Risk = Probability x Consequence. The International Organization for Standardization (ISO) describes risk as the effect of uncertainty on an organization’s objectives. Each of these descriptions is useful in understanding the objectives of a command’s effort to understand risk to their operations and programs.

If risk analysis is best understood as a process for making decisions under a condition of uncertainty, then it is possible to divide this process into three components: risk assessment, risk communication, and risk management. We focus here on improving risk-assessment efforts, and it is important to briefly offer a description of this process. The process begins by asking the fundamental question, is the problem adequately defined? This is often an underappreciated step, as analysts often seek to move quickly to the solution stage. Albert Einstein famously said, “If I had an hour to solve a problem, I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” Einstein’s larger point was a poignant one. The better the problem is understood, the more straightforward the solution often becomes.

Risk Assessment

Risk assessment is a four-step process. Step one begins with an environmental scan in which both hazards (risks) and opportunities are identified. This initial step is critical because it serves as the basis for the three steps that follow. Thus, allocating sufficient time to accurately determine potential risks and opportunities is key. Step two requires the investigator to perform a consequence assessment in which who or what may be harmed is determined. The form of that harm, as well as the severity of potential harm, must also be assessed. Step three takes the form of a likelihood assessment. Here, the investigator assesses the likelihood of adverse or beneficial consequences and characterizes likelihoods and their uncertainties. Step four requires the investigator to characterize risk by estimating the probability of occurrence, severity of adverse consequences, and the magnitude of gains or losses.^⁵

This essential process can take a variety of forms and use varying qualitative or quantitative methods, all of which depend on the data available to the investigator. The essence of the risk-assessment process requires a systematic and thorough evaluation of possible risks and opportunities with a detailed process for characterizing and evaluating the probability and effect of risk.

Risk Communication

Risk communication is an essential step in the process because it is here that a risk evaluator assists those responsible for risk mitigation in understanding the risks to personnel and resources. Effective risk communication allows for the development of a risk-mitigation consensus. As Charles Yoe notes of risk communication:

It is an interactive process for exchanging information and opinion.
It occurs throughout the risk-analysis process.
It discusses risk, risk-related factors, and perceptions of risk.
It is related to risk assessors, risk managers, and those affected.
It includes an explanation of risks, a description of the risk assessment, and the basis for any risk-management decision.^⁶

By effectively communicating risk prior to the realization of risk (something bad happening), risk mitigation and management can play a more active central role in reducing risk and preparing those at risk for a negative outcome. This process is central to enabling stakeholders to accurately weigh the costs and benefits of a potential action.

Risk Management

Although it is impossible to completely eliminate risk, the process of managing risk can play an influential role in optimizing desired outcomes. There are five basic parts to any risk-management activity: 1) identifying problems and opportunities; 2) estimating risk; 3) evaluating risk; 4) controlling risk; and 5) monitoring risk. While this process is certainly easier said than done, it is the basis of any effort to manage risk. According to the ISO, which publishes the ISO 31000: Risk Management Guidelines (2018), risk management is decision making that evolves as uncertainty is reduced.^⁷

As the previous paragraph suggests, the risk-management process takes place in five stages. The process is designed to ensure a rigorous and systemic look that provides the most accurate evaluation of risk possible and the best approach to risk mitigation and monitoring. It is a dynamic and ongoing process that constantly updates existing information.

Stage one is focused on problem identification. Here, the problem must be recognized, defined, and accepted. Stage two, risk estimation, requires the establishment of a risk-analysis process and the design of individual risk-management activities. Stage three, risk evaluation, requires the establishment of agreed upon risk levels and the level of acceptable risk. Stage four is focused on controlling risk. This effort requires the formulation, evaluation, comparison, and selection of risk-management options. During this stage, identifying decision outcomes and implementing a risk-control strategy is key. Stage five is focused on monitoring risk and adjusting risk-management options as required by changing circumstances.^⁸

Although brief, this description of the risk-management process illustrates the need for a clearly defined and replicable process for understanding and mitigating risk. The choice of the most appropriate analytical tools for use in the risk-analysis process, our focus, is largely determined by the nature of the problem and the data available. After evaluating the types of problems commands often face, we seek to go beyond the subjective approach outlined in CJCSM 3105.01, which relies on a red, yellow, orange, and green ranking system, and offer the following approach as a means of improving upon the current risk-assessment process.

Research Approach

The nature of many missions pursued by the services and their subordinate commands can make the collection of relevant quantitative data challenging because of the low number of verifiable cases from which to draw data. With service MAJCOMs also engaged in long-range planning that requires an understanding of both shorter-term strategic risk (next five years) and much longer-term strategic trends, there is an inherent duality in the risk-assessment process—long- vs. short-term risk. This is undoubtedly a challenging task, as these two planning efforts can often become blended and overlap.

A lack of data drives those responsible for risk assessment to rely on expert opinion to derive the command’s risk assessment. This too is problematic because often the number of SMEs available to use in assessment is too small to derive meaningful quantitative data. With this in mind, we paid particular attention to methodological approaches used in military and intelligence studies.^⁹ There is, however, a gap in that literature as it relates to methodologies for small-n risk assessment. We also looked to Grounded Theory as an approach that might offer useful insights.^¹⁰ Again, this approach was a suboptimal solution for our research problem.

Reliance on instruments such as a SME survey and subsequent focus groups, while providing rich qualitative insights, lacks the rigor and generalizability of systematic study. We looked to the literature in these areas to develop our methodological tools—provided below.^¹¹ We seek to offer a significant increase in analytical rigor by focusing on the development of a mixed methods approach that will increase the validity and reliability of risk assessment. This should increase the defensibility of programmatic decision making for any organization that struggles to make risk assessments with robust quantitative data.

Recommendations

Recommendations are divided into eight areas and offered in the order in which we suggest each methodology be incorporated into a larger risk assessment. The objective in selecting these methods is to provide a path for creating empirically valid quantitative data derived from qualitative measures. In combination, these methods provide cross validation and a more effective way to prioritize risk. The methods proposed also increase inputs from a larger number of SMEs, which also increases validity and reliability.

Sample Size

While obtaining additional respondents, particularly the critical mass necessary for statistical analysis, is often difficult, and in some instances perhaps prohibitive, striving for a sufficient sample size should be a high priority. A small subset of SMEs does not yield consistent or reliable data, precisely because so few experts participate in the risk assessment. If the expert opinion-solicitation method used by commands for future planning and described in CJCSM 3105.01 surveys the entire population of relevant experts within a subject-matter area, then the method has complete coverage, no additional respondents are obtainable, and the assessments obtained from that population constitute parameters, not statistics. However, as is often the case, the entire relevant population is not surveyed. When planners are developing risk assessments for a command and following the process described in CJCSM 3105.01, it is common for the SME-elicitation process to include fewer than five participants for each of the given areas of interest; often only 2–3 experts participate. This is simply too few. While the absolute minimum for using means and means-based statistical procedures is seven respondents, the recommended minimum is 15 respondents per area.

Admittedly, finding 15 SMEs with complete knowledge of a given area is a difficult task, but finding experts with partial expertise is sufficient, because the methods provided below allow for weighting the certainty of responses. This allows a survey respondent with great knowledge in one area and less in another to weight their own confidence in their answers and make planners conducting the analysis aware of a participant’s strengths and weaknesses.

It is worth noting that the single largest problem our own experience has revealed is that all too often the number of SMEs participating in planning conferences and determining the level and type of risk a command’s programs will face is too small (2–3 per area) to derive meaningful data. Thus, we recommend creating an electronic survey that is distributed, via e-mail, to all identified SMEs. This method improves on frequently used processes in two ways: (1) it has the potential of increasing the sample size within each subject-matter area, potentially such that the necessary critical mass is achieved for statistical analysis; and (2) by having SMEs make their assessments individually, independent assessments of risk on the given systems within the subject-matter area will be obtained. Focus groups and SME interviews could provide rich qualitative insights that buttress the broader empirical analysis obtained from the larger sample.

Common Framework

Given the practical limitations many commands face (heavy workloads and manpower shortages), planning shops are frequently undermanned. This often leads planners to forgo the collection of relevant data that provides insight into the actual expertise and potential biases of those experts participating in the elicitation process. Thus, there is no common framework that is established for experts who participate in the risk-assessment process. To correct the deficiency, we suggest the following.

First, collect relevant biographical data on SMEs participating in the study. Relevant data on the background and experience of SMEs provides the risk-assessment team an opportunity to determine if there are participant characteristics that might explain survey results or possible outliers. Such data is also useful in seeking to understand trends over time, as marked changes in the background and experience of participants may assist in understanding changes in expected survey results. This data could be used to weight the risk assessments (for example, by experience) or to contextualize differences between group assessments. The weight could be derived from either the background information on experience (and then transformed into a 0–1 weight) or could be a function of the standard deviation of the assessments on a given system within the area.

Second, SME participation should be framed by a common set of definitions for key concepts, questions, and variables. Absent such a common understanding, individual experts will apply their own knowledge and experience—relying on their own understanding. With experience differing across career fields and ranks, it is understandable that even within a service assuming a common understanding is not possible. If, as we suggest later, the number of expert participants is expanded beyond a small group with very similar backgrounds, establishing a common understanding becomes even more important.^¹² One simple example is illustrative of the significant, yet subtle, misunderstandings that can occur when participants have a different understanding of a concept: the acronym “DCA.” For Airmen in the tactical fighter world, this acronym stands for defensive counterair. However, Airmen in the nuclear world understand this acronym to mean dual-capable aircraft. These are two very different conceptions of the same acronym.

Third, it is important to establish a common intelligence picture for all participants in a planning conference or risk-assessment process. This is necessary because it serves to eliminate misinformation, false perceptions, and other challenges that can lead to spurious insights. In other words, a common intelligence picture frames the problem and increases the validity and reliability of the risk assessment by reducing the probability that an expert will provide ill-informed input—minimizing the prospects for outliers that skew results. Providing all assessment participants with the same set of intelligence briefings to establish a baseline understanding of the strategic environment, prior to the solicitation of insights, is the simplest way to solve this challenge.

Fourth, establishing a common set of threats to a command’s core mission, to which expert participants frame their insights, is required. Prior to study participants evaluating risk to the command’s programs, participants must understand the specific threats and how those threats affect specific platforms to which they are assessing and assigning risk. Simply providing an intelligence brief is insufficient.

Creating a frame of reference for future threats can be accomplished one of two ways. First, it can be developed by the study staff and provided to study participants prior to solicitation of expert opinion. Second, a set of defined threats can be developed as part of the first step in a Delphi analysis described below. The first approach gives greater control over the analysis by allowing the planning staff to bound the parameters of the study and is often a faster approach to developing a common set of threats. However, there is a potential to leave possible threats out of consideration. The second approach provides a detailed understanding of future threats but requires more time from study participants and researchers. Thus, evaluating which approach to take is an important and necessary step that cannot be avoided.

To add rigor to a planning shop’s risk-assessment study, a modified Delphi approach may be useful; the authors have used it as a way to incorporate a number of additional research methodologies that aid in obtaining quantitative measures from qualitative data. When used in combination, a multi-method approach provides empirically rigorous and defensible data that allow the risk-assessment team to more accurately evaluate expert insight and risk.

Beyond the Delphi methodology, we provide four tools for improving risk assessment by SMEs and for amalgamating their assessments into a rank order of risk across the given areas and items: risk pools, pairwise comparisons, amalgamation techniques, and the confidence assessment. First, we recommend using a risk pool to assess risk within or across areas for SME elicitation. The primary benefit of this method is the introduction of scarcity into the risk assessment. Absent scarcity, there is no incentive for SMEs to prioritize their risk assessments within an area. Additionally, the risk pool gives SMEs flexibility in assigning risk to areas beyond a simple ordinal rank ordering. Second, we recommend, where feasible, that the risk assessment include pairwise comparisons. Pairwise comparisons are widely recognized to be essential in obtaining true rank orderings across preferences. Using the pairwise comparisons, risk assessments may then be amalgamated using one or more of the discussed techniques to obtain a rank ordering of risk across the systems and areas. Finally, these assessments can be weighted (after comparisons but before amalgamation) by the self-assessed confidence SMEs have in their risk assessments.

Delphi Method

The Delphi method was developed by researchers at Project RAND during the late 1950s and early 1960s. According to Norman Dalkey and Olaf Helmer, the Delphi method seeks “to obtain the most reliable consensus of opinion of a group of experts through a series of intensive questionnaires interspersed with controlled opinion feedback.”^¹³ Eliciting “informed intuitive judgment” is particularly useful when attempting to develop foresight in an area that lacks sufficient data to apply rigorous quantitative methods.^¹⁴ As Norman Dalkey, Bernice B. Brown, and S. W. Cochran note, “For a number of basic military concerns, the best information available is the judgement and knowledge of individuals. This is especially true in the assessment of long-range technological developments, and the evaluation of long-range future threats.”^¹⁵

The conduct of a standard Delphi study is straightforward. A carefully selected group of experts is provided a set of forward-looking questions that solicit judgment-based feedback. Participants are geographically separated and do not know the identity of other participants. This is done to prevent the personality or reputation of an individual participant from influencing responses. Through an iterative multi-round process, participants respond to questions, review responses, and provide revised inputs based on the most recent feedback. Study researchers guide participants toward a consensus, requiring them to evaluate new insights during each round as they drive toward a group decision.^¹⁶ If consensus is not achieved, the amalgamation methods discussed below can be used to identify Condorcet winners and a rank order.

According to a series of experiments performed by RAND researchers, the optimum number of study participants fell between 15 and 17, with no marked improvement in average group error from additional study participants.^¹⁷ This is a significant point because it would likely require an increase in the number of study participants beyond the 2–4 SMEs that often participate in the risk-assessment process for a given program. It would also require participants to evaluate risk across mission areas, requiring an expert in one weapon system to explain and defend a position across a broader set of programs. Recent research continues to support the “wisdom of the crowd” approach that is foundational to the Delphi method and has consistently been shown to be superior to the individual expert.^¹⁸

In contrast to the current approach, where SMEs are broken into groups of 2–4 and asked only to provide inputs on topics where they are the individuals with the greatest knowledge, the larger group of more diverse experts should provide insight across all risk areas using the Delphi method. This would allow for the requisite number of participants while also allowing the larger group to benefit from the input of the most knowledgeable on a given subject. It is worth noting that the Delphi method does not require every expert to have the same level of knowledge on every subject. In fact, the different perspectives and areas of expertise are central to the effectiveness of the Delphi.

The Delphi method can be modified in a variety of ways to fit the constraints of the planning staff that is conducting a command’s risk assessment. For example, all but the final round, if desired, can be performed with study participants geographically separated. Or the entire process can be done over the course of several days, with all participants in one location and guided by a facilitator. These modifications of the Delphi are often used when time is a significant constraint. What does not differ across variations to the Delphi is the need for a survey instrument that is properly designed. Here again, there are a number of options available. The following sections offer some methodological approaches to the survey design that are specifically designed to draw out focused expert opinion that is most relevant to understanding risk.

Risk Point Pool

The SME-elicitation process has, as its fundamental goal, an assessment of priorities, in terms of risk, by those with particular areas of expertise. This assessment ideally provides a valid and reliable rank ordering of risk that can form the basis of budget recommendations. These experts serve as respondents to a survey on priorities and the risk associated with funding or not funding them at a particular level. They have the task of communicating relative risk across multiple priorities within an area of responsibility. Unlimited categorical risk assessment has poor reliability and validity because respondents need make no relative assessments in assigning risk.^¹⁹ All items can be rated as equally high; thus, no rank ordering of the relative risk across the items need be made. The risk pool introduces scarcity into the risk assessment, requiring SMEs to distribute the scarce risk points across the evaluated systems within (or across) their areas and, in doing so, to prioritize the higher risk assessments when making their evaluation.

While simple orderings, like assigning ranks, are relatively easy to produce, they may lack important information as to the relative rank-to-rank weight of particular items. Further, these rankings do not reflect other factors, such as the confidence a respondent has in their ranking or rank order. A cumulative assignment from a pool of votes, a form of cumulative voting, permits respondents to assign greater or lesser risk values across the multiple items within their area of responsibility with a defined scarcity of risk. This forces respondents to make relative assessments between the items that provide a more complete and more accurate picture of their assessed risk.^²⁰

Cumulative voting is a method whereby a voter must assign votes to a list of candidates and may distribute their votes among those candidates as they see fit. It is used in some local elections in the United States but is most prominently featured in stockholder elections to corporate boards of directors.^²¹ In the case of a corporate board, cumulative voting provides common stockholders a number of votes equal to the number of shares a stockholder possesses times the number of directors to be elected. A stockholder may cast all votes for a single director or distribute them among multiple nominees.^²² It works similarly in a local council election, except without the complication of weighting the votes by shares held.^²³ In the local council election, all voters have an equal number of votes. However, cumulative voting allows them to distribute those votes unequally across candidates. In both instances, the motivation is to permit a fuller expression of voter preferences among the alternatives and to promote minority representation.^²⁴ Using cumulative voting to assign risk from a risk pool by SMEs likewise permits those experts to render a complete rank ordering of alternatives and to weight the risk by assigning more risk points to a particular priority.^²⁵

Let n represent the number of items that will be assigned a risk value. The number of risk points in the risk pool of a SME responding on a set of potential funding priorities is equal to n. To illustrate, let us assume that there are eight funding priorities SMEs are ranking. Since n = 8, the number of risk points respondents have to assign to priorities is 8. The number of possible risk profiles a respondent could choose from is n!, the number of ways n items can be permuted. Thus, with 8 priorities, there are 8! = 40,320 possible risk-assessment profiles the respondents may choose from. Table 1 reports on a random subset of five risk profiles drawn from the 40,320 possible combinations that constitute possible rankings a SME could give as a respondent to the survey.

Funding	Profile 1	Profile 2	Profile 3	Profile 4	Profile 5
Priority 1	1	8	4	3	4
Priority 2	1	0	0	2	0
Priority 3	1	0	0	0	1
Priority 4	1	0	0	1	1
Priority 5	1	0	0	0	1
Priority 6	1	0	0	2	0
Priority 7	1	0	0	0	0
Priority 8	1	0	4	0	1

In the first profile, the respondent rated the eight funding priorities as equally risky and, thus, assigned them each a risk point. In the second profile, however, the respondent views priority 1 as not only the most significant risk but also as a risk to the exclusion of all other given priorities. In profiles 3–5, respondents distribute their eight risk points across multiple priorities, with some priorities rated higher than others.

If the five profiles reported above were the preference ordering of five different SMEs, the cumulative risk assessment would be as follows:

Funding	Cumulative Risk	Rank Order
Priority 1	20	1
Priority 2	3	3
Priority 3	2	6
Priority 4	3	3
Priority 5	2	6
Priority 6	3	3
Priority 7	1	8
Priority 8	6	2

Note that the risk pool’s cumulative risk assessment produces a cardinal ranking (with ties permitted), yielding a relative risk assessment that both orders the priorities and assigns a magnitude to that order.
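
The tally above can be reproduced with a few lines of code. The following Python sketch sums the five Table 1 profiles and ranks the priorities, with tied priorities sharing the best available rank. The names `tally_risk_pool` and `POOL_SIZE` are illustrative assumptions rather than part of any established tool; the only data used are the allocations reported in Table 1.

```python
# Risk-point allocations from Table 1: one list per SME profile,
# one entry per funding priority (Priority 1 .. Priority 8).
PROFILES = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # Profile 1
    [8, 0, 0, 0, 0, 0, 0, 0],  # Profile 2
    [4, 0, 0, 0, 0, 0, 0, 4],  # Profile 3
    [3, 2, 0, 1, 0, 2, 0, 0],  # Profile 4
    [4, 0, 1, 1, 1, 0, 0, 1],  # Profile 5
]
POOL_SIZE = 8  # assumed: one risk point per priority, as in the example


def tally_risk_pool(profiles, pool_size):
    """Return (cumulative points, ranks) per priority."""
    for profile in profiles:
        # Scarcity check: every respondent must spend exactly the pool.
        if sum(profile) != pool_size or min(profile) < 0:
            raise ValueError(f"invalid allocation: {profile}")
    totals = [sum(column) for column in zip(*profiles)]
    # Rank 1 = highest cumulative risk; tied priorities share a rank.
    ranks = [1 + sum(other > total for other in totals) for total in totals]
    return totals, ranks


totals, ranks = tally_risk_pool(PROFILES, POOL_SIZE)
for number, (total, rank) in enumerate(zip(totals, ranks), start=1):
    print(f"Priority {number}: cumulative risk {total}, rank {rank}")
```

The validation step is a design choice worth keeping in any survey tool: an allocation that does not sum to the pool size defeats the scarcity the method depends on.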

Pairwise Comparison

While the risk pool provides a cumulative ranking, it does not provide a complete picture of the relative assessment of which priorities are more important than the others. As we can see from the risk-pool example provided above, the assessment may produce a number of nominal ties among priorities. Pairwise comparisons of the alternative priorities will yield a complete and transitive ordering of the priorities against each other.^²⁶

Pairwise comparisons require the respondent to construct a complete and transitive ordering of the given alternatives. Consider a scenario where respondents rank order three funding priorities, labeled x, y, and z. The SME ranks each priority, when compared to the other, as a higher risk, a lower risk, or an equal risk.^²⁷ An example survey question eliciting this pairwise comparison would have the following form:

When thinking of priority x and priority y, would you:

rate priority x as a higher risk than priority y
rate priority y as a higher risk than priority x
rate the risk for priority x and priority y equal?

With three priorities, there are 13 possible orderings of these priorities. Parentheses enclose alternatives rated as equal:

x y z, x z y, y x z, y z x, z x y, z y x
(x y) z, (x z) y, (y z) x
x (y z), y (x z), z (x y)
(x y z)
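
The count of 13 can be checked by brute force. The Python sketch below enumerates every ordering of three priorities, here labeled x, y, and z, when ties are allowed, and prints tied priorities in parentheses. The function names `weak_orderings` and `render` are hypothetical helpers written for this illustration.

```python
from itertools import product


def weak_orderings(items):
    """List every ordering of `items` in which ties are allowed.

    Each ordering is a list of tiers, highest risk first; items that
    share a tier are rated as equal risk.
    """
    count = len(items)
    orderings = []
    for levels in product(range(count), repeat=count):
        used = set(levels)
        # Keep only assignments that use tiers 0..k-1 with no gaps,
        # so each distinct ordering is generated exactly once.
        if used != set(range(len(used))):
            continue
        tiers = [
            [item for item, level in zip(items, levels) if level == tier]
            for tier in range(len(used))
        ]
        orderings.append(tiers)
    return orderings


def render(tiers):
    """Format tiers with ties in parentheses, e.g. 'x (y z)'."""
    return " ".join(
        tier[0] if len(tier) == 1 else "(" + " ".join(tier) + ")"
        for tier in tiers
    )


orderings = weak_orderings(["x", "y", "z"])
for tiers in orderings:
    print(render(tiers))
print(len(orderings))
```

The same function shows how quickly the response space grows as priorities are added, which is one practical reason to keep the number of items per pairwise block small.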

These rank orderings can then be assigned weights. Let us look at an example where five SMEs rate the three alternatives and their associated ranks:

SMEs	Ordering	x	y	z
SME 1	x z y	1	3	2
SME 2	x (y z)	1	2.5	2.5
SME 3	(z x) y	1.5	3	1.5
SME 4	y x z	2	1	3
SME 5	y z x	3	1	2
average rank		1.7	2.1	2.2

We can then average the rankings from the SMEs, yielding an order with magnitude and differentiation of priorities.^²⁸
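
As a minimal sketch of this averaging step, the Python snippet below converts each SME's ordering from the table into ranks, assigning tied priorities the mean of the rank positions they jointly occupy, and then averages across SMEs. The function names are illustrative assumptions; the orderings are those of the five SMEs in the table.

```python
# Each SME's ordering as tiers, highest risk first; a tier with more
# than one priority is a tie (data taken from the five-SME table).
SME_ORDERINGS = {
    "SME 1": [["x"], ["z"], ["y"]],
    "SME 2": [["x"], ["y", "z"]],
    "SME 3": [["z", "x"], ["y"]],
    "SME 4": [["y"], ["x"], ["z"]],
    "SME 5": [["y"], ["z"], ["x"]],
}


def fractional_ranks(tiers):
    """Give each priority its rank; tied priorities share the mean
    of the rank positions they jointly occupy."""
    ranks = {}
    position = 1
    for tier in tiers:
        shared = position + (len(tier) - 1) / 2
        for priority in tier:
            ranks[priority] = shared
        position += len(tier)
    return ranks


def average_ranks(orderings):
    """Average each priority's rank across all SMEs."""
    per_sme = [fractional_ranks(tiers) for tiers in orderings.values()]
    return {
        priority: sum(r[priority] for r in per_sme) / len(per_sme)
        for priority in sorted(per_sme[0])
    }


averages = average_ranks(SME_ORDERINGS)
for priority, mean_rank in averages.items():
    print(priority, round(mean_rank, 2))
```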

This average ranking is not the only method for amalgamating individual rankings over alternatives. Beginning with the debate over voting procedures over two centuries ago between the Marquis de Condorcet and Chevalier Jean-Charles de Borda, students of voting have considered the problem of aggregation of individual preference expressions into a collective choice or assessment. One of the problems that has received particular attention is the establishment of criteria for assessing aggregation methods. The search for criteria has focused on delineating between methods of aggregation that yield arbitrary results and those which reflect a true collective assessment. For example, the Condorcet paradox, identified by the Marquis de Condorcet in his Essay on the Application of Analysis to the Probability of Majority Decisions (1785), is a phenomenon of pairwise comparisons whereby collective preferences over a set of alternatives can be cyclic, even when the individual preferences of the voters were not cyclic. It is paradoxical because one implication of this fact is that majorities produced from the same voters, with the same preferences, can be in conflict with one another. Many such methods of amalgamation are available for use. There are strengths and weaknesses associated with different methods.
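
To make the pairwise logic concrete, the Python sketch below tallies majority preferences for every pair of alternatives and looks for a Condorcet winner, that is, an alternative that beats every other one head to head. It is run first on a hypothetical three-voter profile (not drawn from this article) that produces a cycle, and then on the five SME rankings from the table above. The function names are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_wins(rankings):
    """Return the set of (a, b) pairs where a majority ranks a above b.

    Each ranking maps an alternative to its rank (1 = highest risk);
    equal ranks count as a tie and give neither side a vote.
    """
    wins = set()
    for a, b in combinations(sorted(rankings[0]), 2):
        for_a = sum(r[a] < r[b] for r in rankings)
        for_b = sum(r[b] < r[a] for r in rankings)
        if for_a > for_b:
            wins.add((a, b))
        elif for_b > for_a:
            wins.add((b, a))
    return wins


def condorcet_winner(rankings):
    """Return the alternative that beats every other one, or None."""
    wins = pairwise_wins(rankings)
    alternatives = sorted(rankings[0])
    for candidate in alternatives:
        if all((candidate, other) in wins
               for other in alternatives if other != candidate):
            return candidate
    return None


# Hypothetical three-voter profile (not from the article) that cycles:
# x beats y, y beats z, and z beats x, so no Condorcet winner exists.
cyclic = [
    {"x": 1, "y": 2, "z": 3},
    {"y": 1, "z": 2, "x": 3},
    {"z": 1, "x": 2, "y": 3},
]
print(sorted(pairwise_wins(cyclic)), condorcet_winner(cyclic))

# The five SME rankings from the table above.
smes = [
    {"x": 1, "y": 3, "z": 2},
    {"x": 1, "y": 2.5, "z": 2.5},
    {"x": 1.5, "y": 3, "z": 1.5},
    {"x": 2, "y": 1, "z": 3},
    {"x": 3, "y": 1, "z": 2},
]
print(sorted(pairwise_wins(smes)), condorcet_winner(smes))
```

In the SME example, x beats both y and z in head-to-head majorities, while y and z split evenly, so x is the Condorcet winner even though the y-versus-z comparison is unresolved.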

Confidence Assessment

Another factor in risk assessment is the confidence of the respondent in the risk assessment. One option is to weight risk assessments by the confidence of the respondent in their assessment. A possible method for eliciting this confidence assessment is a thermometer rating of the confidence of the respondent.^²⁹ An example of a confidence-feeling thermometer question follows:

Thinking about your risk rating on priority x, rate your confidence in this risk assessment from 0 to 100. Ratings between 50 and 100 mean that you feel confident in the assessment, while ratings between 0 and 50 mean that you do not feel confident in the assessment.

The feeling-thermometer confidence ratings can then be converted to weights and applied to the risk-pool allocation or the pairwise rankings. We can convert the thermometer score into a weight that ranges from 0 (no confidence) to 1 (high confidence). We then multiply the risk score by the weight. This has the effect of reducing the impact of risk assessments with lower confidence levels relative to those made with higher confidence. The summative or average function of the risk assessment is thus adjusted by confidence when the weights are deployed. Consider the following example, where one SME gives their risk assessments by assigning risk points to the funding priorities, with the weights of their confidence applied to those risk pool allocations:

Funding	Profile 1	Weight	Weighted Risk
Priority 1	2	0.68	1.36
Priority 2	1	1.00	1.00
Priority 3	0	1.00	0.00
Priority 4	3	0.85	2.55
Priority 5	0	1.00	0.00
Priority 6	0	0.38	0.00
Priority 7	2	0.45	0.90
Priority 8	0	1.00	0.00

In general form, the formula for applying the weight is:

metric × weight = weighted metric
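The weighted column in the table above can be reproduced as follows. This is a sketch under a stated assumption: the article does not specify how a thermometer score becomes a weight, so the code simply divides the score by 100, and the thermometer scores shown are back-calculated from the table's weights rather than taken from the study. The name `to_weight` is illustrative.

```python
def to_weight(thermometer_score):
    """Map a 0-100 confidence score to a 0-1 weight (assumed: score / 100)."""
    if not 0 <= thermometer_score <= 100:
        raise ValueError("thermometer score must be between 0 and 100")
    return thermometer_score / 100


# Risk points the SME assigned to Priorities 1 through 8 (from the table).
risk_points = [2, 1, 0, 3, 0, 0, 2, 0]
# Assumed thermometer scores consistent with the table's weights.
thermometer = [68, 100, 100, 85, 100, 38, 45, 100]

# metric x weight = weighted metric
weighted_risk = [
    round(points * to_weight(score), 2)
    for points, score in zip(risk_points, thermometer)
]

for number, value in enumerate(weighted_risk, start=1):
    print(f"Priority {number}: {value}")
```

Other conversions are possible, for example treating every score below 50 as a weight of zero; the choice should be made before the elicitation and applied uniformly.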

In this case, the collective risk assessment r for a particular attribute or priority is the summation, over the n SME respondents, of the weighted risk points rp_w that each respondent i assigns to that attribute:

$$r = \sum_{i=1}^{n} rp_{w,i}$$
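As a sketch of this summation, the code below adds the weighted risk points for each priority across respondents. The three SMEs, their risk points, and their confidence weights are hypothetical values chosen only for illustration, and `collective_risk` is an illustrative name.

```python
def collective_risk(points_by_sme, weights_by_sme):
    """Sum weighted risk points per priority over all SME respondents.

    Both arguments are lists with one row per SME and one column per priority.
    """
    n_priorities = len(points_by_sme[0])
    totals = [0.0] * n_priorities
    for points, weights in zip(points_by_sme, weights_by_sme):
        for j in range(n_priorities):
            totals[j] += points[j] * weights[j]  # rp_w for this SME
    return totals


# Hypothetical data: three SMEs allocating risk points to three priorities.
points_by_sme = [
    [2, 1, 0],
    [3, 0, 0],
    [1, 1, 1],
]
weights_by_sme = [
    [0.5, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.5, 0.5, 1.0],
]

r = collective_risk(points_by_sme, weights_by_sme)
print(r)  # [4.5, 1.5, 1.0]
```

Dividing each total by the number of respondents gives the average rather than the summative form of the same measure.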

Cross-Group Assessment

While the purpose of SME elicitation is to take advantage of expert assessment of risks in the SMEs’ areas of responsibility, the assessment may be confounded by implicit bias in favor of the individual SME’s area of expertise, or their assessment may suffer from a siloing effect simply due to the specialization and narrow focus of the SME. If only SMEs within an area are consulted on the risk assessment of a given priority, there is no way to estimate the effect of this potential bias on the individual risk assessments, and this effect would be compounded in any cumulative risk assessment.

One control on siloing in assessments is to have all respondents rank order the priorities outside of their areas of expertise. If the SME has sufficient knowledge and expertise to differentiate between priorities outside their areas of expertise, then a cross-group assessment would provide a check on implicit bias toward the SME’s own area of expertise. For example, if a SME’s knowledge is focused in an area that they rate as having high risk but all other SMEs outside of that area give it a low risk rating, the cumulative ranking of that priority could then be adjusted downward. Of course, this can also work in reverse. The procedure for producing such results is fairly straightforward: ask SMEs to rank all the funding priorities outside of their own area of expertise, to the exclusion of those priorities in their area of expertise.

This ranking should be done individually (or collectively within an area of expertise), without input from or knowledge of the ranking by other SMEs outside their own area. This protects against SME bias toward their own priorities. Here, rankings should be ordinal and mutually exclusive. Additionally, the lowest rating of a priority should be eliminated from consideration and cumulative calculations. SMEs should be informed of this prior to making their assessments. This is to guard against SME “sandbagging” or “gaming,” whereby low-risk ranks are assigned across the board or to a high-risk priority in competition with the SME’s favored priority, thereby gaming the rating to ensure that their own area of expertise is scored relatively higher. If SMEs know that the lowest rating will be discarded, they have less incentive to lowball a high-risk priority in another area of expertise.
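The cross-group procedure can be sketched in a few lines. The example rests on several assumptions that the text leaves open: ratings are coded so that a higher number means higher risk, the cumulative score is a simple mean, and exactly one lowest rating is discarded per priority. All SME names, areas, and ratings are hypothetical, and `cross_group_score` is an illustrative name.

```python
from statistics import mean


def cross_group_score(priority, ratings, sme_area, priority_area):
    """Average the outside-area ratings of a priority after dropping the lowest.

    Ratings from SMEs whose area matches the priority's area are ignored.
    """
    outside = [
        sme_ratings[priority]
        for sme, sme_ratings in ratings.items()
        if sme_area[sme] != priority_area[priority] and priority in sme_ratings
    ]
    if len(outside) < 2:
        raise ValueError("need at least two outside ratings to drop the lowest")
    outside.sort()
    return mean(outside[1:])  # discard the single lowest rating


# Hypothetical elicitation: higher number = higher assessed risk.
sme_area = {"A": "air", "B": "air", "C": "space", "D": "cyber", "E": "cyber"}
priority_area = {"p1": "air"}
ratings = {
    "A": {"p1": 5},  # own area, excluded from the cross-group score
    "C": {"p1": 4},
    "D": {"p1": 1},  # lowest outside rating, discarded
    "E": {"p1": 5},
}

print(cross_group_score("p1", ratings, sme_area, priority_area))  # 4.5
```

Comparing this score with the rating given by the priority's own-area SMEs gives a rough indication of how far the in-area assessment departs from the outside view.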

Conclusion

Developing empirical data for senior leader risk planning and decision making is often a challenge because of the ever-present shortage of information. Planning shops at MAJCOMs across the military services often find themselves relying on a small number of SMEs to provide the required inputs for developing highly impactful future-oriented risk assessments. This study sought to offer these planning shops a number of ways to improve the quality of their risk assessments by increasing the number of SMEs participating in the elicitation process and then converting the subjective information into empirical data that offers increased rigor and validity. Additional tools like risk-point pooling, pairwise comparisons, and cross-group assessments were offered as potential options for moving beyond the Delphi method and a basic improvement to the SME-elicitation process.

Implementing the above recommendations has the potential to improve both reliability and validity, giving senior leaders tasked with making major decisions the opportunity to do so with greater confidence in the risk assessments they are provided. Improving the quality of risk assessments is understandably an important endeavor for any military commander. When the future is uncertain and information is limited, it can be even more challenging to assess risk effectively. The same is true when few individuals understand a command’s programs and mission. These challenges are all too common for MAJCOMs that may have only one or two personnel on whom the command must rely for expert insight. This article seeks to address that challenge in a small way.

Dr. Donald Gooch

Dr. Gooch (BA, political science and sociology, University of Central Arkansas; MA, political science, University of Arkansas; and PhD, political science, University of Missouri) is an associate professor in the Department of Government at Stephen F. Austin State University. His research agenda includes political polarization, behavior on the Supreme Court, campaign finance regulation, civic education, formal theory, and the spatial theory of voting.

Dr. Adam Lowther

Dr. Lowther is a professor of political science at the US Army School of Advanced Military Studies, the former director of the US Air Force’s School for Advanced Nuclear Deterrence Studies, and the former director of the Air Force Research Institute’s Center for Academic and Professional Journals. He holds a PhD in international relations from the University of Alabama. Dr. Lowther is the author of numerous journal articles and books focused on international relations and military affairs.

Addendum

If SME respondents are not increased, and if cross-area assessments are ruled out, the current methodology may be optimal for SME elicitation. However, we strongly recommend that risk assessments from SMEs within an area utilize the median for central tendency and dispersion measures. Means are unreliable measures of both central tendency and dispersion with extremely small sample sizes. For dispersion, use the median absolute deviation as a robust measure of the variability in the risk assessment:

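$$\mathrm{MAD} = \operatorname{median}_{i}\left(\lvert x_i - \tilde{x} \rvert\right)$$

where $x_1, \dots, x_n$ are the individual SME risk assessments and $\tilde{x} = \operatorname{median}(x_1, \dots, x_n)$ is their median.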

This recommendation would hold for samples that ranged from one to fifteen.
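A minimal sketch of these two measures follows, using Python's standard `statistics` module. The five risk ratings are hypothetical and include one outlier to show why the median is preferred for very small samples; `median_absolute_deviation` is an illustrative name.

```python
from statistics import mean, median


def median_absolute_deviation(values):
    """Median of the absolute deviations from the sample median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical risk ratings from five SMEs; the last one is an outlier.
ratings = [2, 3, 3, 4, 9]

print(median(ratings))                     # 3
print(median_absolute_deviation(ratings))  # 1
print(mean(ratings))                       # 4.2, pulled upward by the outlier
```

A single extreme rating moves the mean from roughly 3 to 4.2, while the median and the median absolute deviation are unchanged by how extreme that one rating is.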

Notes

1 William Mayville, Chairman of the Joint Chiefs of Staff Manual 3105.01: Joint Risk Analysis (Washington, DC: Department of Defense, 2019). Widely relied on, CJCSM 3105.01 provides an approach to creating a color-coded system of risk analysis that is highly subjective and cannot be replicated because it relies largely on SME inputs to assess and evaluate risk.

2 The authors have worked with commands and organizations within the services to develop more rigorous methods for evaluating risk. The article is based on that experience.

3 Charles Yoe, Primer on Risk Analysis: Decision Making Under Uncertainty (New York: CRC Press, 2012), 1.

4 Ibid., 1.

5 Charles Yoe, Principles of Risk Analysis (New York: CRC Press, 2012), 93–125.

6 Yoe, Primer on Risk Analysis, 156.

7 See A Risk Practitioners Guide to ISO 31000: 2018 (London: Institute for Risk Management, 2018).

8 Yoe, Principles of Risk Analysis, 47–92.

9 See Richard Kugler, Policy Analysis in National Security Affairs: New Methods for a New Era (Washington, DC: NDU Press, 2006); Joseph Soeters, Patricia Shields, and Sebastian Rietjens, eds., Routledge Handbook of Research Methods in Military Studies (London: Routledge, 2014); Richard Heuer, Jr. and Randolph Pherson, Structured Analytic Techniques for Intelligence Analysis (Washington, DC: CQ Press, 2015); and Hank Prunckun, Scientific Methods of Inquiry for Intelligence Analysis (Boulder, CO: Rowman and Littlefield, 2015).

10 See Barney Glaser and Anselm Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (New Brunswick, NJ: Transaction, 2008); Melanie Birks and Jane Mills, Grounded Theory: A Practical Guide (Los Angeles: Sage, 2015); and Juliet Corbin and Anselm Strauss, Basics of Qualitative Research (Los Angeles: Sage, 2015).

11 David Harris, The Complete Guide to Writing Questionnaires (Durham, NC: I and M Press, 2014); Richard Krueger and Mary Anne Casey, Focus Groups: A Practical Guide for Applied Research (Los Angeles: Sage, 2015); and Bob Hancke, Intelligent Research Design (Oxford: Oxford University Press, 2010).

12 Our own experience suggests that those conducting risk assessments for a command too often dismiss potential participants because they are not “the” expert, which limits the validity of the assessment.

13 Norman Dalkey and Olaf Helmer, An Experimental Application of the Delphi Method to the Use of Experts (Santa Monica, CA: RAND Corporation, 1962), 1.

14 Norman Dalkey, The Delphi Method: An Experimental Study of Group Opinion (Santa Monica, CA: RAND Corporation, 1969).

15 Norman Dalkey, Bernice B. Brown, and S. W. Cochran, The Delphi Method IV: Effects of Percentile Feedback and Feed-In on Relevant Facts (Santa Monica, CA: RAND Corporation, 1970), iii.

16 Adam Lowther, The Asia-Pacific Century: Challenges and Opportunities (Maxwell AFB, AL: Air University Press, 2012), 26–29.

17 Olaf Helmer, Analysis of the Future: The Delphi Method (Santa Monica, CA: RAND Corporation, 1967), 5.

18 Joaquin Navajas, Tamara Niella, Gerry Garbulsky, Bahador Bahrami, and Mariano Sigman, “Aggregated Knowledge from a Small Number of Debates Outperforms the Wisdom of Large Crowds,” Nature Human Behaviour 2, no. 2 (2018): 126–32.

19 Michel Regenwetter, “Perspectives on Preference Aggregation,” Perspectives on Psychological Science 4, no. 4 (July 2009): 403–07.

20 Lewis R. Mills, “The Mathematics of Cumulative Voting,” Duke Law Journal 17, no. 1 (February 1968): 28–43.

21 Sanjai Bhagat and James A. Brickley, “Cumulative Voting: The Value of Minority Shareholder Voting Rights,” Journal of Law & Economics 27, no. 2 (October 1984): 339–65.

22 Gerald Glasser, “Game Theory and Cumulative Voting for Corporate Directors,” Management Science 5, no. 2 (January 1959): 151–56.

23 Richard Cole, Delbert A. Taebel, and Richard L. Engstrom, “Cumulative Voting in a Municipal Election: A Note on Voter Reactions and Electoral Consequences,” Western Political Quarterly 43, no. 1 (March 1990): 191–99.

24 David Brockington, Todd Donovan, Shaun Bowler, and Robert Brischetto, “Minority Representation under Cumulative and Limited Voting,” Journal of Politics 60, no. 4 (November 1998): 1108–25.

25 Jeffrey Gordon, “Institutions as Relational Investors: A New Look at Cumulative Voting,” Columbia Law Review 94, no. 1 (January 1994): 124–92.

26 H. P. Young, “Condorcet’s Theory of Voting,” American Political Science Review 82, no. 4 (December 1988): 1231–44.

27 William Riker, Liberalism vs. Populism: A Confrontation between the Theory of Democracy and the Theory of Social Choice (San Francisco: Freeman, 1982).

28 James Enelow and Melvin J. Hinich, The Spatial Theory of Voting: An Introduction (Cambridge: Cambridge University Press, 1984).

29 Duncan Black, “On the Rationale of Group Decision Making,” Journal of Political Economy 56, no. 1 (1948): 23–34.
