Trust and human-machine workflows
As we accelerate the automation of tasks previously performed by humans, we cross over into territory where machines make complex judgements that can have a significant impact on the world. But how do we evaluate the potential consequences of these decisions? Looking at our military past through an AI lens, and asking how political and military decisions might have changed with increased AI involvement, offers an intriguing perspective on this debate.
In the early 1980s, relations between the U.S. and the Soviet Union were strained almost to the breaking point. Both sides were deploying theatre nuclear missiles that could strike key targets within 10 minutes.
In 1982, the Soviet Union placed the Oko satellite system into full operation. The system’s mission was to provide early warning of nuclear missile launches from the U.S. Then, in the spring of 1983, President Ronald Reagan announced the Strategic Defense Initiative (“Star Wars”) to significantly expand space-based military capability, threatening to upend the already teetering balance between the Cold War powers.
As tensions escalated, on September 1, 1983, the Soviet military shot down a South Korean passenger jet, Korean Air Lines Flight 007. Flying from Alaska to Seoul, the aircraft had strayed into Soviet airspace. All 269 people aboard were killed, including U.S. Congressman Larry McDonald.
Three weeks later, on September 26, 1983, Stanislav Petrov, a Lieutenant Colonel in the Soviet Air Defense Forces, had to make a judgement that might have changed the course of history. Petrov was responsible for monitoring the Oko early-warning system placed into operation the previous year. On that day, the system triggered an alert that an intercontinental ballistic missile had been launched from the U.S., followed quickly by alerts that four more were behind it.
Petrov’s job was to report these alerts to his superiors, and he was keenly aware of the prevailing tensions. A missile launch report would move quickly up the Soviet chain of command and most likely result in an immediate call for a counter-strike.
However, Petrov did not report the alert despite the system claiming the highest confidence level; he judged that the alerts had been triggered in error. His only evidence was his instinct and his belief that it would be highly unlikely for the U.S. to launch only five missiles. His training had convinced him that any U.S. first strike would be significant in scale.
What would a machine have done in place of Petrov?
At a high level, how do we trust the judgements of machines, or of humans for that matter? How can we provide assurance that our judgements are accurate? Undoubtedly, context is king.
In some cases, machines can make more accurate judgements than humans. A 2017 study led by Jon Kleinberg indicated that human judges were not very good at predicting a defendant’s risk of skipping bail when released pre-trial. In the study’s policy simulations, an algorithm-based decision model showed crime reductions of up to 24.7% with no change in jailing rates compared with human judges. However, there have been recent claims that systemic racial bias is still present and may be exacerbated by algorithm-based decision making.
Often the data that feeds algorithm-based decision making is not trustworthy. The Oko system’s “high confidence” alert was anything but. New data sources, such as commercial remote sensing systems from BlackSky, Planet, IceEye, and HawkEye360, along with automated processing and interpretation systems, will proliferate over the next few years. They will allow governments, NGOs, and commercial entities to derive new insights and make near-real-time decisions. The challenge will shift from not having enough data to assessing the trustworthiness of the data and the subsequent analytic judgements.
Clearly, we are wading into murky waters that include trust, belief, and confidence mixed with subjective assessments similar to the one Petrov had to make.
As governments and commercial organizations build out data collection and analytics systems, we need to invest in frameworks for determining how and when to trust machine outputs. Underestimating the value of human validation of data quality and machine judgements can lead to significantly negative outcomes.
The field of IT security may provide a model for maturing our notions of trustworthiness. In the early days, IT systems were small, not connected to external systems, and everyone knew their local IT administrator.
However, as IT systems became interconnected, particularly with the advance of the Internet, users, systems, and data became distributed. As a result, IT security professionals were forced to develop better approaches to protecting assets. Then, as security threats grew more sophisticated and the consequences of data loss mounted, companies began to implement a zero-trust IT security model. This framework assumes all IT systems and data are potentially compromised, calculates a “trustworthiness” score each time a user accesses data within the enterprise, and uses that score to allow or block access.
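To make the idea concrete, here is a minimal sketch of such an access gate. The names, signals, and thresholds are hypothetical illustrations, not any vendor’s actual implementation: each request earns a trustworthiness score from a few simple signals, and more sensitive data demands a higher score before access is allowed.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """A hypothetical access request evaluated under a zero-trust policy."""
    user_id: str
    device_patched: bool       # is the requesting device up to date?
    location_expected: bool    # is the request coming from a usual location?
    data_sensitivity: float    # 0.0 (public) to 1.0 (highly sensitive)


def trust_score(req: AccessRequest) -> float:
    """Combine a few simple signals into a 0-1 trustworthiness score."""
    score = 0.5
    score += 0.25 if req.device_patched else -0.25
    score += 0.25 if req.location_expected else -0.25
    return max(0.0, min(1.0, score))


def allow_access(req: AccessRequest, margin: float = 0.2) -> bool:
    """Never trust, always verify: more sensitive data demands a higher score."""
    required = min(req.data_sensitivity + margin, 1.0)
    return trust_score(req) >= required


# Example: a request from an unpatched device for sensitive data is blocked.
request = AccessRequest("analyst-7", device_patched=False,
                        location_expected=True, data_sensitivity=0.8)
print(allow_access(request))  # False
```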
At least conceptually, this “never trust, always verify” approach could be a good model to follow as we test and implement the automation of analytic workflows. Below is a set of proposed guiding principles to consider as organizations automate their analytic activities (a sketch of how they might be operationalized follows the list):
- Treat all systems and algorithms that perform analytic functions as an “analyst”
- Consider all analysts (whether human or machine) as fallible and subject to bias
- Understand that all analysts (whether human or machine) possess a limited contextual frame of reference based on experience and training
- Consider all data to be potentially corrupted (either through processing errors or malign intent)
- Recognize that we live in a world composed of shades of grey that requires analysts to think in probabilities
- Analysts (whether human or machine) should endeavour to model their analytic processes and conclusions explicitly and transparently
- Analytic processes and conclusions are dynamic; more data will lead to better analytic products (but not necessarily more confident conclusions)
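As an illustration only, the sketch below shows one way these principles might be encoded in software. The class and function names (Evidence, Judgement, Analyst, MissileWarningModel, requires_human_review) are hypothetical and the confidence numbers are placeholders; the point is that human and machine analysts share one interface, outputs carry probabilities and rationale rather than bare verdicts, unverified data caps confidence, and low-confidence judgements are routed to a human.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class Evidence:
    """Input data with an explicit integrity flag: all data may be corrupted."""
    source: str
    value: float
    verified: bool = False


@dataclass
class Judgement:
    """A probabilistic conclusion plus the reasoning behind it."""
    conclusion: str
    probability: float              # shades of grey, not a binary verdict
    rationale: List[str] = field(default_factory=list)


class Analyst(Protocol):
    """Humans and machines alike are 'analysts': fallible, biased, context-limited."""
    name: str

    def assess(self, evidence: List[Evidence]) -> Judgement: ...


class MissileWarningModel:
    """A toy machine analyst whose output is an input to review, not a verdict."""
    name = "oko-style-model"

    def assess(self, evidence: List[Evidence]) -> Judgement:
        verified = [e for e in evidence if e.verified]
        rationale = [f"{len(verified)}/{len(evidence)} inputs independently verified"]
        # Confidence is capped when inputs are unverified, never reported as certainty.
        probability = 0.9 if verified and len(verified) == len(evidence) else 0.4
        return Judgement("launch detected", probability, rationale)


def requires_human_review(judgement: Judgement, threshold: float = 0.75) -> bool:
    """Route low-confidence (and, in practice, high-impact) judgements to a human."""
    return judgement.probability < threshold


model: Analyst = MissileWarningModel()
result = model.assess([Evidence("ir-sensor", 1.0, verified=False)])
print(result)
print(requires_human_review(result))  # True: a human, like Petrov, gets the final say
```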
As we race to automate, let’s not forget the lesson of Stanislav Petrov: implicitly trusting machine judgements can lead to poor or even disastrous outcomes. Agreeing on a set of guiding principles such as those articulated above is a critical step toward achieving better business and societal outcomes.