Every so often, one reads about an unfortunate driver who blindly followed the instructions of their GPS navigation system and ended up driving into a lake. (This was also depicted in a classic fictional scene from The Office, where Michael Scott keeps yelling “The machine knows where it’s going” before he and Dwight plunge into the water.)
Although it’s easy to laugh at these outcomes, they highlight a serious issue known as “automation bias,” in which people are inclined to trust erroneous instructions issued by a machine even when the evidence of their own senses (and other humans) contradicts the machine’s recommendations.
(For another example of over-reliance on machine “intelligence,” a lawyer in New York recently used ChatGPT to help write a legal brief. Unfortunately for him and his client, the ChatGPT-written text included numerous citations to non-existent cases. Presiding judge Kevin Castel was extremely displeased, noting that “Six of the submitted cases appear to be bogus judicial decisions with bogus quotes and bogus internal citations.” The lawyer is now facing sanctions.)
In the realm of medicine, a recent study shows that even experienced physicians can fall prey to the same problem. Drs. Thomas Dratsch and Xue Chen, with colleagues in Germany and the Netherlands, ran an experiment in which they asked radiologists to interpret mammograms and assign each a score indicating their level of suspicion for breast cancer, using a standard scoring system known as BI-RADS. (BI-RADS categories range from 0 to 6, with 0 = incomplete and needing further imaging, 1 = normal, 2 = benign, 3 = probably benign, 4 = suspicious, 5 = highly suggestive of cancer, and 6 = biopsy-proven cancer.) Some of the radiologists were relative novices, whereas others were moderately experienced or very experienced.
The radiologists were also told they were working with a new AI system that would independently generate a BI-RADS score for each case, which they could refer to when making their own diagnosis. However, unbeknownst to the human radiologists, the supposed AI system was only a realistic-looking fake. In some cases, the supposed AI provided a genuine score based on the actual mammogram, whereas in other cases the supposed AI-generated score was fabricated to be deliberately higher (or lower) than the truth.
All mammograms were obtained between 2017 and 2019, so the researchers knew the real-world clinical outcomes for the patients for at least the past four years, including which patients had truly benign mammograms and which patients developed breast cancer (proven by biopsy).
Drs. Dratsch and Chen found that when the radiologists were given AI-generated scores that were truthful, the human interpretations were fairly accurate and aligned with the AI scores. However, when the AI-generated scores were deliberately wrong, the humans’ accuracy fell steeply, in some cases to less than 20%. Even the very experienced radiologists (those with more than 10 years of experience) showed a sharp drop in accuracy, from 82% to 45.5%, when given an incorrect AI score.
As Dr. Dratsch summarized, “We anticipated that inaccurate AI predictions would influence the decisions made by radiologists in our study, particularly those with less experience. Nonetheless, it was surprising to find that even highly experienced radiologists were adversely impacted by the AI system’s judgments, albeit to a lesser extent than their less seasoned counterparts.”
Dratsch and colleagues suggest some ways to minimize this problem. For example, the AI system could also present the human radiologist with a “confidence level” for its interpretation. Another recommendation was to use “explainable AI” systems that could display the reasoning behind their diagnoses, which isn’t always possible with current “black box” systems where not even the system designers know how the AI arrived at its conclusion.
Similarly, human physicians can try their best to arrive at their own diagnoses independently before consulting an AI assistant. The more that human physicians are aware of possible automation bias, the better they will be able to combat it.
I still believe that AI has the potential to revolutionize the practice of medicine for the better. But in the short-to-medium term, we physicians will all have to be diligent about relying on the evidence of our own eyes and trusting our own brains, rather than blindly following an AI. As more AI systems reach the medical marketplace, patients should feel free to ask their doctors to what extent they are relying on AI to arrive at their diagnoses and treatment plans, and to ask questions if they are not comfortable with the proposed medical advice.
The machine does not always know where it’s going! Don’t be like Michael Scott in The Office!