Can doctors trust “black box” algorithms? Questions like this inevitably spark lively debate. On one hand, healthcare is full of commonly used black boxes: the mechanism of action of lithium, a standard first-line treatment for bipolar disorder, remains mysterious. On the other hand, despite the complex and opaque inner workings of many medical innovations, physicians are motivated to understand the tools of their trade, asking “What’s inside the package?”
Understanding the AI black box
Collaborations between Duke University healthcare professionals and the Duke Institute for Health Innovation (DIHI) have resulted in dozens of machine learning and AI technologies being developed and integrated into clinical care. These range from well-validated expert heuristics embedded in clinical decision support to difficult-to-interpret deep learning systems that run in real time to enhance the safety and quality of clinical decision making.
As projects approach clinical integration, we develop and disseminate training materials, meet with stakeholders (especially end users), and establish governance processes to ensure that AI innovations are adopted in a safe and responsible manner. Our process is participatory and grounded in human-centered design. User input and feedback are incorporated throughout the AI development, evaluation, integration, and maintenance process.
Early on, we focused on building capabilities and understanding among clinical project leaders. We held ELI5 (Explain It Like I’m 5) workshops where statistics faculty and graduate students broke down neural networks, train/test splits, and other machine learning nuances in plain English.
Along the way, we fielded all sorts of questions from physicians about “What’s inside the black box?” But as more clinical end users interacted with the AI tools we built, we quickly realized we needed something much simpler than even an ELI5 workshop. We needed more basic AI transparency—something akin to food packaging labels.
AI product labels and instruction manuals
There were two immediate problems we needed to solve. First, the hype around AI often led clinical end users to believe the tools we built were more powerful than they actually were. For example, we trained a sepsis model exclusively on structured vital sign, medication, analyte, and comorbidity data. When asked how the sepsis model works, one end user responded that the model reads all the notes looking for signs of sepsis. Hence, we needed to present, plainly, the model’s inputs and outputs. Second, we found that as clinical end users gained experience with an AI tool for one purpose, they began considering it for additional potential uses. They did not have a clear understanding of how narrowly AI tools are developed and validated: an AI tool is typically validated for a specific decision point in a specific population of patients in a specific setting at a specific point in time. We needed to clearly warn users not to apply a tool outside those parameters or beyond its intended use.
When we sat down to design our first label in the summer of 2019, we were breaking new ground. AI product labels seemed like an obvious idea: Google had just published “Model Cards for Model Reporting” and, the prior year, IDx-DR had requested de novo classification from the FDA. Despite this progress, we could not identify a single AI product label for a healthcare tool that was not regulated as a device. So we got to work mocking up designs and gathering feedback from clinical stakeholders and former regulatory officials.
We ultimately built consensus around a one-page document and published examples in Nature Digital Medicine and JAMA Network Open. Now, every AI tool we build is accompanied by a “Model Facts” label with sections for summary, mechanism of action, validation and performance, uses and directions, warnings, and other information. Over time, we found that some physicians wanted more than what we could fit on the label. So we wrote our first instruction manual for our AI mortality models.
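For readers who think in code, a rough sketch of the label’s fixed sections captured as a structured template might look like the following. This is illustrative only; the class, field names, and all example text are invented for this post and are not taken from our actual labels.

```python
# Illustrative sketch only: a machine-readable stand-in for the one-page
# "Model Facts" label, using the section names listed above. All example
# content is hypothetical and is not taken from Duke's actual labels.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelFactsLabel:
    summary: str                      # what the tool does, in one or two sentences
    mechanism_of_action: str          # inputs, outputs, and how the score is produced
    validation_and_performance: str   # cohort, time period, and headline metrics
    uses_and_directions: str          # the specific decision point, population, and setting
    warnings: List[str] = field(default_factory=list)  # explicit out-of-scope uses
    other_information: str = ""


sepsis_label = ModelFactsLabel(
    summary="Flags hospitalized adults at elevated risk of sepsis for clinician review.",
    mechanism_of_action=(
        "Model trained exclusively on structured vital sign, medication, analyte, "
        "and comorbidity data; outputs an hourly risk score. It does not read notes."
    ),
    validation_and_performance=(
        "Validated on adult inpatient encounters at a single health system over a fixed time period."
    ),
    uses_and_directions=(
        "Use only at the intended screening decision point, in the validated population and setting."
    ),
    warnings=[
        "Not validated for other populations, settings, decision points, or time periods.",
        "Do not repurpose the tool for uses beyond its intended use.",
    ],
)

print(sepsis_label.warnings[0])
```

The point is not the code itself: it is that the sections are fixed, the inputs and outputs are spelled out, and the warnings are explicit. The label that clinicians actually see remains a one-page, plain-English document.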
Opening the black box
In the time since we published our first “Model Facts” sheet, research into AI transparency and interpretability has proliferated. Complex methods have been developed to discern how neural networks work, which regions of an image are most salient to a prediction, and how algorithms use human-recognizable features. Calls for regulatory intervention have similarly intensified. Clinical and machine learning experts confidently claim that physicians should not be asked to use black box tools.
But most conversations about transparency neglect the basics. When physicians ask “What’s in the black box?” the first thing they need is a label. And physicians who want additional information beyond the label should be able to access an instruction manual. Both should be in plain English text with standard, recognizable sections. We need physicians to understand at a high level how these tools work, how they were developed, the characteristics of the training data, how they perform, how they should be used, when they should not be used, and the limitations of the tool. If we don’t standardize labels and instruction manuals, it won’t matter how sophisticated our explanatory techniques get. We will leave clinical end users wondering what’s inside the black box and risk eroding trust in the tool and the field of AI.
Progress is elusive
Two years after embarking on our AI transparency initiative, the goalposts haven’t moved, but neither has much of the field: most AI products still don’t have product labels or instruction manuals. Numerous health systems and AI product developers have reached out to us at Duke to discuss adopting the practice internally. Industry news outlets like STAT have featured the practice in articles (examples here and here). But it still seems that complex approaches to transparency are more mesmerizing than labels and instruction manuals.
For now, responding to physicians asking “What’s in the black box?” falls on the individual health system teams that lead AI product procurement, integration, and lifecycle management. So the next time someone tries to sell you an AI product or asks you to use one, start by asking “Where are the product label and instruction manual?” before going any further.
Mark Sendak, MD, MPP, Data Science & Population Health Lead, Duke Institute for Health Innovation; Christopher Roth, MD, MMCi, Vice Chair of Radiology, Clinical Informatics and Information Technology, Duke University; Director of Imaging Informatics Strategy, Duke Health; Associate Professor of Radiology; Suresh Balu, MS, MBA, Director, Duke Institute for Health Innovation; Associate Dean for Innovation and Partnerships, Duke University School of Medicine
Note: The AI “Model Facts” label suggestions summarized in this blog post were discussed and presented at the October FDA public workshop on transparency of AI/ML. Video of the workshop is available on demand for those interested in learning more.
As radiologists, we strive to deliver high-quality images for interpretation while maintaining patient safety, and to deliver accurate, concise reports that will inform patient care. We have improved image quality with advances in technology and attention to optimizing protocols. We have strengthened our commitment to patient safety, comfort, and satisfaction through research, communication, and education about contrast and radiation issues. But when it comes to radiology reports, little has changed over the past century.