AI in Brief: Radiology Reports Reimagined

The radiology report stands as the radiologist's primary output, rich with carefully curated medical knowledge pertaining to a specific patient's exam at a given time. However, because these reports are written by humans, they are prone to error and are often written with a focus on the most clinically relevant findings, potentially omitting other significant details. With the rise of AI algorithms, particularly large transformer-based language models, the radiology report is becoming a critical component in multimodal machine learning, serving as a valuable source of truth. This issue of AI in Brief is a celebration of the radiology report, delving into the transformative impact of AI on radiology reports and of radiology reports on AI: exploring advancements in their creation, error correction, use in research, and role in enhancing patient understanding.

Harnessing AI to Accelerate Cancer Research through Radiology Report Annotation

Ankur Arya and colleagues at Memorial Sloan Kettering Cancer Center (MSKCC) are leading the way in cancer report annotation. Traditionally reliant on human annotators to curate radiology reports, MSKCC is now implementing AI to expedite this laborious process. The team trained natural language processing (NLP) models, in particular fine-tuned BERT models, to classify key data points from reports, such as imaging scan site, cancer presence, and cancer status.
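
As a rough illustration of this kind of classification pipeline, the sketch below fine-tunes a generic BERT checkpoint with the Hugging Face Transformers library. The label scheme, example reports, and hyperparameters are assumptions for illustration only, not details of the MSKCC team's actual models.

```python
# Minimal sketch: fine-tuning BERT to classify cancer presence in report text.
# Labels, example reports, and settings are illustrative assumptions.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

# Hypothetical annotated examples: report text -> cancer presence label.
examples = {
    "text": [
        "CT chest: 2.3 cm spiculated right upper lobe nodule, suspicious for malignancy.",
        "MRI brain: no acute intracranial abnormality.",
    ],
    "label": [1, 0],  # 1 = cancer present, 0 = no cancer
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def tokenize(batch):
    # Truncate/pad each report so batches have a uniform length.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

dataset = Dataset.from_dict(examples).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cancer-presence-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```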

These AI models appear to achieve high performance based on F1 scores, effectively assisting human annotators. This reduces the time and cost of manual curation while maintaining high accuracy, thus enhancing cancer research efficiency. The structured data produced by AI facilitates comprehensive integration with genomic data, essential for advancing cancer treatment.
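
For context, the F1 score used in these evaluations is the harmonic mean of precision and recall:

```latex
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
```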

Evaluating Large Language Models for Automated Synoptic Reporting and Resectability Categorization in Pancreatic Cancer

In addition to report annotation, cancer reporting also saw recent advances in a June 2024 paper in Radiology: Artificial Intelligence. Bhayana et al. investigated the use of large language models (LLMs) such as GPT-3.5 and GPT-4 for automating synoptic reports and categorizing resectability in pancreatic ductal adenocarcinoma (PDAC). GPT-4 significantly outperformed GPT-3.5 in generating accurate synoptic reports from free-text radiology reports, achieving an F1 score of 0.997. With chain-of-thought prompting strategies, GPT-4 also reached 92% accuracy in tumor resectability categorization, compared with 67% when relying on its default knowledge. Surgeons reviewing the AI-generated reports were more accurate and efficient, spending 58% less time than with the original reports, and found it easier to extract key information.
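
To make the prompting strategy concrete, here is a minimal sketch of chain-of-thought prompting for resectability categorization using the OpenAI Python client. The prompt wording, vessel list, and category labels are assumptions for illustration and do not reproduce the study's protocol.

```python
# Illustrative chain-of-thought prompt for PDAC resectability categorization.
# The prompt below is an assumption, not the published study protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report_text = "..."  # free-text CT report for a patient with PDAC

prompt = (
    "You are assisting with staging of pancreatic ductal adenocarcinoma.\n"
    "Step 1: List the tumor's contact with the SMA, celiac axis, common "
    "hepatic artery, SMV, and portal vein as described in the report.\n"
    "Step 2: Reason step by step about how each finding maps to "
    "resectability criteria.\n"
    "Step 3: Conclude with one category: resectable, borderline resectable, "
    "or locally advanced.\n\n"
    f"Report:\n{report_text}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output for reproducible categorization
)
print(response.choices[0].message.content)
```

Asking the model to enumerate vascular involvement before committing to a category mirrors the stepwise reasoning that chain-of-thought prompting is meant to elicit.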

This study suggests LLMs can enhance clinical workflows by standardizing reporting and improving communication between radiologists and surgeons. It also highlights the necessity of supervised implementation to avoid clinically significant errors. Overall, the findings illustrate the promising applications of AI in medical imaging, particularly in improving the accuracy, efficiency, and quality of pancreatic cancer care.

Enhancing Patient Understanding: The Role of Generative AI in Radiology Reporting

More than acting as a database of concepts, the radiology report is also where patients receive firsthand the latest diagnostic findings about their own bodies. In the study "Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting," Jiwoo Park and colleagues explore the efficacy of AI-generated radiology reports. This research investigates how AI can generate reports that are not only accurate but also patient-friendly, enhancing communication between radiologists and patients. The study involved 685 spine MRI reports from a hospital database, which were processed with a GPT-3.5-turbo model to produce three types of reports: summary, patient-friendly, and recommendations.
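
A simplified sketch of how such a three-output pipeline might be wired up with the OpenAI Python client is shown below; the prompt templates are assumptions for illustration, not the authors' published prompts.

```python
# Sketch: generate summary, patient-friendly, and recommendation versions
# of a spine MRI report. Prompt wording is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

prompts = {
    "summary": "Summarize the key findings of this spine MRI report in three sentences.",
    "patient_friendly": ("Rewrite this spine MRI report in plain language that a "
                         "patient without medical training can understand."),
    "recommendations": "List appropriate follow-up recommendations based on this report.",
}

def generate_variants(report_text: str) -> dict:
    """Return the three report variants keyed by variant name."""
    outputs = {}
    for name, instruction in prompts.items():
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"{instruction}\n\n{report_text}"}],
            temperature=0,
        )
        outputs[name] = response.choices[0].message.content
    return outputs
```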

The researchers assessed the quality and accuracy of the AI-generated reports, focusing on minimizing artificial hallucinations and improving patient comprehension. Radiologists and non-physicians evaluated the reports using a 5-point Likert scale. Results showed a significant improvement in patient understanding with AI-generated patient-friendly reports, which scored an average of 4.69 compared with 2.71 for traditional reports. However, the researchers also identified artificial hallucinations in 1.12% of cases and potentially harmful translations in 7.40%.

The study underscores the importance of integrating AI to streamline radiology workflows and improve patient engagement. While recognizing the limitations, including the need for verification across more modalities and better prompt engineering, the authors advocate for cautious but optimistic use of AI in clinical settings to foster patient-centered care.

Evaluating the Efficacy of Generative Large Language Models in Detecting Speech Recognition Errors in Radiology Reports

While the radiology report serves as both a repository of knowledge and a communication tool, it remains a manually curated document prone to typographical, semantic, or laterality errors. A study by Reuben A. Schmidt et al., also published in Radiology: Artificial Intelligence, examines the potential of advanced LLMs such as GPT-4 to improve the accuracy of radiology reports by detecting speech recognition errors. Accurate communication of radiology reports is critical for patient care, yet the use of speech recognition software often introduces errors. These errors, found in 20% to 60% of reports, can significantly impact clinical decisions. Recent advances in NLP and generative LLMs offer promising solutions.

The study evaluated five leading LLMs (GPT-3.5-turbo, GPT-4, text-davinci-003, Llama-v2-70B-chat, and Bard) on detecting errors in 3,233 de-identified CT and MRI reports. Perhaps unsurprisingly, GPT-4 excelled, though imperfectly: it outperformed the other models in detecting complex errors such as nonsense phrases and internal inconsistencies.
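
As an illustration of the general approach, the sketch below asks a model to proofread a dictated report; the prompt and error categories are assumptions and do not reproduce the study's evaluation setup.

```python
# Sketch: LLM-based proofreading of a dictated radiology report.
# The prompt and error taxonomy are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def flag_dictation_errors(report_text: str) -> str:
    """Ask the model to flag likely speech recognition errors in a report."""
    prompt = (
        "Review the following radiology report for likely speech recognition "
        "errors, including typographical errors, nonsense phrases, laterality "
        "mistakes, and internal inconsistencies. List each suspected error "
        "with a brief explanation, or reply 'No errors found'.\n\n"
        f"{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```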

Higher error rates were observed in longer reports, resident dictations, and overnight shifts, marking these as key areas where AI-assisted error detection could supplement traditional dictation software. For more details, refer to the full study in Radiology: Artificial Intelligence.

Toolkits to Leverage Large Language Models for Enhanced Radiology Report Structuring

2024 has been fertile ground for data scientists looking to kick off new projects that use LLMs to extract information from radiology reports. Two recent preprints, made public within 10 days of each other, offer useful tools to streamline this step. Daniel Reichenpfader and colleagues present a new open-source approach to structuring radiology reports with LLMs, introducing "RadEx," an end-to-end framework designed to automate discrete information extraction from radiology reports. The framework lets clinicians describe domain-specific information and create report templates, supporting both model improvement and implementation of the system.

Meanwhile, another preprint, from Laura Bergomi et al., explores integrating LLMs with existing medical datasets to improve clinical decision-making by converting free text into structured reports and offering question-and-answer capabilities. Their work, Radiology QA Transformer, is also open source. These advancements mark significant steps toward bridging the gap between unstructured and structured medical data, facilitating more efficient clinical research and patient care.
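
In the same spirit, a minimal sketch of template-driven extraction is shown below; the template fields, prompt, and model choice are assumptions for illustration and are not the RadEx API.

```python
# Sketch: map a free-text report onto a clinician-defined template and
# return structured JSON. Fields, prompt, and model are illustrative.
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical template describing the fields of interest.
template = {
    "modality": "CT | MRI | US | XR",
    "body_region": "free text",
    "lesion_present": "yes | no",
    "largest_lesion_size_mm": "number or null",
}

def extract_structured(report_text: str) -> dict:
    """Extract template fields from a report and parse the JSON response."""
    prompt = (
        "Extract the following fields from the radiology report and answer "
        "with a single JSON object matching this template:\n"
        f"{json.dumps(template, indent=2)}\n\nReport:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        response_format={"type": "json_object"},  # force machine-parsable output
    )
    return json.loads(response.choices[0].message.content)
```

Setting the temperature to 0 and requesting JSON output keeps the extraction deterministic and machine-parsable, which matters when the results feed downstream research databases.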


Po-Hao "Howard" Chen, MD, MBA | Vice Chair of AI in Diagnostics Institute, IT Medical Director for Enterprise Radiology, and Staff Radiologist | Cleveland Clinic, Cleveland, OH 

Inayat Grewal, MD | Research Fellow, Musculoskeletal Radiology | Cleveland Clinic, Cleveland, OH

 
