Written by Ty Skyles, BS candidate, Brigham Young University, and participant in the 2024 Institute for Public Health Summer Research Program
We are living in the information age, and institutions collect massive amounts of data about everything from our favorite food purchases to our most visited websites. These data help institutions make informed decisions, and hospitals are no exception. Washington University collects massive amounts of health data to track the spread of disease, give doctors good patient information, and conduct research. Researchers have been able to use big data to gain important insights about causes, risks, and treatments for cancer, but in the research world, more data is always better. Doctors record a great deal of valuable information in their clinical notes, but it’s hard to use because conventional software can’t easily analyze narrative text. During my time in the Summer Research Program – RADIANCE Track, I have been working with Adam Wilcox, PhD, studying ways to use large language models to extract useful data from clinical notes and add it to a cancer registry.
Washington University has developed its own HIPAA-secure version of ChatGPT 3.5, and it has the potential to aid in clinical data extraction. It can analyze tens of thousands of notes in just a few hours at minimal cost, while a person would need weeks to extract that much data. The main downside is that GPT is not nearly as accurate as a human reviewer, though it is a lot cheaper. We have been testing its limits and learning how to use it more accurately.
Our study has yielded interesting results. Using the model, we have been able to identify cancerous margins in pathology notes with 95 percent accuracy. We have also been able to interpret a patient’s pain with 93 percent accuracy. When we tried to use it to interpret radiology reports for tumors, however, it failed spectacularly. ChatGPT cannot interpret clinical notes perfectly under any circumstances, but some error rate is acceptable for use in clinical research. In the next phase of our study, we are going to use these data to improve the quality of cancer research.
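For readers curious about what an accuracy figure like these means in practice, here is a minimal sketch in Python. It assumes accuracy is measured by comparing the model's label for each note against a label assigned by a human reviewer, which is a common approach but is not a description of our actual evaluation code. The function name and the example labels are made up for illustration and are not study data.

```python
def accuracy(model_labels, human_labels):
    """Fraction of notes where the model's label matches the human reviewer's."""
    if len(model_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    if not human_labels:
        raise ValueError("need at least one labeled note")
    # Count the notes where both labels agree.
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)


# Made-up labels for five pathology notes (illustrative only, not study data).
human = ["positive", "negative", "negative", "positive", "negative"]
model = ["positive", "negative", "positive", "positive", "negative"]

print(f"{accuracy(model, human):.0%}")  # prints "80%"
```

In this toy example the model disagrees with the reviewer on one note out of five, so its accuracy is 80 percent.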
In conclusion, there are a few obstacles to overcome, but large language models have tremendous potential to change the way we collect clinical data and do medical research.