Dr. AI will see you now.
It might not be that far from the truth, as more and more physicians are turning to artificial intelligence to ease their busy workloads.
Studies have shown that up to 10% of doctors are now using ChatGPT, a large language model (LLM) made by OpenAI — but just how accurate are its responses?
A team of researchers from the University of Kansas Medical Center decided to find out.
“Every year, about a million new medical articles are published in scientific journals, but busy doctors don’t have that much time to read them,” Dan Parente, the senior study author and an assistant professor at the university, told Fox News Digital.
“We wondered if large language models — in this case, ChatGPT — could help clinicians review the medical literature more quickly and find articles that might be most relevant for them.”
For a new study published in the Annals of Family Medicine, the researchers used ChatGPT 3.5 to summarize 140 peer-reviewed studies from 14 medical journals.
Seven physicians then independently reviewed the chatbot’s responses, rating them on quality, accuracy and bias.
The AI summaries were found to be 70% shorter than the original abstracts, yet they rated high in accuracy (92.5%) and quality (90%) and were not found to have bias.
Serious inaccuracies and hallucinations were “uncommon,” found in only four of 140 summaries (about 3%).
“One problem with large language models is also that they can sometimes ‘hallucinate,’ which means they make up information that just isn’t true,” Parente noted.
“We were worried that this would be a serious problem, but instead we found that serious inaccuracies and hallucination were very rare.”
Out of the 140 summaries, only two were hallucinated, he said.
Minor inaccuracies were somewhat more common, however, appearing in 20 of 140 summaries (about 14%).
“We also found that ChatGPT could generally help physicians figure out whether an entire journal was relevant to a medical specialty — for example, to a cardiologist or to a primary care physician — but had a lot harder of a time knowing when an individual article was relevant to a medical specialty,” Parente added.
Based on these findings, Parente noted that ChatGPT could help busy doctors and scientists decide which new articles in medical journals are most worthwhile for them to read.
“People should encourage their doctors to stay current with new advances in medicine so they can provide evidence-based care,” he said.
Dr. Harvey Castro, a Dallas, Texas-based board-certified emergency medicine physician and national speaker on artificial intelligence in health care, was not involved in the University of Kansas study but offered his insights on ChatGPT use by physicians.
“AI’s integration into health care, particularly for tasks such as interpreting and summarizing complex medical studies, significantly improves clinical decision-making,” he told Fox News Digital.
“This technological support is critical in environments like the ER, where time is of the essence and the workload can be overwhelming.”
Castro noted, however, that ChatGPT and other AI models have some limitations.
“Despite AI’s potential, the presence of inaccuracies in AI-generated summaries — although minimal — raises concerns about the reliability of using AI as the sole source for clinical decision-making,” Castro said.
“The article highlights a few serious inaccuracies within AI-generated summaries, underscoring the need for cautious integration of AI tools in clinical settings.”
Given these potential inaccuracies, particularly in high-risk scenarios, Castro stressed the importance of having health care professionals oversee and validate AI-generated content.
The researchers agreed, noting the importance of weighing the benefits of LLMs like ChatGPT against the need for caution.
“Like any power tool, we need to use them carefully,” Parente told Fox News Digital.
“When we ask a large language model to do a new task — in this case, summarizing medical abstracts — it’s important to check that the AI is giving us reasonable and accurate answers.”
As AI becomes more widely used in health care, Parente said, “we should insist that scientists, clinicians, engineers and other professionals have done careful work to make sure these tools are safe, accurate and beneficial.”