
ChatGPT’s Reliability Reduced When Provided More Evidence: Study

Written by: Arti Ghargi

April 8, 2024


Image Source: Freepik

The study examined two question formats for ChatGPT: questions without evidence and questions biased with supporting or contrary evidence.

A recent study, ‘How different prompts impact health answer correctness’, has revealed that providing more evidence to ChatGPT can actually reduce the reliability of its answers.

The study was conducted by scientists from CSIRO, Australia's national science agency, and the University of Queensland (UQ) in December 2023.

It sheds light on the accuracy of large language models (LLMs) such as ChatGPT when it comes to providing health-related information.

The study found that when presented with more clinical evidence, the accuracy of ChatGPT plummeted to as low as 28%.

Dr Bevan Koopman, CSIRO principal research scientist and associate professor at UQ, highlighted the pervasive trend of individuals turning to online tools such as ChatGPT for health information despite the known risks.

"The widespread popularity of using LLMs online for answers on people’s health is why we need continued research to inform the public about risks and to help them optimize the accuracy of their answers," Koopman said.

The Study Findings

The research, presented at the Empirical Methods in Natural Language Processing (EMNLP) conference, delved into a hypothetical scenario where individuals, often non-professional health consumers, sought answers to various health-related questions from ChatGPT.

These questions ranged from inquiries about the efficacy of zinc in treating the common cold to the effects of drinking vinegar to dissolve a stuck fish bone.

The study examined two question formats: questions without evidence and questions biased with supporting or contrary evidence.

Per the study, while ChatGPT exhibited an 80% accuracy rate when responding to questions without evidence, this accuracy dropped to 63% when evidence was provided.

Even more concerning, the study noted, was the sharp decline in accuracy to 28% when an "unsure" answer was permitted.
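The two prompt conditions described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the study's actual code: the helper names, prompt wording, and yes/no/unsure answer scheme are assumptions for illustration only.

```python
# Hypothetical sketch of the study's two prompt formats:
# (1) a bare health question, and (2) the same question "biased"
# with supporting or contrary evidence prepended.

def build_prompt(question, evidence=None):
    """Return a question-only prompt, or one with evidence prepended."""
    if evidence is None:
        return f"Answer yes, no, or unsure.\nQuestion: {question}"
    return (f"Evidence: {evidence}\n"
            f"Based on the evidence above, answer yes, no, or unsure.\n"
            f"Question: {question}")

def accuracy(predictions, ground_truth):
    """Fraction of model answers matching the expected yes/no labels."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)
```

For example, `build_prompt("Can zinc help treat the common cold?")` yields the evidence-free condition, while passing an `evidence` string yields the biased condition; comparing the model's answers against expert labels with `accuracy` gives the percentages the study reports.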

Caution on Integrating LLMs in Search Engines

"We're not sure why this happens. But given this occurs whether the evidence given is correct or not, perhaps the evidence adds too much noise, thus lowering accuracy," Koopman speculated.

Prof Guido Zuccon, study co-author and director of AI for the Queensland Digital Health Centre (QDHeC), cautioned about the integration of LLMs into major search engines, highlighting the potential generation of inaccurate health information.

The next steps for the research involve investigating how the public utilizes health information generated by LLMs.

Introduced in 2022, ChatGPT quickly became one of the world’s leading Generative AI chatbots. As of August 2023, it had a user base of over 180.5 million, with 100 million of them active on a weekly basis.

However, when it comes to health queries, several studies have warned about ChatGPT’s ability to provide accurate answers to complex questions.

Previous Studies on ChatGPT’s Medical Information Accuracy

A recent study published in the British Medical Journal found that the large language models behind most popular AI-powered chatbots, including ChatGPT, either lacked sufficient safeguards or were inconsistent in preventing the production of healthcare disinformation on their platforms.

Another study published in McKnight's Senior Living in 2024 found that ChatGPT may be a useful tool for basic healthcare questions, but it struggles with more complex queries.

It warns that healthcare professionals and patients should be cautious about using ChatGPT as an authoritative source for medication-related information.

Similarly, a study conducted by researchers at Long Island University last year found that ChatGPT correctly answered only 10 out of 39 medication-related questions.

The findings indicated that the chatbot can produce incomplete information in some medical situations.

These studies and their findings underscore the need for further research to understand the limitations and risks associated with relying on AI-driven platforms for health information.

While deeper AI integration in healthcare appears inevitable, it is crucial to remember that there is not yet sufficient evidence to support the use of current LLM technology in real health settings.



Digital Health News (DHN) is India’s first dedicated digital health news platform, launched by industry-recognized HealthTech leaders. DHN is the industry’s leading source of HealthTech business, insights, trends and policy news.

DHN provides in-depth data analysis and covers the most impactful news as it happens across the entire ecosystem, including emerging technology trends and innovations, digital health startups, hospitals, health insurance, government agencies and policies, pharmaceuticals and biotech.


© Digital Health News 2024