Mount Sinai researchers say popular AI chatbots are spreading false medical information.

According to new research, widely used generative AI models such as ChatGPT and DeepSeek R1 readily repeat and elaborate on false medical information.
Mount Sinai researchers published a study this month revealing that when fictional medical terms are inserted into a patient scenario, large language models accept them without question and go on to produce detailed explanations of completely fabricated conditions and treatments.
Even a single made-up term can derail a conversation with an AI chatbot, said Dr. Eyal Klang, head of generative AI at Mount Sinai and one of the study's authors. He and the rest of the research team found that introducing just one false medical term, such as a fake disease or symptom, was enough to prompt a chatbot to hallucinate and produce an authoritative-sounding (but entirely inaccurate) response.
Dr. Klang and his team conducted two rounds of tests. In the first, the chatbots were simply fed the patient scenarios; in the second, the researchers added a single line of cautionary instructions reminding the AI model that the information provided may not be accurate.
Dr. Klang said this added prompt reduced hallucinations by about half.
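The study's exact prompt wording has not been published, but the safeguard it describes is a simple prompt-level change. The short Python sketch below shows one hypothetical way to prepend a one-line caution to a patient scenario before it is sent to a chatbot; the caution text, the build_prompt helper, and the fabricated "Casper-Lew syndrome" term are illustrative assumptions, not material from the study.

```python
# Minimal sketch of a prompt-level safeguard like the one described above.
# The wording of the caution line and the example scenario are hypothetical.

CAUTION_LINE = (
    "Note: some terms in the following case description may be inaccurate "
    "or fabricated. Flag anything you cannot verify instead of explaining it."
)

def build_prompt(patient_scenario: str, with_caution: bool = True) -> str:
    """Assemble a chatbot prompt, optionally prepending a one-line warning."""
    parts = []
    if with_caution:
        parts.append(CAUTION_LINE)
    parts.append(patient_scenario)
    return "\n\n".join(parts)

if __name__ == "__main__":
    # "Casper-Lew syndrome" is an invented term used only for illustration.
    scenario = (
        "A 45-year-old presents with fatigue and a history of Casper-Lew syndrome."
    )
    print(build_prompt(scenario, with_caution=False))  # baseline condition
    print(build_prompt(scenario, with_caution=True))   # caution-added condition
```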
He said the research team tested six large language models, all of which are "very popular." ChatGPT, for example, receives about 2.5 billion prompts per day. Dr. Klang noted that people are increasingly exposed to large language models, for instance when a simple Google search returns a Gemini-generated summary.
But the fact that popular chatbots sometimes spread health misinformation doesn't mean healthcare should abandon or scale back generative AI, he said.
Dr. Klang noted that the use of generative AI in healthcare settings is becoming increasingly common, since these tools can speed up clinicians' manual work amid an ongoing burnout crisis.
"[Large language models] can basically imitate the work we do in front of a computer. If you have a patient report and you want a summary, they're very good at that. They're good at administrative work, and they can have good reasoning abilities, so they can do things like give medical advice. You will see them more and more," he said.
New forms of AI will clearly appear even more often in healthcare in the coming years. AI startups are dominating the digital health investment market, with companies like Abridge and Ambience Healthcare surpassing unicorn status, and the White House recently released an action plan to drive the use of AI in key areas such as healthcare.
Some experts were surprised that the White House's AI action plan did not place more emphasis on AI safety, given that it is a top priority within the AI research community.
Responsible AI use, for example, is a frequent topic at industry events, and organizations focused on healthcare AI safety, such as the Coalition for Health AI and the Digital Medicine Society, attract thousands of members. Companies such as OpenAI and Anthropic also devote significant computing resources to safety work.
Dr. Klang noted that the healthcare AI community is well aware of the risk of hallucinations and is working to mitigate harmful outputs.
Moving forward, he stressed the need for better safeguards and continued human oversight to ensure safety.
Photo: Andriy Onufriyenko, Getty Images