HEALTHCARE & MEDICARE

WaPo columnist and CMS official offer dueling narratives about ChatGPT Health

Since OpenAI announced that people could join a waitlist to upload medical data to the ChatGPT Health beta and query the chatbot, dozens of people have done so.

Among them are Washington Post technology columnist Geoffrey Fowler and the daughter of Amy Gleason, acting administrator of the U.S. DOGE Service and strategic advisor to the Centers for Medicare and Medicaid Services. Gleason's daughter has been battling a rare disease for years. Their experiences with ChatGPT Health, shared online and at live events this week, were polar opposites in terms of the accuracy of the bot's claims.

On Monday, Fowler published a lengthy account of how he joined the waitlist for ChatGPT Health, then uploaded a decade of step and heart measurements (29 million steps and 6 million heartbeats) collected by his Apple Watch and stored in the Apple Health app. Fowler then put a simple question to the health bot: “Give me a simple grade (A to F) of my cardiovascular health over the past ten years, including component scores and an overall assessment of my lifespan.”

He got an F. ChatGPT Health declined to say how long he might live. And each time he uploaded the same information, he got a different score.

The story is a fun read and worth everyone's time. Fowler took the results to his own doctor and to other prominent cardiologists, including Dr. Eric Topol, an advocate for physicians adopting new technology. Both said ChatGPT Health was dead wrong and that Fowler was in good health. The message of the story is clear: these products are being launched before they are ready and have the potential to cause real harm to patients.

Reading further, though, you learn that the bot told Fowler its scoring was based solely on Apple Watch data, and that it might provide a more useful score if he also uploaded his medical records. He did, and his grade rose from an F to a D.

Some of the analysis was apparently based on “an assessment of the Apple Watch's VO2 max measurement, which is the maximum amount of oxygen your body can consume during exercise,” and the way Apple estimates VO2 max seems inadequate. ChatGPT Health also leaned on other fuzzy measures. In other words, it focused on the wrong things, hence the F and the D. Anthropic's Claude reportedly isn't much better.

Later, Fowler's personal physician wanted to further evaluate his heart health and ordered a blood test that included a measurement of lipoprotein(a). The test measures a specific type of fat-carrying particle in the blood to better assess cardiovascular risk beyond cholesterol, and it can uncover hidden risk of heart attack, stroke and atherosclerosis. Fowler noted that neither ChatGPT Health nor Claude advised him to get it, a fair point given how low the bots had rated his health. One might ask, though, whether the test was necessary. After all, as Fowler himself noted, his doctor's response to the F was that he was “at low risk for a heart attack, and my insurance probably wouldn't even pay for an additional cardio test to prove the AI wrong.”

Did the doctor order the test out of caution, or to reassure him?

Fowler also noticed a troubling sign in his interactions with ChatGPT Health. Today we worry about hallucination in artificial intelligence, software asserting things that aren't there. Fowler reports something like the opposite, a kind of amnesia: ChatGPT Health forgot his age, gender and even his most recent vital signs.

All in all, Fowler and his sources conclude that these tools were not built to “extract accurate and useful personal analysis from the complex data stored in Apple Watch and medical charts.” In short, they disappoint, and consumers should be aware of it.

For a contrasting experience with ChatGPT Health, we turn to Gleason of DOGE and CMS. Gleason has a nursing background, and her daughter has been battling a rare disease for years. Gleason spoke about CMS' health technology ecosystem Tuesday at an event in San Francisco organized by health data intelligence company Innovaccer.

She shared the heartbreaking story of her daughter, a cheerleader and gymnast who went from doing somersaults to falling, breaking bones just from walking, and eventually being unable to stand up or climb stairs. A year and three months later, a skin biopsy revealed the true disease: juvenile dermatomyositis, a rare, chronic autoimmune disease in children in which the immune system attacks blood vessels, causing muscle inflammation and rashes. Gleason's daughter was about 11 years old at the time.

“She's been taking 21 medications a day and infusions twice a month for 15 years, so she's very excited about this CAR-T trial because it will eliminate all her medications,” Gleason told the audience.

But disappointment awaited Morgan, now 27.

“So she went to the trial, [but] they turned her away because she had overlapping ulcerative colitis,” Gleason said. “They said it was too risky to take her off all her medications. She might have an adverse reaction from her UC.”

Morgan was so frustrated that she gathered the vast collection of medical records Gleason had amassed over the years and uploaded them to ChatGPT Health. She asked the health bot to “find me another trial,” and ChatGPT found her the exact same CAR-T trial, along with an important piece of information.

“ChatGPT said, I actually think you're eligible for the trial, because I don't think you have ulcerative colitis. I think you have a slight variant called micronode enteritis, which is a slower-responding form of colitis, and it's not excluded from the trial,” Gleason said.

Apparently, ChatGPT didn’t stop there.

“It also found in her records that when she had her tonsils removed, as we were going through that one-year-and-three-month journey, the tonsil biopsy carried a note to ‘evaluate for autoimmune disease,’ but no one had seen it, and it was completely missed during her workup,” Gleason said.

Impressed by her daughter's interactions with ChatGPT Health, Gleason added, “Providers that adapt to this world are going to do well and survive, and those that resist it and try to prevent patients from using it are going to miss out.”

Seated to her right on the panel was Dr. Robert Wachter, physician, author, professor and chair of the Department of Medicine at the University of California, San Francisco (UCSF). Wachter brought up Fowler's experience, recounted above, to offer some caveats for consumers using artificial intelligence.

“So these tools are useful and beneficial in many ways, but I think ultimately the patient-facing tools will be more patient-specific than a generic ChatGPT or a generic Open Evidence,” he said.

Maybe Gleason has the final word on this.

“I also think today is the dumbest these models have ever been,” she said. “So they will continue to get better over time and I think they should definitely be used in conjunction with today's providers.”

Photo: Olena Malik, Getty Images
