How AI Chatbots Help Doctors Make Better Diagnoses

Artificial intelligence (AI) has many potential benefits, along with risks. One area where AI is already having a positive effect is the practice of medicine. AI is helping physicians improve their bedside manner with patients, produce better documentation, recommend optimal medical staffing levels, and even draft readable medical notes.

AI’s ability to provide a quality medical diagnosis – a case study

One area where AI chatbots are especially useful is in diagnosing illnesses. According to a New York Times report, a recent study found that ChatGPT (an artificial intelligence platform) outperformed physicians in “assessing medical case histories, even when those doctors were using a chatbot.” The study is just one factor supporting the possibility that doctors and hospitals may be liable for medical malpractice if they fail to use AI to make a medical diagnosis.

The New York Times story discussed an experiment conducted at Beth Israel Deaconess Medical Center in Boston. In the study, conducted by Dr. Adam Rodman, an expert in internal medicine, ChatGPT-4 (a chatbot developed by OpenAI) had a 90 percent success rate in diagnosing a patient’s medical condition from a case report. Doctors who made their own diagnoses had a 74 percent success rate, while doctors who used ChatGPT-4 had a 76 percent success rate.

The study suggests that some physicians are simply stubborn, sticking with their diagnosis even when the chatbot suggested a potentially better one. It also shows that many physicians need more experience with chatbots before they can use them to their full potential to solve complex diagnostic problems and explain their diagnoses.

How the AI diagnostic study was conducted

The results of the study, which involved 50 doctors, were published in the journal JAMA Network Open. Each doctor was given six case histories. The doctors “were graded on their ability to suggest diagnoses and explain why they favored or ruled them out. Their grades also included getting the final diagnosis right.” The medical experts who graded the results did not know who was making the diagnosis – a doctor, ChatGPT, or both.

The case histories were based on 105 actual cases that were never published, so they could not have been accessed by the medical community or included in ChatGPT’s training data.

In one illustrative test case, a 76-year-old patient had severe pain in his lower back, buttocks, and calves when he walked. The pain started a few days after he had been treated with balloon angioplasty to widen a coronary artery. He had been treated with the blood thinner heparin for 48 hours after the procedure. The man complained that he felt feverish and tired. His cardiologist had done lab studies that indicated a new onset of anemia and a buildup of nitrogen and other kidney waste products in his blood. The man had had bypass surgery for heart disease a decade earlier.

The illustrative case also included the lab results and the details of the patient’s physical exam.

“The correct diagnosis was cholesterol embolism — a condition in which shards of cholesterol break off from plaque in arteries and block blood vessels.”

The participants in each arm of the test (doctors alone, ChatGPT alone, and doctors using ChatGPT) were required to provide three possible diagnoses, along with supporting reasons. For each possible diagnosis, they also had to list findings that supported it and findings that argued against it or were absent. Finally, the participants had to select a final diagnosis and identify up to three additional steps they would take after making it.

Why AI outperformed physicians in the diagnostic study

The researchers then tried to analyze why the chatbot did better than the human participants. Their observations include the following:

  • More study is necessary to show how physicians actually make decisions. Answers like “my intuition” or “my experience” aren’t very helpful.
  • AI seems to be a game-changer. Prior efforts such as INTERNIST-1, a diagnostic computer program developed in the 1970s, produced strong diagnostic results but were not user-friendly. More importantly, doctors didn’t trust INTERNIST-1, and they have been distrustful of many more recent computer programs as well.
  • Unlike prior computer programs, chatbots such as ChatGPT do not try to mimic a doctor’s thinking. “Their diagnostic abilities come from their ability to predict language.”
  • Doctors in the study who did use ChatGPT “didn’t listen to A.I. when A.I. told them things they didn’t agree with.”
  • Many of the doctors did not use the chatbot to its fullest extent. They treated ChatGPT like a search engine instead of pasting “the entire case history into the chatbot” and asking it for a comprehensive answer.

Why the failure to use AI in making a diagnosis may be medical malpractice

As AI becomes more user-friendly and more trustworthy, the failure of doctors to use AI tools such as ChatGPT may be grounds for a medical malpractice lawsuit. An AI analysis that reveals the correct diagnosis could serve as evidence that a doctor’s missed or incorrect diagnosis caused a patient’s injuries or prevented the patient from receiving proper medical care.

Whether AI evidence will be allowed in medical malpractice claims is an open issue. States and Washington, D.C. could pass legislation allowing the introduction of AI evidence in medical malpractice cases. Alternatively, judges may admit such evidence on their own authority, as courts have long done with new forms of proof, to show that a doctor failed to use available and reliable tools to reach a proper diagnosis. Medical boards may also, in time, recommend using AI in making a medical diagnosis.

As AI develops, the legal and medical communities will determine whether the use of AI becomes part of the accepted standard of medical care.

Please contact Paulson & Nace, PLLC, through this contact form or by calling our office.