The study was published this week in Science by a research team of physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers conducted a variety of experiments to measure how OpenAI's models compared with human physicians. In one experiment, they focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two internal medicine attending physicians with those generated by OpenAI's o1 and 4o models. The diagnoses were then assessed by two other attending physicians who were blinded to which came from humans and which came from AI. "At each diagnostic touchpoint, o1 either performed nominally better than or on par with the two attending physicians and 4o," the study said, adding that the differences "were especially pronounced at the first diagnostic touchpoint (initial ER triage), where there is the least information available about the patient and the...