ChatGPT slightly outperformed Gemini when tasked with completing radiology training exams; however, both large language models struggled with accuracy during image interpretation, according to a study published March 20 in Cureus.
Researchers from Tampa, Fla.-based USF Health Morsani College of Medicine analyzed how OpenAI’s ChatGPT-4o and Google DeepMind’s Gemini Advanced performed on the American College of Radiology’s 2022 Diagnostic Radiology In-Training (DXIT) exam.
Here are five things to know from their findings:
- The DXIT exam required analysis of both written and image-based content spanning several radiological subspecialties. The exam was administered to the LLMs as 106 multiple-choice questions.
- Though ChatGPT-4o exhibited a higher overall accuracy of 69.8%, compared to Gemini Advanced’s 60.4%, the difference was “not statistically significant,” researchers said. ChatGPT-4o exhibited increased accuracy in the cardiac and nuclear radiology subspecialties.
- Both ChatGPT-4o and Gemini Advanced showed similar accuracy on text-based questions, at 88.1% and 85.7%, respectively.
- ChatGPT-4o’s accuracy on image-based questions was 57.8%, higher than Gemini Advanced’s 43.8%.
- Both models exhibited better accuracy on the written questions, highlighting the need for further training on image interpretation, the study authors said.