Google’s Chatbot Can Now Diagnose from Photos

A new study from Google Health indicates that its deep-learning model can identify skin rashes in clinical photos more accurately than many human clinicians.
Articulate Medical Intelligence Explorer (AMIE), a new version of Google’s experimental medical chatbot, has shown improvements in interpreting medical images, including smartphone photos of rashes, electrocardiograms (ECGs), and lab report PDFs, bringing it closer to functioning like a real-world clinical assistant.
In tests conducted on 963 teledermatology cases involving 26 common skin conditions, the AI system achieved a top-1 diagnostic accuracy of 66%. This slightly surpassed board-certified dermatologists, who scored 63%, and significantly outperformed primary-care physicians (44%) and nurse practitioners (40%).
The results suggest AI could play a crucial role in reducing diagnostic errors for the estimated two billion people worldwide who suffer from skin disorders each year.
A paper posted on the arXiv preprint server details the chatbot, which is still under development and not yet available for clinical use.
Though the paper has not yet been peer-reviewed, the system has drawn attention in earlier evaluations for outperforming human doctors in both diagnostic accuracy and bedside communication.
The latest version builds on Gemini 2.0 Flash, Google’s image-capable large language model (LLM), and has been specially adapted for healthcare applications. Researchers added an algorithm to enhance the model’s ability to carry out diagnostic conversations and reason through clinical data.
To assess its capabilities, 25 trained actors participated in simulated consultations with AMIE and human physicians, covering 105 clinical scenarios with varying symptoms, histories, and relevant medical images. After each session, both the bot and the doctor provided diagnoses and treatment plans.
A group of 18 medical specialists across dermatology, cardiology, and internal medicine reviewed transcripts and case summaries.
They found that AMIE outperformed human doctors in diagnostic accuracy, and its performance remained robust even when image quality was poor.
“This way, you can sort of imbue it with the right, desirable behaviours when conducting a diagnostic conversation,” said Ryutaro Tanno, a scientist at Google DeepMind and co-author of the paper.
However, the study also noted a concerning trend: the accuracy gap between light and dark skin tones widened when non-specialists used AI assistance. To address this, Google has committed to collecting more images of darker skin and retraining the model to minimise bias.
Experts caution that such tools cannot assess features such as texture, temperature, or broader systemic signs, and must therefore be treated as complementary to clinical judgment.
Stay tuned for more such updates on Digital Health News.