📋 Model Description


base_model:
  - google/medgemma-1.5-4b-it
license: other
license_name: health-ai-developer-foundations
license_link: https://developers.google.com/health-ai-developer-foundations/terms
library_name: transformers
pipeline_tag: image-text-to-text
extra_gated_heading: Access MedGemma on Hugging Face
extra_gated_prompt: >-
  To access MedGemma on Hugging Face, you're required to review and agree to
  Health AI Developer Foundation's terms of use. To do this, please ensure
  you're logged in to Hugging Face and click below. Requests are processed
  immediately.
extra_gated_button_content: Acknowledge license
tags:
  - medical
  - unsloth
  - radiology
  - clinical-reasoning
  - dermatology
  - pathology
  - ophthalmology
  - chest-x-ray

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

MedGemma 1.5 model card

Note: This card describes MedGemma 1.5, which is only available as a 4B
multimodal instruction-tuned variant. For information on MedGemma 1 variants,
refer to the MedGemma 1 model card.

Model documentation: MedGemma

Resources:

  • License: The use of MedGemma is governed by the Health AI Developer
Foundations terms of use.

Author: Google

Model information

This section describes the specifications and recommended use of the MedGemma
model.

Description

MedGemma is a collection of Gemma 3
variants that are trained for performance on medical text and image
comprehension. Developers can use MedGemma to accelerate building
healthcare-based AI applications.

MedGemma 1.5 4B is an updated version of the MedGemma 1 4B model.

MedGemma 1.5 4B expands support for several new medical imaging and data
processing applications, including:

  • High-dimensional medical imaging: Interpretation of three-dimensional
volume representations of Computed Tomography (CT) and Magnetic Resonance Imaging (MRI).
  • Whole-slide histopathology imaging (WSI): Simultaneous interpretation of
multiple patches from a whole slide histopathology image as input.
  • Longitudinal medical imaging: Interpretation of chest X-rays in the
context of prior images (e.g., comparing current versus historical scans).
  • Anatomical localization: Bounding box–based localization of anatomical
features and findings in chest X-rays.
  • Medical document understanding: Extraction of structured data, such as
values and units, from unstructured medical lab reports.
  • Electronic Health Record (EHR) understanding: Interpretation of
text-based EHR data.

In addition to these new features, MedGemma 1.5 4B delivers improved accuracy on
medical text reasoning and modest improvement on standard 2D image
interpretation compared to MedGemma 1 4B.

MedGemma utilizes a SigLIP image encoder
that has been specifically pre-trained on a variety of de-identified medical
data, including chest X-rays, dermatology images, ophthalmology images, and
histopathology slides. The LLM component is trained on a diverse set of medical
data, including medical text, medical question-answer pairs, FHIR-based
electronic health record data, 2D and 3D radiology images, histopathology
images, ophthalmology images, dermatology images, and lab reports for document
understanding.

MedGemma 1.5 4B has been evaluated on a range of clinically relevant benchmarks
to illustrate its baseline performance. These evaluations are based on both open
benchmark datasets and internally curated datasets. Developers are expected to
fine-tune MedGemma for improved performance on their use case. Consult the
Intended use section
for more details.

MedGemma is optimized for medical applications that involve a text generation
component. For medical image-based applications that do not involve text
generation, such as data-efficient classification, zero-shot classification, or
content-based or semantic image retrieval, the MedSigLIP image encoder is
recommended. MedSigLIP is based on the same image encoder that powers MedGemma 1
and MedGemma 1.5.

How to use

The following are some example code snippets to help you quickly get started
running the model locally on GPU.

Note: If you need to use the model at scale, we recommend creating a production
version using Model Garden. Model Garden provides various deployment options and
tutorial notebooks, including specialized server-side image processing options
for efficiently handling large medical images: Whole Slide Digital Pathology
(WSI) or volumetric scans (CT/MRI) stored in Cloud DICOM Store or Google Cloud
Storage (GCS).

First, install the Transformers library. Gemma 3 is supported starting from
transformers 4.50.0.

$ pip install -U transformers
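Since Gemma 3 support landed in transformers 4.50.0, a quick version guard can
catch an outdated install before model loading fails with a confusing error.
The helper below is an illustrative sketch, not part of the official quickstart:

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 50, 0)  # Gemma 3 (and therefore MedGemma 1.5) support starts here

def parse_version(v: str) -> tuple:
    """Return the leading numeric components of a dotted version string."""
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break  # stop at suffixes like "dev0" or "rc1"
    return tuple(parts)

def supports_medgemma(v: str) -> bool:
    """True if a transformers version string is new enough for Gemma 3 models."""
    return parse_version(v) >= MIN_TRANSFORMERS

def check_install() -> None:
    try:
        v = version("transformers")
    except PackageNotFoundError:
        raise SystemExit("transformers is not installed; run: pip install -U transformers")
    if not supports_medgemma(v):
        raise SystemExit(f"transformers {v} is too old; 4.50.0 or newer is required")
```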

Next, use either the pipeline wrapper or the Transformers API directly to send a
chest X-ray image and a question to the model.

Note that CT, MRI, and whole-slide histopathology images require some
pre-processing; see the CT and WSI notebooks for examples.

Run model with the pipeline API

```python
from transformers import pipeline
from PIL import Image
import requests
import torch

pipe = pipeline(
    "image-text-to-text",
    model="google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device="cuda",
)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/ChestXrayPA3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ]
    }
]

output = pipe(text=messages, max_new_tokens=2000)
print(output[0]["generated_text"][-1]["content"])
```

Run the model directly

```python
# Make sure to install the accelerate library first via pip install accelerate
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import requests
import torch

model_id = "google/medgemma-1.5-4b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Image attribution: Stillwaterising, CC0, via Wikimedia Commons
image_url = "https://upload.wikimedia.org/wikipedia/commons/c/c8/ChestXrayPA3-8-2010.png"
image = Image.open(requests.get(image_url, headers={"User-Agent": "example"}, stream=True).raw)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this X-ray"},
        ]
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(**inputs, max_new_tokens=2000, do_sample=False)
    generation = generation[0][input_len:]

decoded = processor.decode(generation, skip_special_tokens=True)
print(decoded)
```

Examples

Refer to the growing collection of tutorial notebooks to see how to use or
fine-tune MedGemma.

Model architecture overview

The MedGemma model is built on Gemma 3 and uses the same decoder-only
transformer architecture as Gemma 3. To read more about the architecture,
consult the Gemma 3 model card.

Technical specifications

  • Model type: Decoder-only Transformer architecture, see the Gemma 3
Technical Report
  • Input modalities: Text, vision (multimodal)
  • Output modality: Text only
  • Attention mechanism: Grouped-query attention (GQA)
  • Context length: Supports long context, at least 128K tokens
  • Key publication: https://arxiv.org/abs/2507.05201
  • Model created: 4B multimodal: Jan 13, 2026
  • Model version: 4B multimodal: 1.5.0

Citation

When using this model, please cite: Sellergren et al. "MedGemma Technical
Report." arXiv preprint arXiv:2507.05201 (2025).

@article{sellergren2025medgemma,
  title={MedGemma Technical Report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, Cían and Lau, Charles and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

Inputs and outputs

Input:

  • Text string, such as a question or prompt
  • Images, normalized to 896 x 896 resolution and encoded to 256 tokens each
  • Total input length of 128K tokens

Output:

  • Generated text in response to the input, such as an answer to a question,
analysis of image content, or a summary of a document
  • Total output length of 8192 tokens
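These budgets can be sanity-checked with simple arithmetic. The sketch below is
illustrative only: it treats 128K as 128,000 tokens and ignores special and
formatting tokens, both simplifying assumptions.

```python
CONTEXT_TOKENS = 128_000   # total input length (treating 128K as 128,000 for illustration)
TOKENS_PER_IMAGE = 256     # each 896 x 896 image encodes to 256 tokens

def remaining_text_budget(num_images: int) -> int:
    """Tokens left for text after accounting for image tokens."""
    used = num_images * TOKENS_PER_IMAGE
    if used > CONTEXT_TOKENS:
        raise ValueError("image tokens alone exceed the input budget")
    return CONTEXT_TOKENS - used
```

For example, a prompt with one chest X-ray leaves roughly 127,744 tokens for
text, so even long clinical documents fit comfortably alongside a few images.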

Performance and evaluations

MedGemma was evaluated across a range of different multimodal classification,
report generation, visual question answering, and text-based tasks.

Key performance metrics

#### Imaging evaluations

The multimodal performance of MedGemma 1.5 4B was evaluated across a range of
benchmarks, focusing on radiology (2D, longitudinal 2D, and 3D), dermatology,
histopathology, ophthalmology, document understanding, and multimodal clinical
reasoning. See the Data card for details of the individual datasets.

We also list the previous results for MedGemma 1 4B and 27B (multimodal models
only), as well as for Gemma 3 4B for comparison.

| Task / Dataset | Metric | Gemma 3 4B | MedGemma 1 4B | MedGemma 1.5 4B | MedGemma 1 27B |
| --- | --- | --- | --- | --- | --- |
| **3D radiology image classification** | | | | | |
| CT Dataset 1\* (7 conditions/abnormalities) | Macro accuracy | 54.5 | 58.2 | 61.1 | 57.8 |
| CT-RATE (validation, 18 conditions/abnormalities) | Macro F1 | | 23.5 | 27.0 | |
| | Macro precision | | 34.5 | 34.2 | |
| | Macro recall | | 34.1 | 42.0 | |
| MRI Dataset 1\* (10 conditions/abnormalities) | Macro accuracy | 51.1 | 51.3 | 64.7 | 57.4 |
| **2D image classification** | | | | | |
| MIMIC CXR\*\* | Macro F1 (top 5 conditions) | 81.2 | 88.9 | 89.5 | 90.0 |
| CheXpert CXR | Macro F1 (top 5 conditions) | 32.6 | 48.1 | 48.2 | 49.9 |
| CXR14 | Macro F1 (3 conditions) | 32.0 | 50.1 | 48.4 | 45.3 |
| PathMCQA\* (histopathology) | Accuracy | 37.1 | 69.8 | 70.0 | 71.6 |
| WSI-Path\* (whole-slide histopathology) | ROUGE | 2.3 | 2.2 | 49.4 | 4.1 |
| US-DermMCQA\* | Accuracy | 52.5 | 71.8 | 73.5 | 71.7 |
| EyePACS\* (fundus) | Accuracy | 14.4 | 64.9 | 76.8 | 75.3 |
| **Disease progression classification (longitudinal)** | | | | | |
| MS-CXR-T | Macro accuracy | 59.0 | 61.1 | 65.7 | 50.1 |
| **Visual question answering** | | | | | |
| SLAKE (radiology) | Tokenized F1 | 40.2 | 72.3 | 59.7\*\*\*\* | 70.3 |
| | Accuracy (on closed subset) | 62.0 | 87.6 | 82.8 | 85.9 |
| VQA-RAD\*\*\* (radiology) | Tokenized F1 | 33.6 | 49.9 | 48.1 | 46.7 |
| | Accuracy (on closed subset) | 42.1 | 69.1 | 70.2 | 67.1 |
| **Region of interest detection** | | | | | |
| Chest ImaGenome: anatomy bounding box detection | Intersection over union | 6.1 | 3.1 | 38.0 | 16.0 |
| **Multimodal medical knowledge and reasoning** | | | | | |
| MedXpertQA (text + multimodal questions) | Accuracy | 16.4 | 18.8 | 20.9 | 26.8 |

\* Internal datasets. CT Dataset 1 and MRI Dataset 1 are described below; for
evaluation, perfectly balanced samples were drawn per condition. US-DermMCQA is
described in Liu et al. (2020, Nature Medicine), presented as a 4-way MCQ per
example for skin condition classification. PathMCQA is based on multiple
datasets, presented as a 3-9 way MCQ per example for identification, grading,
and subtyping of breast, cervical, and prostate cancer. WSI-Path is a dataset of
de-identified H&E WSIs and associated final diagnosis text from original
pathology reports, comprising single-WSI examples and previously described in
Ahmed et al. (2024, arXiv). EyePACS is a dataset of fundus images with
classification labels based on 5-level diabetic retinopathy severity (None,
Mild, Moderate, Severe, Proliferative). A subset of these datasets is described
in more detail in the MedGemma Technical Report.

\*\* Based on radiologist-adjudicated labels, described in Yang (2024, arXiv),
Section A.1.1.

\*\*\* Based on the "balanced split," described in Yang (2024, arXiv).

\*\*\*\* While MedGemma 1.5 4B exhibits strong radiology interpretation
capabilities, it was less optimized for the SLAKE Q&A format compared to
MedGemma 1 4B. Fine-tuning on SLAKE may improve results.
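The region-of-interest results above use intersection over union (IoU) between
predicted and reference anatomy bounding boxes. A minimal generic sketch of the
metric for axis-aligned boxes (not the exact evaluation code used here):

```python
def iou(box_a, box_b):
    """Intersection over union for axis-aligned boxes given as
    (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes are disjoint)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and partial overlaps fall in
between, which is why IoU is a natural fit for anatomy localization.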

#### Chest X-ray report generation

MedGemma chest X-ray (CXR) report generation performance was evaluated on
MIMIC-CXR using the RadGraph F1 metric. We compare MedGemma 1.5 4B against a
fine-tuned version of MedGemma 1 4B and the MedGemma 1 27B base model.

| Task / Dataset | Metric | MedGemma 1 4B (tuned for CXR) | MedGemma 1.5 4B | MedGemma 1 27B |
| --- | --- | --- | --- | --- |
| **Chest X-ray report generation** | | | | |
| MIMIC CXR | RadGraph F1 | 30.3 | 27.2 | 27.0 |

#### Text evaluations

MedGemma 1.5 4B was evaluated across a range of text-only benchmarks for medical
knowledge and reasoning. Existing results for MedGemma 1 variants and Gemma 3
are shown for comparison.

| Dataset | Gemma 3 4B | MedGemma 1 4B | MedGemma 1.5 4B | MedGemma 1 27B |
| --- | --- | --- | --- | --- |
| MedQA (4-op) | 50.7 | 64.4 | 69.1 | 85.3 |
| MedMCQA | 45.4 | 55.7 | 59.8 | 70.2 |
| PubMedQA | 68.4 | 73.4 | 68.2 | 77.2 |
| MMLU Med | 67.2 | 70.0 | 69.6 | 86.2 |
| MedXpertQA (text only) | 11.6 | 14.2 | 16.4 | 23.7 |
| AfriMed-QA (25 question test set) | 48.0 | 52.0 | 56.0 | 72.0 |

#### Medical record evaluations

EHR understanding and interpretation were evaluated for MedGemma 1.5 4B,
MedGemma 1 variants, and Gemma 3 4B via question-answering benchmark datasets
covering synthetic longitudinal text-based EHR data and real-world
de-identified discharge summaries.

| Dataset | Metric | Gemma 3 4B | MedGemma 1 4B | MedGemma 1.5 4B | MedGemma 1 27B |
| --- | --- | --- | --- | --- | --- |
| EHRQA\* | Accuracy | 70.9 | 67.6 | 89.6 | 90.5 |
| EHRNoteQA | Accuracy | 78.0 | 79.4 | 80.4 | 90.7 |

\* Internal dataset.

#### Document understanding evaluations

Evaluation of converting unstructured medical lab report documents
(PDFs/images) into structured JSON data.

| Task / Dataset | Metric | Gemma 3 4B | MedGemma 1 4B | MedGemma 1.5 4B | MedGemma 1 27B |
| --- | --- | --- | --- | --- | --- |
| **PDF-to-JSON lab test data conversion** | | | | | |
| EHR Dataset 2\* (raw PDF to JSON) | Macro F1 (average over per-document F1 scores) | 84.0 | 78.0 | 91.0 | 76.0 |
| | Micro F1 (F1 across all extracted data fields) | 81.0 | 75.0 | 88.0 | 70.0 |
| EHR Dataset 3\* (raw PDF to JSON) | Macro F1 | 61.0 | 50.0 | 71.0 | 66.0 |
| | Micro F1 | 61.0 | 51.0 | 70.0 | 69.0 |
| Mendeley Clinical Laboratory Test Reports (PNG image to JSON) | Macro F1 | 83.0 | 85.0 | 85.0 | 69.0 |
| | Micro F1 | 78.0 | 81.0 | 83.0 | 68.0 |
| EHR Dataset 4\* | Macro F1 | 41.0 | 25.0 | 64.0 | |
| | Micro F1 | 41.0 | 33.0 | 67.0 | |

\* Internal datasets.
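As the metric labels note, macro F1 averages per-document F1 scores while micro
F1 pools counts across all extracted fields. A minimal sketch, assuming each
document's extraction is represented as a set of field-value strings (a
simplification of the actual JSON comparison):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_micro_f1(docs):
    """docs: list of (predicted, gold) pairs, each a set of extracted
    field-value strings for one document."""
    per_doc, TP, FP, FN = [], 0, 0, 0
    for pred, gold in docs:
        tp, fp, fn = len(pred & gold), len(pred - gold), len(gold - pred)
        per_doc.append(f1(tp, fp, fn))
        TP, FP, FN = TP + tp, FP + fp, FN + fn
    macro = sum(per_doc) / len(per_doc) if per_doc else 0.0  # average of per-document F1
    micro = f1(TP, FP, FN)                                   # F1 over pooled counts
    return macro, micro
```

Macro F1 weights every document equally, so a short report with one missed
field costs as much as a long one; micro F1 instead weights every extracted
field equally.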

Ethics and safety evaluation

#### Evaluation approach

Our evaluation methods include structured evaluations and internal red-teaming
testing of relevant content policies. Red-teaming was conducted by a number of
different teams, each with different goals and human evaluation metrics. These
models were evaluated against a number of different categories relevant to
ethics and safety, including:

  • Child safety: Evaluation of text-to-text and image-to-text prompts
covering child safety policies, including child sexual abuse and exploitation.
  • Content safety: Evaluation of text-to-text and image-to-text prompts
covering safety policies, including harassment, violence and gore, and hate speech.
  • Representational harms: Evaluation of text-to-text and image-to-text
prompts covering safety policies, including bias, stereotyping, and harmful associations or inaccuracies.
  • General medical harms: Evaluation of text-to-text and image-to-text
prompts covering safety policies, including information quality and potentially harmful responses or inaccuracies.

In addition to development level evaluations, we conduct "assurance evaluations"
which are our "arms-length" internal evaluations for responsibility governance
decision making. They are conducted separately from the model development team
and inform decision making about release. High-level findings are fed back to
the model team but prompt sets are held out to prevent overfitting and preserve
the results' ability to inform decision making. Notable assurance evaluation
results are reported to our Responsibility & Safety Council as part of release
review.

#### Evaluation results

For all areas of safety testing, we saw safe levels of performance across the
categories of child safety, content safety, and representational harms compared
to previous Gemma models. All testing was conducted without safety filters to
evaluate the model capabilities and behaviors. For both text-to-text and
image-to-text the model produced minimal policy violations. A limitation of our
evaluations was that they included primarily English language prompts.

Data card

Dataset overview

#### Training

The base Gemma models are pre-trained on a large corpus of text and code data.
MedGemma multimodal variants utilize a
SigLIP image encoder that has been
specifically pre-trained on a variety of de-identified medical data, including
radiology images, histopathology images, ophthalmology images, and dermatology
images. Their LLM component is trained on a diverse set of medical data,
including medical text, medical question-answer pairs, FHIR-based electronic
health record data (27B multimodal only), radiology images, histopathology
patches, ophthalmology images, and dermatology images.

#### Evaluation

MedGemma models have been evaluated on a comprehensive set of clinically
relevant benchmarks across multiple datasets, tasks and modalities. These
benchmarks include both open and internal datasets.

#### Source

MedGemma utilizes a combination of public and private datasets.

This model was trained on diverse public datasets including MIMIC-CXR (chest
X-rays and reports), Chest ImaGenome (bounding boxes linking image findings
with anatomical regions for MIMIC-CXR), SLAKE (multimodal medical images and
questions), PAD-UFES-20 (skin lesion images and data), SCIN (dermatology
images), TCGA (cancer genomics data), CAMELYON (lymph node histopathology
images), PMC-OA (biomedical literature with images), and Mendeley Digital Knee
X-Ray (knee X-rays).

Additionally, multiple diverse proprietary datasets were licensed and
incorporated (described next).

Data ownership and documentation

  • MIMIC-CXR: MIT Laboratory for Computational Physiology and Beth Israel Deaconess Medical Center (BIDMC).
  • MS-CXR-T: Health Futures, Microsoft Research.
  • ChestX-ray14: National Institutes of Health - Clinical Center.
  • SLAKE: The Hong Kong Polytechnic University (PolyU), with collaborators including West China Hospital of Sichuan University and Sichuan Academy of Medical Sciences / Sichuan Provincial People's Hospital.
  • PAD-UFES-20: Federal University of Espírito Santo (UFES), Brazil, through its Dermatological and Surgical Assistance Program (PAD).
  • SCIN: A collaboration between Google Health and Stanford Medicine.
  • TCGA (The Cancer Genome Atlas): A joint effort of the National Cancer Institute and the National Human Genome Research Institute. Data from TCGA are available via the Genomic Data Commons (GDC).
  • CAMELYON: Collected from Radboud University Medical Center and University Medical Center Utrecht in the Netherlands.
  • PMC-OA (PubMed Central Open Access Subset): Maintained by the National Library of Medicine (NLM) and National Center for Biotechnology Information (NCBI), which are part of the NIH.
  • MedQA: This dataset was created by a team of researchers led by Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits.
  • MedMCQA: This dataset was created by Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu from Saama AI Research, Chennai, India.
  • PubMedQA: This dataset was created by Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, and Xinghua Lu from the University of Pittsburgh, Carnegie Mellon University, and Google.
  • LiveQA: This dataset was created by Asma Ben Abacha, Eugene Agichtein, Yuval Pinter, and Dina Demner-Fushman from the U.S. National Library of Medicine, Emory University, and Georgia Institute of Technology.
  • Mendeley Digital Knee X-Ray: This dataset is from Rani Channamma University, and is hosted on Mendeley Data.
  • AfriMed-QA: Developed by multiple collaborating organizations and researchers; key contributors include Intron Health, SisonkeBiotik, BioRAMP, Georgia Institute of Technology, and MasakhaneNLP.
  • VQA-RAD: Created by a research team led by Jason J. Lau, Soumya Gayen, Asma Ben Abacha, and Dina Demner-Fushman and their affiliated institutions (the U.S. National Library of Medicine and National Institutes of Health).
  • MedExpQA: This dataset was created by researchers at the HiTZ Center (Basque Center for Language Technology and Artificial Intelligence).
  • MedXpertQA: This dataset was developed by researchers at Tsinghua University (Beijing, China) and Shanghai Artificial Intelligence Laboratory (Shanghai, China).
  • HealthSearchQA: This dataset consists of 3,173 commonly searched consumer questions.
  • ISIC: The International Skin Imaging Collaboration is a joint effort involving clinicians, researchers, and engineers from various institutions worldwide.
  • Mendeley Clinical Laboratory Test Reports: This dataset is hosted on Mendeley and includes 260 clinical laboratory test reports issued by 24 laboratories in Egypt.
  • CT-RATE: Medipol University Mega Hospital and University of Zurich / ETH Zurich.

In addition to the public datasets listed above, MedGemma was also trained on
de-identified, licensed datasets or datasets collected internally at Google from
consented participants.

  • CT dataset 1: De-identified dataset of different axial CT studies across
body parts (head, chest, abdomen) from a US-based radiology outpatient diagnostic center network.
  • MRI dataset 1: De-identified dataset of different axial multi-parametric
MRI studies across body parts (head, abdomen, knee) from a US-based radiology outpatient diagnostic center network.
  • Ophthalmology dataset 1 (EyePACS): De-identified dataset of fundus
images from diabetic retinopathy screening.
  • Dermatology dataset 1: De-identified dataset of teledermatology skin
condition images (both clinical and dermatoscopic) from Colombia.
  • Dermatology dataset 2: De-identified dataset of skin cancer images (both
clinical and dermatoscopic) from Australia.
  • Dermatology dataset 3: De-identified dataset of non-diseased skin images
from an internal data collection effort.
  • Dermatology dataset 4: De-identified dataset featuring multiple images
and longitudinal visits and records from Japan.
  • Dermatology dataset 5: Dermatology dataset featuring unlabeled images.
  • Dermatology dataset 6: De-identified cases from adult patients with data
representing Fitzpatrick 5 or 6 skin types.
  • Pathology dataset 1: De-identified dataset of histopathology H\&E whole
slide images created in collaboration with an academic research hospital and biobank in Europe. Comprises de-identified colon, prostate, and lymph nodes.
  • Pathology dataset 2: De-identified dataset of lung histopathology H\&E
and IHC whole slide images created by a commercial biobank in the United States.
  • Pathology dataset 3: De-identified dataset of prostate and lymph node
H\&E and IHC histopathology whole slide images created by a contract research organization in the United States.
  • Pathology dataset 4: De-identified dataset of histopathology whole slide
images created in collaboration with a large, tertiary teaching hospital in the United States. Comprises a diverse set of tissue and stain types, predominantly H\&E.
  • EHR dataset 1: Question/answer dataset drawn from synthetic FHIR records
created by Synthea. The test set includes 19 unique patients with 200 questions per patient divided into 10 different categories.
  • EHR dataset 2: De-identified lab reports across different departments in
Pathology, such as Biochemistry, Clinical Pathology, Hematology, Microbiology, and Serology.
  • EHR dataset 3: De-identified lab reports across different departments in
Pathology, such as Biochemistry, Clinical Pathology, Hematology, Microbiology, and Serology, from at least 25 different labs.
  • EHR dataset 4: Synthetic dataset of laboratory reports
  • EHR dataset 5: Synthetic dataset of approximately 60,000 health-relevant
user queries

Data citation

  • MIMIC-CXR: Johnson, A., Pollard, T., Mark, R., Berkowitz, S., & Horng,
S. (2024). MIMIC-CXR Database (version 2.1.0). PhysioNet. https://physionet.org/content/mimic-cxr/2.1.0/ and Johnson, Alistair E. W., Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-Ying Deng, Roger G. Mark, and Steven Horng. 2019\. "MIMIC-CXR, a de-Identified Publicly Available Database of Chest Radiographs with Free-Text Reports." Scientific Data 6 (1): 1–8.
  • MS-CXR-T: Bannur, S., Hyland, S., Liu, Q., Pérez-García, F., Ilse, M.,
Coelho de Castro, D., Boecking, B., Sharma, H., Bouzid, K., Schwaighofer, A., Wetscherek, M. T., Richardson, H., Naumann, T., Alvarez Valle, J., & Oktay, O. (2023). MS-CXR-T: Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing (version 1.0.0). PhysioNet. https://doi.org/10.13026/pg10-j984.
  • ChestX-ray14: Wang, Xiaosong, Yifan Peng, Le Lu, Zhiyong Lu,
Mohammadhadi Bagheri, and Ronald M. Summers. "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases." In *Proceedings of the IEEE conference on computer vision and pattern recognition*, pp. 2097-2106. 2017\.
  • SLAKE: Liu, Bo, Li-Ming Zhan, Li Xu, Lin Ma, Yan Yang, and Xiao-Ming Wu.
2021\. "SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering." http://arxiv.org/abs/2102.09542.
  • PAD-UFES-20: Pacheco, Andre GC, et al. "PAD-UFES-20: A skin lesion
dataset composed of patient data and clinical images collected from smartphones." Data in brief 32 (2020): 106221\.
  • SCIN: Ward, Abbi, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley
Carrick, Bilson Campana, Jay Hartford, et al. 2024\. "Creating an Empirical Dermatology Dataset Through Crowdsourcing With Web Search Advertisements." JAMA Network Open 7 (11): e2446615–e2446615.
  • TCGA: The results shown here are in whole or part based upon data
generated by the TCGA Research Network: https://www.cancer.gov/tcga.
  • CAMELYON16: Ehteshami Bejnordi, Babak, Mitko Veta, Paul Johannes van
Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen A. W. M. van der Laak, et al. 2017\. "Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer." JAMA 318 (22): 2199–2210.
  • CAMELYON17: Bandi, Peter, et al. "From detection of individual
metastases to classification of lymph node status at the patient level: the camelyon17 challenge." IEEE transactions on medical imaging 38.2 (2018): 550-560.
  • Mendeley Digital Knee X-Ray: Gornale, Shivanand; Patravali, Pooja
(2020), "Digital Knee X-ray Images", Mendeley Data, V1, doi: 10.17632/t9ndx37v5h.1
  • VQA-RAD: Lau, Jason J., Soumya Gayen, Asma Ben Abacha, and Dina
Demner-Fushman. 2018\. "A Dataset of Clinically Generated Visual Questions and Answers about Radiology Images." Scientific Data 5 (1): 1–10.
  • Chest ImaGenome: Wu, J., Agu, N., Lourentzou, I., Sharma, A., Paguio,
J., Yao, J. S., Dee, E. C., Mitchell, W., Kashyap, S., Giovannini, A., Celi, L. A., Syeda-Mahmood, T., & Moradi, M. (2021). Chest ImaGenome Dataset (version 1.0.0). PhysioNet. RRID:SCR\_007345. https://doi.org/10.13026/wv01-y230
  • MedQA: Jin, Di, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang,
and Peter Szolovits. 2020\. "What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams." http://arxiv.org/abs/2009.13081.
  • MedMCQA: Pal, Ankit, Logesh Kumar Umapathi, and Malaikannan
Sankarasubbu. "Medmcqa: A large-scale multi-subject multi-choice dataset for medical domain question answering." *Conference on health, inference, and learning. PMLR,* 2022\.
  • PubMedQA: Jin, Qiao, et al. "Pubmedqa: A dataset for biomedical research
question answering." *Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP).* 2019\.
  • LiveQA: Abacha, Asma Ben, et al. "Overview of the medical question
answering task at TREC 2017 LiveQA." TREC. 2017\.
  • AfriMed-QA: Olatunji, Tobi, Charles Nimo, Abraham Owodunni, Tassallah
Abdullahi, Emmanuel Ayodele, Mardhiyah Sanni, Chinemelu Aka, et al. 2024\. "AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset." http://arxiv.org/abs/2411.15640.
  • MedExpQA: Alonso, I., Oronoz, M., & Agerri, R. (2024). MedExpQA:
Multilingual Benchmarking of Large Language Models for Medical Question Answering. arXiv preprint arXiv:2404.05590. Retrieved from https://arxiv.org/abs/2404.05590
  • MedXpertQA: Zuo, Yuxin, Shang Qu, Yifei Li, Zhangren Chen, Xuekai Zhu,
Ermo Hua, Kaiyan Zhang, Ning Ding, and Bowen Zhou. 2025\. "MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding." http://arxiv.org/abs/2501.18362.
  • HealthSearchQA: Singhal, Karan, Shekoofeh Azizi, Tao Tu, S. Sara
Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales et al. "Large language models encode clinical knowledge." Nature 620, no. 7972 (2023): 172-180.
  • ISIC: Gutman, David; Codella, Noel C. F.; Celebi, Emre; Helba, Brian;
Marchetti, Michael; Mishra, Nabin; Halpern, Allan. "Skin Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration (ISIC)". eprint arXiv:1605.01397. 2016
  • Mendeley Clinical Laboratory Test Reports: Abdelmaksoud, Esraa;
Gadallah, Ahmed; Asad, Ahmed (2022), “Clinical Laboratory Test Reports”, Mendeley Data, V2, doi: 10.17632/bygfmk4rx9.2
  • CheXpert: Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S.,
Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., Seekins, J., Mong, D. A., Halabi, S. S., Sandberg, J. K., Jones, R., Larson, D. B., Langlotz, C. P., Patel, B. N., Lungren, M. P., & Ng, A. Y. (2019). CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv:1901.07031
  • CT-RATE: Hamamci, I. E., Er, S., Almas, F., Simsek, A. G., Esirgun, S.
N., Dogan, I., Dasdelen, M. F., Wittmann, B., Menze, B., et al. (2024). CT-RATE Dataset. Hugging Face. https://huggingface.co/datasets/ibrahimhamamci/CT-RATE and Hamamci, Ibrahim Ethem, Sezgin Er, Furkan Almas, Ayse Gulnihan Simsek, Sevval Nil Esirgun, Irem Dogan, Muhammed Furkan Dasdelen, Bastian Wittmann, et al. 2024\. "Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography." arXiv preprint arXiv:2403.17834. https://arxiv.org/abs/2403.17834
  • EHRNoteQA: Sunjun Kweon, Jiyoun Kim, Heeyoung Kwak, Dongchul Cha,
Hangyul Yoon, Kwanghyun Kim, Jeewon Yang, Seunghyun Won, Edward Choi. (2024) “EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries.” arXiv:2402.16040

De-identification/anonymization:

Google and its partners utilize datasets that have been rigorously anonymized or
de-identified to ensure the protection of individual research participants and
patient privacy.

Implementation information

Details about the model internals.

Software

Training was done using JAX.

JAX allows researchers to take advantage of the latest generation of hardware,
including TPUs, for faster and more efficient training of large models.

Use and limitations

Intended use

MedGemma is an open multimodal generative AI model intended to be used as a
starting point that enables more efficient development of downstream healthcare
applications involving medical text and images. MedGemma is intended for
developers in the life sciences and healthcare space. Developers are responsible
for training, adapting, and making meaningful changes to MedGemma to accomplish
their specific intended use. MedGemma models can be fine-tuned by developers
using their own proprietary data for their specific tasks or solutions.

MedGemma is based on Gemma 3 and has been further trained on medical images and
text. MedGemma enables further development in medical contexts (image and
textual); however, the model has been trained using chest x-ray, histopathology,
dermatology, fundus images, CT, MR, medical text/documents and electronic health
records (EHR) data. Examples of tasks within MedGemma’s training include visual
question answering on medical images such as radiographs, document
understanding, and answering textual medical questions.

Benefits

  • Provides strong baseline medical image and text comprehension for models of
its size.
  • This strong performance makes it efficient to adapt for downstream
healthcare-based use cases, compared to models of similar size without medical data pre-training.
  • This adaptation may involve prompt engineering, grounding, agentic
orchestration or fine-tuning depending on the use case, baseline validation requirements, and desired performance characteristics.

Limitations

MedGemma is not intended to be used without appropriate validation, adaptation,
and/or meaningful modification by developers for their specific use case.
The outputs generated by MedGemma are not intended to directly inform clinical
diagnosis, patient management decisions, treatment recommendations, or any other
direct clinical practice applications. All outputs from MedGemma should be
considered preliminary and require independent verification, clinical
correlation, and further investigation through established research and
development methodologies.

MedGemma's multimodal capabilities have been primarily evaluated on single-image
tasks. MedGemma has not been evaluated in use cases that involve comprehension
of multiple images.

MedGemma has not been evaluated or optimized for multi-turn applications.

MedGemma's training may make it more sensitive to the specific prompt used than
Gemma 3.

When adapting MedGemma, developers should consider the following:

  • Bias in validation data: As with any research, developers should ensure
that any downstream application is validated with data that is appropriately representative of the intended use setting for the specific application (e.g., age, sex, gender, condition, imaging device, etc.).
  • Data contamination concerns: When evaluating the generalization
capabilities of a large model like MedGemma in a medical context, there is a risk of data contamination: the model might have inadvertently seen related medical information during its pre-training, which can overestimate its true ability to generalize to novel medical concepts. To mitigate this risk, developers should validate MedGemma on datasets that are not publicly available or otherwise accessible to non-institutional researchers.

Release notes

MedGemma 4B IT

  • May 20, 2025: Initial release
  • July 9, 2025: Bug fix: Fixed a subtle degradation in multimodal
performance. The issue was due to a missing end-of-image token in the model vocabulary, which impacted combined text-and-image tasks. This fix reinstates and correctly maps that token, ensuring text-only tasks remain unaffected while restoring multimodal performance.
  • Jan 13, 2026: Updated to version 1.5 with improved medical reasoning,
medical records interpretation, and medical image interpretation.

GGUF File List

| Filename | Type | Size |
|---|---|---|
| medgemma-1.5-4b-it-BF16.gguf | BF16 | 7.23 GB |
| medgemma-1.5-4b-it-IQ4_NL.gguf | Q4 | 2.2 GB |
| medgemma-1.5-4b-it-IQ4_XS.gguf | Q4 | 2.11 GB |
| medgemma-1.5-4b-it-Q2_K.gguf | Q2 | 1.61 GB |
| medgemma-1.5-4b-it-Q2_K_L.gguf | Q2 | 1.61 GB |
| medgemma-1.5-4b-it-Q3_K_M.gguf | Q3 | 1.95 GB |
| medgemma-1.5-4b-it-Q3_K_S.gguf | Q3 | 1.8 GB |
| medgemma-1.5-4b-it-Q4_0.gguf | Q4 (recommended) | 2.21 GB |
| medgemma-1.5-4b-it-Q4_1.gguf | Q4 | 2.39 GB |
| medgemma-1.5-4b-it-Q4_K_M.gguf | Q4 | 2.32 GB |
| medgemma-1.5-4b-it-Q4_K_S.gguf | Q4 | 2.21 GB |
| medgemma-1.5-4b-it-Q5_K_M.gguf | Q5 | 2.64 GB |
| medgemma-1.5-4b-it-Q5_K_S.gguf | Q5 | 2.57 GB |
| medgemma-1.5-4b-it-Q6_K.gguf | Q6 | 2.97 GB |
| medgemma-1.5-4b-it-Q8_0.gguf | Q8 | 3.85 GB |
| medgemma-1.5-4b-it-UD-IQ1_M.gguf | Q1 | 1.16 GB |
| medgemma-1.5-4b-it-UD-IQ1_S.gguf | Q1 | 1.1 GB |
| medgemma-1.5-4b-it-UD-IQ2_M.gguf | Q2 | 1.46 GB |
| medgemma-1.5-4b-it-UD-IQ2_XXS.gguf | Q2 | 1.25 GB |
| medgemma-1.5-4b-it-UD-IQ3_XXS.gguf | Q3 | 1.59 GB |
| medgemma-1.5-4b-it-UD-Q2_K_XL.gguf | Q2 | 1.65 GB |
| medgemma-1.5-4b-it-UD-Q3_K_XL.gguf | Q3 | 2 GB |
| medgemma-1.5-4b-it-UD-Q4_K_XL.gguf | Q4 | 2.36 GB |
| medgemma-1.5-4b-it-UD-Q5_K_XL.gguf | Q5 | 2.64 GB |
| medgemma-1.5-4b-it-UD-Q6_K_XL.gguf | Q6 | 3.32 GB |
| medgemma-1.5-4b-it-UD-Q8_K_XL.gguf | Q8 | 4.81 GB |
| mmproj-BF16.gguf | BF16 | 811.82 MB |
| mmproj-F16.gguf | F16 | 811.82 MB |
| mmproj-F32.gguf | F32 | 1.56 GB |
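The quantized files above can be run locally with llama.cpp, pairing a model file with one of the `mmproj-*` vision-projector files to enable image input. A hedged sketch follows; the repository id is an assumption based on this card, and the checkpoint requires accepting the terms of use and logging in first:

```shell
# Fetch one quant plus the vision projector (repo id assumed; adjust
# to the repository that actually hosts these files).
huggingface-cli download unsloth/medgemma-1.5-4b-it-GGUF \
  medgemma-1.5-4b-it-Q4_0.gguf mmproj-F16.gguf --local-dir .

# Serve with llama.cpp; --mmproj loads the vision projector so the
# server can accept images alongside text.
llama-server -m medgemma-1.5-4b-it-Q4_0.gguf --mmproj mmproj-F16.gguf
```

Smaller quants (Q2/Q3, and the UD-IQ1 variants) trade accuracy for memory; the Q4_0 file is the one marked recommended in the list above.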