Model Description
tags:
- unsloth
- XiaomiMiMo/MiMo-VL-7B-RL
Unsloth Dynamic 2.0 quants achieve superior accuracy and outperform other leading quantizations.

MiMo-VL Technical Report
I. Introduction
In this report, we share our efforts to build a compact yet powerful VLM, MiMo-VL-7B. MiMo-VL-7B comprises (1) a native resolution ViT encoder that preserves fine-grained visual details, (2) an MLP projector for efficient cross-modal alignment, and (3) our MiMo-7B language model, specifically optimized for complex reasoning tasks.
The development of MiMo-VL-7B involves two sequential training processes: (1) A four-stage pre-training phase, which includes projector warmup, vision-language alignment, general multi-modal pre-training, and long-context Supervised Fine-Tuning (SFT). This phase yields the MiMo-VL-7B-SFT model. (2) A subsequent post-training phase, where we introduce Mixed On-policy Reinforcement Learning (MORL), a novel framework that seamlessly integrates diverse reward signals spanning perception accuracy, visual grounding precision, logical reasoning capabilities, and human/AI preferences. This phase yields the MiMo-VL-7B-RL model.
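The report does not publish MORL implementation details, but the core idea of combining heterogeneous reward signals into one training signal can be sketched. The following is purely illustrative: the function, signal names, and weights are hypothetical, not taken from the MiMo-VL codebase.

```python
# Illustrative sketch of mixing heterogeneous reward signals into a single
# scalar, as MORL-style training requires. All names and weights here are
# hypothetical; the actual MiMo-VL reward stack is not public.

def mix_rewards(signals: dict, weights: dict) -> float:
    """Weighted sum of per-source rewards, each assumed to lie in [0, 1]."""
    total = 0.0
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"reward {name!r} out of range: {value}")
        total += weights.get(name, 0.0) * value
    return total

# One hypothetical rollout scored by four reward sources.
signals = {"perception": 1.0, "grounding": 0.8, "reasoning": 1.0, "preference": 0.5}
weights = {"perception": 0.25, "grounding": 0.25, "reasoning": 0.25, "preference": 0.25}
print(mix_rewards(signals, weights))  # 0.825
```

In practice each signal would come from a different scorer (rule-based verifier, grounding IoU, reward model), which is exactly why stabilizing simultaneous improvement across them is hard.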

We open-source the MiMo-VL-7B series, including checkpoints of both the SFT and RL models.
We believe this report, along with the models, will provide valuable insights into developing powerful reasoning VLMs that benefit the broader community.
During this journey, we find:
- Incorporating high-quality, broad-coverage reasoning data from the pre-training stage is crucial for enhancing model performance
- Mixed On-policy Reinforcement Learning further enhances model performance, while achieving stable simultaneous improvements remains challenging
II. Model Details

Models are available at Hugging Face Collections: MiMo-VL and ModelScope Collections: MiMo-VL
| Model | Description | Download (HuggingFace) | Download (ModelScope) |
|---|---|---|---|
| MiMo-VL-7B-SFT | VLM with extraordinary reasoning potential after 4-stage pre-training | XiaomiMiMo/MiMo-VL-7B-SFT | XiaomiMiMo/MiMo-VL-7B-SFT |
| MiMo-VL-7B-RL | RL model leapfrogging existing open-source models | XiaomiMiMo/MiMo-VL-7B-RL | XiaomiMiMo/MiMo-VL-7B-RL |
III. Evaluation Results
General Capabilities
In general visual-language understanding, MiMo-VL-7B models achieve state-of-the-art open-source results.

Reasoning Tasks
In multi-modal reasoning, both the SFT and RL models significantly outperform all compared open-source baselines across these benchmarks.

> [!IMPORTANT]
> Results marked with \* are obtained using our evaluation framework.
> Tasks with ${\dagger}$ are evaluated by GPT-4o.
GUI Tasks
MiMo-VL-7B-RL possesses exceptional GUI understanding and grounding capabilities. As a general-purpose VL model, it achieves performance comparable or even superior to GUI-specialized models.

Elo Rating
With our in-house evaluation dataset and GPT-4o judgments, MiMo-VL-7B-RL achieves the highest Elo rating among all evaluated open-source vision-language models, ranking first across models spanning from 7B to 72B parameters.
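Elo ratings from pairwise judgments follow the standard Elo update rule. A minimal sketch is below; the K-factor and starting ratings are illustrative defaults, since the report does not specify its exact Elo configuration.

```python
# Minimal standard Elo update, as used for pairwise model comparisons.
# K-factor and initial ratings are illustrative choices; the report does
# not state its exact Elo setup.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one comparison (score_a: 1 win, 0.5 tie, 0 loss)."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

# Two models start equal; model A wins one judged comparison.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Iterating this update over many GPT-4o-judged pairs yields a leaderboard-style rating for each model.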

IV. Deployment
The MiMo-VL-7B series maintains full compatibility with the `Qwen2_5_VLForConditionalGeneration` architecture for deployment and inference.
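Because the series follows the Qwen2.5-VL interface, multimodal prompts use the same chat-message structure. A minimal sketch of building such a message follows; the image path is a placeholder, and feeding the result to the processor (e.g. via its chat template) is assumed from the Qwen2.5-VL usage pattern rather than shown here.

```python
# Sketch of the Qwen2.5-VL-style chat message structure accepted by
# MiMo-VL-7B through transformers' chat template. The image path is a
# placeholder; in a real pipeline the result is passed to the model's
# processor before generation.

def build_vl_message(image_ref: str, question: str) -> list:
    """One user turn containing an image followed by a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": question},
            ],
        }
    ]

messages = build_vl_message("path/to/image.png", "Describe this image.")
print(messages[0]["content"][0]["type"])  # image
```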
V. Citation
    @misc{coreteam2025mimovl,
      title={MiMo-VL Technical Report},
      author={{Xiaomi LLM-Core Team}},
      year={2025},
      url={https://github.com/XiaomiMiMo/MiMo-VL},
    }
VI. Contact
Please contact us at [email protected] or open an issue if you have any questions.
GGUF File List
| Filename | Quant | Size |
|---|---|---|
| MiMo-VL-7B-RL-BF16.gguf | BF16 | 14.2 GB |
| MiMo-VL-7B-RL-IQ4_NL.gguf | IQ4_NL | 4.17 GB |
| MiMo-VL-7B-RL-IQ4_XS.gguf | IQ4_XS | 3.99 GB |
| MiMo-VL-7B-RL-Q2_K.gguf | Q2_K | 2.87 GB |
| MiMo-VL-7B-RL-Q2_K_L.gguf | Q2_K_L | 3 GB |
| MiMo-VL-7B-RL-Q3_K_M.gguf | Q3_K_M | 3.59 GB |
| MiMo-VL-7B-RL-Q3_K_S.gguf | Q3_K_S | 3.28 GB |
| MiMo-VL-7B-RL-Q4_0.gguf (Recommended) | Q4_0 | 4.16 GB |
| MiMo-VL-7B-RL-Q4_1.gguf | Q4_1 | 4.56 GB |
| MiMo-VL-7B-RL-Q4_K_M.gguf | Q4_K_M | 4.36 GB |
| MiMo-VL-7B-RL-Q4_K_S.gguf | Q4_K_S | 4.17 GB |
| MiMo-VL-7B-RL-Q5_K_M.gguf | Q5_K_M | 5.07 GB |
| MiMo-VL-7B-RL-Q5_K_S.gguf | Q5_K_S | 4.96 GB |
| MiMo-VL-7B-RL-Q6_K.gguf | Q6_K | 5.83 GB |
| MiMo-VL-7B-RL-Q8_0.gguf | Q8_0 | 7.55 GB |
| MiMo-VL-7B-RL-UD-IQ1_M.gguf | IQ1_M | 2.1 GB |
| MiMo-VL-7B-RL-UD-IQ1_S.gguf | IQ1_S | 1.99 GB |
| MiMo-VL-7B-RL-UD-IQ2_M.gguf | IQ2_M | 2.71 GB |
| MiMo-VL-7B-RL-UD-IQ2_XXS.gguf | IQ2_XXS | 2.28 GB |
| MiMo-VL-7B-RL-UD-IQ3_XXS.gguf | IQ3_XXS | 2.97 GB |
| MiMo-VL-7B-RL-UD-Q2_K_XL.gguf | Q2_K_XL | 3.06 GB |
| MiMo-VL-7B-RL-UD-Q3_K_XL.gguf | Q3_K_XL | 3.75 GB |
| MiMo-VL-7B-RL-UD-Q4_K_XL.gguf | Q4_K_XL | 4.43 GB |
| MiMo-VL-7B-RL-UD-Q5_K_XL.gguf | Q5_K_XL | 5.09 GB |
| MiMo-VL-7B-RL-UD-Q6_K_XL.gguf | Q6_K_XL | 6.52 GB |
| MiMo-VL-7B-RL-UD-Q8_K_XL.gguf | Q8_K_XL | 9.45 GB |
| mmproj-BF16.gguf | BF16 | 1.27 GB |
| mmproj-F16.gguf | F16 | 1.27 GB |
| mmproj-F32.gguf | F32 | 2.54 GB |
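As a rough guide for choosing among the files above, the sketch below picks the largest quant that fits a memory budget. The sizes are copied from the file list (a representative subset); "largest file that fits gives the best quality" is a common rule of thumb, not a guarantee, and real budgets should leave headroom for the mmproj file and the KV cache.

```python
# Pick the largest GGUF from the list above that fits a memory budget.
# Sizes (GB) are copied from the file list; "largest fits best" is a
# heuristic, and headroom for the mmproj file and KV cache is not counted.

QUANT_SIZES_GB = {
    "MiMo-VL-7B-RL-Q2_K.gguf": 2.87,
    "MiMo-VL-7B-RL-Q3_K_M.gguf": 3.59,
    "MiMo-VL-7B-RL-Q4_0.gguf": 4.16,
    "MiMo-VL-7B-RL-Q5_K_M.gguf": 5.07,
    "MiMo-VL-7B-RL-Q6_K.gguf": 5.83,
    "MiMo-VL-7B-RL-Q8_0.gguf": 7.55,
    "MiMo-VL-7B-RL-BF16.gguf": 14.2,
}

def pick_quant(budget_gb: float, sizes: dict = QUANT_SIZES_GB):
    """Return the largest file that fits the budget, or None if nothing fits."""
    fitting = {f: s for f, s in sizes.items() if s <= budget_gb}
    return max(fitting, key=fitting.get) if fitting else None

print(pick_quant(6.0))  # MiMo-VL-7B-RL-Q6_K.gguf
```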