unsloth/GLM-4.1V-9B-Thinking-GGUF

Name: unsloth/GLM-4.1V-9B-Thinking-GGUF
Author: unsloth

High-quality GGUF model

2.7K 📥 Downloads

39 ❤️ Likes

26 📁 GGUF Files

153.18 GB 💾 Total Size

5 months ago 🔄 Last Updated

📋 Model Description

license: mit
language: en, zh
base_model: THUDM/GLM-4.1V-9B-Thinking
pipeline_tag: image-text-to-text
library_name: transformers
tags: reasoning, unsloth

[!NOTE]
Includes Unsloth chat template fixes!
For llama.cpp, use --jinja
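
As a rough illustration of the note above, the sketch below assembles a llama.cpp command line that includes `--jinja`, so that the chat template embedded in the GGUF file is applied. The binary name `llama-cli`, the local file name, the context size, and the helper `build_llama_cli_command` are assumptions made for this example. It covers a text-only prompt and does not address image input.

```python
import shlex


def build_llama_cli_command(model_path, prompt, ctx_size=8192, binary="llama-cli"):
    """Return an argument list for a llama.cpp CLI run (hypothetical helper).

    Assumes a llama.cpp build that ships a `llama-cli` binary on PATH.
    """
    return [
        binary,
        "-m", model_path,      # path to the downloaded GGUF file
        "--jinja",             # use the chat template stored in the GGUF
        "-c", str(ctx_size),   # context window size in tokens
        "-p", prompt,          # prompt text
    ]


cmd = build_llama_cli_command(
    "GLM-4.1V-9B-Thinking-Q4_K_M.gguf",  # assumed local file name
    "Explain what a GGUF file is in one sentence.",
)
# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` once the binary and model file are in place.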

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

GLM-4.1V-9B-Thinking

📖 View the GLM-4.1V-9B-Thinking paper.

💡 Try the Hugging Face or ModelScope online demo for GLM-4.1V-9B-Thinking.

📍 Use the GLM-4.1V-9B-Thinking API on the Zhipu Foundation Model Open Platform.

Model Introduction

Vision-Language Models (VLMs) have become foundational components of intelligent systems. As real-world AI tasks grow
increasingly complex, VLMs must evolve beyond basic multimodal perception to enhance their reasoning capabilities in
complex tasks. This involves improving accuracy, comprehensiveness, and intelligence, enabling applications such as
complex problem solving, long-context understanding, and multimodal agents.

Based on the GLM-4-9B-0414 foundation model, we present the new open-source VLM
GLM-4.1V-9B-Thinking, designed to explore the upper limits of reasoning in vision-language models. By introducing
a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities. It
achieves state-of-the-art performance among 10B-parameter VLMs, matching or even surpassing the 72B-parameter
Qwen-2.5-VL-72B on 18 benchmark tasks. We are also open-sourcing the base model GLM-4.1V-9B-Base to
support further research into the boundaries of VLM capabilities.

Compared to the previous generation models CogVLM2 and the GLM-4V series, GLM-4.1V-Thinking offers the
following improvements:

The first reasoning-focused model in the series, achieving world-leading performance not only in mathematics but also across various sub-domains.

Supports 64k context length.
Handles arbitrary aspect ratios and up to 4K image resolution.
Provides an open-source version supporting both Chinese and English bilingual usage.

Benchmark Performance

By incorporating the Chain-of-Thought reasoning paradigm, GLM-4.1V-9B-Thinking significantly improves answer accuracy,
richness, and interpretability. It comprehensively surpasses traditional non-reasoning visual models.
Out of 28 benchmark tasks, it achieved the best performance among 10B-level models on 23 tasks,
and even outperformed the 72B-parameter Qwen-2.5-VL-72B on 18 tasks.

Quick Inference

This is a simple example of running single-image inference using the transformers library.
First, install the transformers library from source:

pip install git+https://github.com/huggingface/transformers.git

Then, run the following code:

from transformers import AutoProcessor, Glm4vForConditionalGeneration
import torch

MODEL_PATH = "THUDM/GLM-4.1V-9B-Thinking"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://upload.wikimedia.org/wikipedia/commons/f/fa/Grayscale8bitspalettesampleimage.png"
            },
            {
                "type": "text",
                "text": "describe this image"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained(MODEL_PATH, use_fast=True)
model = Glm4vForConditionalGeneration.from_pretrained(
    pretrained_model_name_or_path=MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Tokenize the chat messages (including the image) into model-ready tensors.
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=8192)
# Decode only the newly generated tokens, skipping the prompt.
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(output_text)
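
Because this is a reasoning model, the decoded text usually contains the chain of thought as well as the final reply. The sketch below separates the two. It assumes the reasoning is wrapped in `<think>...</think>` and the reply in `<answer>...</answer>`; those tag names and the helper `split_reasoning` are assumptions for illustration and should be checked against real output.

```python
import re

# Assumption: reasoning is wrapped in <think> tags and the final reply in
# <answer> tags. Verify this against actual model output before relying on it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(output_text: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) extracted from decoded model output."""
    think = THINK_RE.search(output_text)
    answer = ANSWER_RE.search(output_text)
    reasoning = think.group(1).strip() if think else ""
    if answer:
        final = answer.group(1).strip()
    else:
        # Fallback: take whatever follows the closing think tag,
        # or the whole text when no tags are present.
        final = output_text.split("</think>")[-1].strip()
    return reasoning, final


sample = "<think>The picture has no colour.</think><answer>A grayscale sample image.</answer>"
reasoning, final = split_reasoning(sample)
print(final)
```

With the inference code above, `split_reasoning(output_text)` would be called on the decoded string; any remaining special tokens may need to be stripped separately.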

For video reasoning, web demo deployment, and more code, please check
our GitHub.

📂 GGUF File List

📁 Filename	🏷️ Labels	📦 Size
GLM-4.1V-9B-Thinking-BF16.gguf	LFS, FP16	17.52 GB
GLM-4.1V-9B-Thinking-IQ4_NL.gguf	LFS, Q4	5.09 GB
GLM-4.1V-9B-Thinking-IQ4_XS.gguf	LFS, Q4	4.92 GB
GLM-4.1V-9B-Thinking-Q2_K.gguf	LFS, Q2	3.73 GB
GLM-4.1V-9B-Thinking-Q2_K_L.gguf	LFS, Q2	3.87 GB
GLM-4.1V-9B-Thinking-Q3_K_M.gguf	LFS, Q3	4.63 GB
GLM-4.1V-9B-Thinking-Q3_K_S.gguf	LFS, Q3	4.28 GB
GLM-4.1V-9B-Thinking-Q4_0.gguf	Recommended, LFS, Q4	5.1 GB
GLM-4.1V-9B-Thinking-Q4_1.gguf	LFS, Q4	5.6 GB
GLM-4.1V-9B-Thinking-Q4_K_M.gguf	LFS, Q4	5.74 GB
GLM-4.1V-9B-Thinking-Q4_K_S.gguf	LFS, Q4	5.36 GB
GLM-4.1V-9B-Thinking-Q5_K_M.gguf	LFS, Q5	6.57 GB
GLM-4.1V-9B-Thinking-Q5_K_S.gguf	LFS, Q5	6.24 GB
GLM-4.1V-9B-Thinking-Q6_K.gguf	LFS, Q6	7.7 GB
GLM-4.1V-9B-Thinking-Q8_0.gguf	LFS, Q8	9.31 GB
GLM-4.1V-9B-Thinking-UD-IQ1_M.gguf	LFS	3.09 GB
GLM-4.1V-9B-Thinking-UD-IQ1_S.gguf	LFS	2.98 GB
GLM-4.1V-9B-Thinking-UD-IQ2_M.gguf	LFS, Q2	3.72 GB
GLM-4.1V-9B-Thinking-UD-IQ2_XXS.gguf	LFS, Q2	3.27 GB
GLM-4.1V-9B-Thinking-UD-IQ3_XXS.gguf	LFS, Q3	3.94 GB
GLM-4.1V-9B-Thinking-UD-Q2_K_XL.gguf	LFS, Q2	3.92 GB
GLM-4.1V-9B-Thinking-UD-Q3_K_XL.gguf	LFS, Q3	4.79 GB
GLM-4.1V-9B-Thinking-UD-Q4_K_XL.gguf	LFS, Q4	5.76 GB
GLM-4.1V-9B-Thinking-UD-Q5_K_XL.gguf	LFS, Q5	6.53 GB
GLM-4.1V-9B-Thinking-UD-Q6_K_XL.gguf	LFS, Q6	8.28 GB
GLM-4.1V-9B-Thinking-UD-Q8_K_XL.gguf	LFS, Q8	11.25 GB
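
One way to choose among these files is to take the largest one that fits a given size budget, since larger quantizations generally preserve more of the original weights. The sketch below does that for a subset of the sizes listed above. The helper names `pick_quant` and `download_quant` are hypothetical, and file size is only a lower bound on memory use, because the context cache and runtime buffers need additional room.

```python
from typing import Optional

# Sizes in GB, copied from a subset of the file list above.
GGUF_SIZES_GB = {
    "GLM-4.1V-9B-Thinking-Q2_K.gguf": 3.73,
    "GLM-4.1V-9B-Thinking-Q4_0.gguf": 5.1,
    "GLM-4.1V-9B-Thinking-Q4_K_M.gguf": 5.74,
    "GLM-4.1V-9B-Thinking-UD-Q4_K_XL.gguf": 5.76,
    "GLM-4.1V-9B-Thinking-Q5_K_M.gguf": 6.57,
    "GLM-4.1V-9B-Thinking-Q8_0.gguf": 9.31,
    "GLM-4.1V-9B-Thinking-BF16.gguf": 17.52,
}


def pick_quant(budget_gb: float, sizes: dict = GGUF_SIZES_GB) -> Optional[str]:
    """Return the largest file whose size is within budget_gb, or None."""
    fitting = {name: size for name, size in sizes.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_quant(filename: str, repo_id: str = "unsloth/GLM-4.1V-9B-Thinking-GGUF") -> str:
    """Download one file from the repository and return its local path.

    Requires the huggingface_hub package and network access.
    """
    from huggingface_hub import hf_hub_download  # imported lazily on purpose

    return hf_hub_download(repo_id=repo_id, filename=filename)


choice = pick_quant(6.0)
print(choice)
```

Calling `download_quant(choice)` would then fetch the selected file; the budget of 6.0 GB is only an example value.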

📊 Model Information

🆔 Model ID: unsloth/GLM-4.1V-9B-Thinking-GGUF

📅 Created: 5 months ago

🎯 Difficulty: Advanced

⚙️ Quantization: FP16, Q4, Q2, Q3, Q5, Q6, Q8

🏷️ Tags

transformers, gguf, reasoning, unsloth, image-text-to-text, en, zh, arxiv:2507.01006, base_model:zai-org/GLM-4.1V-9B-Thinking, base_model:quantized:zai-org/GLM-4.1V-9B-Thinking, license:mit, endpoints_compatible, region:us, conversational
