# πŸ“‹ Model Description


```yaml
library_name: vllm
language:
- en
- fr
- es
- de
- it
- pt
- nl
- zh
- ja
- ko
- ar
license: apache-2.0
inference: false
base_model:
- mistralai/Ministral-3-8B-Instruct-2512
tags:
- mistral-common
- mistral
- unsloth
```

See our Ministral 3 collection for all versions including GGUF, 4-bit & FP8 formats.

Learn to run Ministral correctly - Read our Guide.

See Unsloth Dynamic 2.0 GGUFs for our quantization benchmarks.

✨ Read our Ministral 3 Guide here!


# Ministral 3 8B Instruct 2512

The balanced model in the Ministral 3 family, Ministral 3 8B is a powerful yet efficient small language model with vision capabilities.

The Ministral 3 family is designed for edge deployment and runs on a wide range of hardware. Ministral 3 8B can even be deployed locally: it fits in 12 GB of VRAM in FP8, and in less if further quantized.
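
A rough back-of-the-envelope check, assuming one byte per FP8 weight and using the parameter counts from the Key Features section below (this ignores KV cache and activation memory, so treat it as an illustration only):

```python
# Illustrative FP8 memory estimate: 1 byte per parameter.
lm_params = 8.4e9      # language model
vision_params = 0.4e9  # vision encoder

weight_gib = (lm_params + vision_params) / 1024**3
print(f"~{weight_gib:.1f} GiB for the weights alone")  # ~8.2 GiB, leaving headroom in 12 GB
```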

## Key Features

Ministral 3 8B consists of two main architectural components:
- 8.4B Language Model
- 0.4B Vision Encoder

The Ministral 3 8B Instruct model offers the following capabilities:

- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, and Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON output.
- Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.

## Use Cases

Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.
- Chat interfaces in constrained environments
- Local daily-driver AI assistant
- Image/document description and understanding
- Translation and content generation
- Specialized agentic use cases
- Fine-tuning and specialization
- And more...

Bringing advanced AI capabilities to resource-constrained environments.

## Ministral 3 Family

| Model Name | Type | Precision | Link |
|---|---|---|---|
| Ministral 3 3B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 8B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
| Ministral 3 14B Base 2512 | Base pre-trained | BF16 | Hugging Face |
| Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | Hugging Face |
| Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | Hugging Face |
Other formats available here.

## Benchmark Results

We compare Ministral 3 to similarly sized models.

### Reasoning

| Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.850 | 0.898 | 0.712 | 0.646 |
| Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
| Ministral 3 8B | 0.787 | 0.860 | 0.668 | 0.616 |
| Qwen3-VL-8B-Thinking | 0.798 | 0.860 | 0.671 | 0.580 |
| Ministral 3 3B | 0.721 | 0.775 | 0.534 | 0.548 |
| Qwen3-VL-4B-Thinking | 0.697 | 0.729 | 0.601 | 0.513 |

### Instruct

| Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
|---|---|---|---|---|
| Ministral 3 14B | 0.551 | 68.5 | 0.904 | 8.49 |
| Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | Not multimodal |
| Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
| Ministral 3 8B | 0.509 | 66.8 | 0.876 | 8.08 |
| Qwen3-VL-8B-Instruct | 0.528 | 66.3 | 0.946 | 8.00 |
| Ministral 3 3B | 0.305 | 56.8 | 0.830 | 7.83 |
| Qwen3-VL-4B-Instruct | 0.438 | 56.8 | 0.900 | 8.01 |
| Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
| Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |

### Base

| Model | Multilingual MMLU | MATH CoT 2-shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
|---|---|---|---|---|---|---|
| Ministral 3 14B | 0.742 | 0.676 | 0.648 | 0.820 | 0.794 | 0.749 |
| Qwen3 14B Base | 0.754 | 0.620 | 0.661 | 0.837 | 0.804 | 0.703 |
| Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | 0.788 |
| Ministral 3 8B | 0.706 | 0.626 | 0.591 | 0.793 | 0.761 | 0.681 |
| Qwen 3 8B Base | 0.700 | 0.576 | 0.596 | 0.794 | 0.760 | 0.639 |
| Ministral 3 3B | 0.652 | 0.601 | 0.511 | 0.735 | 0.707 | 0.592 |
| Qwen 3 4B Base | 0.677 | 0.405 | 0.570 | 0.759 | 0.713 | 0.530 |
| Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | 0.640 |

## Usage

The model can be used with the following frameworks:

### vLLM

We recommend using this model with vLLM.

#### Installation

Make sure to install vLLM >= 0.12.0:

```sh
pip install vllm --upgrade
```

Doing so should automatically install mistral-common >= 1.8.6.

To check:

```sh
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also make use of a ready-to-go Docker image, or one from Docker Hub.

#### Serve

Due to their size and the FP8 format of their weights, Ministral-3-3B-Instruct-2512, Ministral-3-8B-Instruct-2512, and Ministral-3-14B-Instruct-2512 can each run on a single H200 GPU.

A simple launch command is:

```sh
vllm serve mistralai/Ministral-3-8B-Instruct-2512 \
  --enable-auto-tool-choice --tool-call-parser mistral
```

Key parameter notes:

- `--enable-auto-tool-choice`: required when enabling tool usage.
- `--tool-call-parser mistral`: required when enabling tool usage.

Additional flags:

- You can set `--max-model-len` to preserve memory. By default it is 262144, which is quite large and unnecessary for most scenarios.
- You can set `--max-num-batched-tokens` to balance throughput and latency: higher values mean higher throughput but also higher latency. A combined example is shown below.
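
Putting these flags together, a launch command with a reduced context length might look like the following (the values are illustrative, not tuned recommendations):

```sh
vllm serve mistralai/Ministral-3-8B-Instruct-2512 \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 32768 \
  --max-num-batched-tokens 8192
```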

#### Usage of the model

Here we assume that the model mistralai/Ministral-3-8B-Instruct-2512 is being served and that you can reach it at localhost on port 8000, the vLLM default.
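
Before running the longer examples, you can sanity-check connectivity with a minimal query (a sketch; it assumes the server above is running on the default port):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the key just needs to be non-empty.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
print([m.id for m in client.models.list().data])
```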


##### Vision Reasoning

Let's see if Ministral 3 knows when to pick a fight!

```python
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

print(messages)

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```


##### Function Calling

Let's solve some equations thanks to our simple Python calculator tool.

```python
import json

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
    # Fine for a demo; see the hardened variant below before using eval in production.
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        # The arguments arrive as a JSON string; unpack them as keyword arguments.
        result = my_calculator(**json.loads(function_args))
        results.append(result)

messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
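
Because `eval` executes arbitrary Python, you may want to restrict the calculator to plain arithmetic outside of a demo. A minimal hardened sketch (the `_safe_eval` helper is our own illustration, not part of the model card):

```python
import ast
import operator

# Whitelist of allowed operators; anything else raises.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def _safe_eval(node):
    if isinstance(node, ast.Expression):
        return _safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.left), _safe_eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.operand))
    raise ValueError("Unsupported expression")


def my_calculator(expression: str) -> str:
    # Same signature as the demo tool, but only arithmetic is allowed.
    return str(_safe_eval(ast.parse(expression, mode="eval")))


print(my_calculator("2 * (3 + 4) ** 2"))  # 98
```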


##### Text-Only Request

Ministral 3 can follow your instructions to the letter.

```python
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
```

### Transformers

You can also use Ministral 3 8B Instruct 2512 with Transformers!

Transformers very recently added preliminary support for FP8, so make sure to install it from main:

```sh
uv pip install git+https://github.com/huggingface/transformers
```

To make the best use of our model with Transformers, make sure to have mistral-common >= 1.8.6 installed to use our tokenizer:

```sh
pip install mistral-common --upgrade
```

Try it out by running the following snippet.

> [!TIP]
> By default Transformers will load the checkpoint in FP8 and dequantize it to BF16 on the fly, which means the model currently does not make use of accelerated FP8 kernels. Compatibility with accelerated FP8 kernels is being worked on and will be available in a couple of weeks. Stay tuned!

Then load our tokenizer along with the model and generate:



```python
import torch
from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend

model_id = "mistralai/Ministral-3-8B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

tokenized = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True)

tokenized["input_ids"] = tokenized["input_ids"].to(device="cuda")
tokenized["pixel_values"] = tokenized["pixel_values"].to(dtype=torch.bfloat16, device="cuda")
image_sizes = [tokenized["pixel_values"].shape[-2:]]

output = model.generate(
    **tokenized,
    image_sizes=image_sizes,
    max_new_tokens=512,
)[0]

decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
```

Note:

Transformers allows you to automatically convert the checkpoint to bfloat16. To do so, simply load the model as follows:

```python
from transformers import Mistral3ForConditionalGeneration, FineGrainedFP8Config

model_id = "mistralai/Ministral-3-8B-Instruct-2512"
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=FineGrainedFP8Config(dequantize=True),
)
```

## License

This model is licensed under the Apache 2.0 License.

You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.

# πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Ministral-3-8B-Instruct-2512-BF16.gguf
LFS FP16
15.82 GB Download
Ministral-3-8B-Instruct-2512-IQ4_NL.gguf
LFS Q4
4.6 GB Download
Ministral-3-8B-Instruct-2512-IQ4_XS.gguf
LFS Q4
4.39 GB Download
Ministral-3-8B-Instruct-2512-Q2_K.gguf
LFS Q2
3.12 GB Download
Ministral-3-8B-Instruct-2512-Q2_K_L.gguf
LFS Q2
3.24 GB Download
Ministral-3-8B-Instruct-2512-Q3_K_M.gguf
LFS Q3
3.95 GB Download
Ministral-3-8B-Instruct-2512-Q3_K_S.gguf
LFS Q3
3.6 GB Download
Ministral-3-8B-Instruct-2512-Q4_0.gguf
Recommended LFS Q4
4.6 GB Download
Ministral-3-8B-Instruct-2512-Q4_1.gguf
LFS Q4
5.05 GB Download
Ministral-3-8B-Instruct-2512-Q4_K_M.gguf
LFS Q4
4.84 GB Download
Ministral-3-8B-Instruct-2512-Q4_K_S.gguf
LFS Q4
4.61 GB Download
Ministral-3-8B-Instruct-2512-Q5_K_M.gguf
LFS Q5
5.64 GB Download
Ministral-3-8B-Instruct-2512-Q5_K_S.gguf
LFS Q5
5.51 GB Download
Ministral-3-8B-Instruct-2512-Q6_K.gguf
LFS Q6
6.49 GB Download
Ministral-3-8B-Instruct-2512-Q8_0.gguf
LFS Q8
8.41 GB Download
Ministral-3-8B-Instruct-2512-UD-IQ1_M.gguf
LFS
2.25 GB Download
Ministral-3-8B-Instruct-2512-UD-IQ1_S.gguf
LFS
2.12 GB Download
Ministral-3-8B-Instruct-2512-UD-IQ2_M.gguf
LFS Q2
2.95 GB Download
Ministral-3-8B-Instruct-2512-UD-IQ2_XXS.gguf
LFS Q2
2.46 GB Download
Ministral-3-8B-Instruct-2512-UD-IQ3_XXS.gguf
LFS Q3
3.26 GB Download
Ministral-3-8B-Instruct-2512-UD-Q2_K_XL.gguf
LFS Q2
3.32 GB Download
Ministral-3-8B-Instruct-2512-UD-Q3_K_XL.gguf
LFS Q3
4.13 GB Download
Ministral-3-8B-Instruct-2512-UD-Q4_K_XL.gguf
LFS Q4
4.92 GB Download
Ministral-3-8B-Instruct-2512-UD-Q5_K_XL.gguf
LFS Q5
5.66 GB Download
Ministral-3-8B-Instruct-2512-UD-Q6_K_XL.gguf
LFS Q6
7.19 GB Download
Ministral-3-8B-Instruct-2512-UD-Q8_K_XL.gguf
LFS Q8
10.33 GB Download
mmproj-BF16.gguf
LFS FP16
818.52 MB Download
mmproj-F16.gguf
LFS FP16
817.37 MB Download
mmproj-F32.gguf
LFS
1.6 GB Download
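
To run these GGUF files with vision support in llama.cpp, the language model must be paired with one of the mmproj projector files above. A minimal sketch, assuming a recent llama.cpp build and local copies of the files (paths and prompt are illustrative):

```sh
# Text-only chat with the recommended Q4_0 quant
llama-cli -m Ministral-3-8B-Instruct-2512-Q4_0.gguf -p "Hello!"

# Image understanding: load the model together with a multimodal projector
llama-mtmd-cli \
  -m Ministral-3-8B-Instruct-2512-Q4_0.gguf \
  --mmproj mmproj-F16.gguf \
  --image input.png \
  -p "Describe this image."
```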