---
language:
- en
- fr
- es
- de
- it
- pt
- nl
- zh
- ja
- ko
- ar
base_model:
- mistralai/Mistral-Large-3-675B-Instruct-2512
tags:
- mistral-common
- mistral
- unsloth
---

Model Description
See our Mistral Large 3 collection for all versions including GGUF, 4-bit & FP8 formats.
Learn to run Mistral Large 3 correctly - Read our Guide.
See Unsloth Dynamic 2.0 GGUFs for our quantization benchmarks.
✨ Read our Mistral Large 3 Guide here!
- Fine-tune Mistral models for free using our Google Colab notebook
- Or train with reinforcement learning (GSPO) using our free notebook.
- View the rest of our notebooks in our docs here.
Mistral Large 3 675B Instruct 2512 BF16
From our family of large models, Mistral Large 3 is a state-of-the-art general-purpose multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters, trained from the ground up. This model is the instruct post-trained version, fine-tuned for instruction tasks, making it ideal for chat, agentic, and instruction-based use cases.
Designed for reliability and long-context comprehension, it is engineered for production-grade assistants, retrieval-augmented systems, scientific workloads, and complex enterprise workflows.
This version corresponds to the BF16 weights; FP8 and NVFP4 versions are also available for on-premises deployment.
Key Features
Mistral Large 3 consists of two main architectural components:
- A granular MoE language model with 673B parameters (39B active)
- A 2.5B vision encoder

(Together, these account for the 675B total and 41B active parameters quoted above.)
The Mistral Large 3 Instruct model offers the following capabilities:
- Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
- Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
- System Prompt: Maintains strong adherence and support for system prompts.
- Agentic: Offers best-in-class agentic capabilities with native function calling and JSON output (see the sketch after this list).
- Frontier: Delivers frontier-level general performance.
- Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- Large Context Window: Supports a 256k context window.
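
To illustrate the JSON output capability mentioned above, here is a minimal sketch against a vLLM OpenAI-compatible endpoint (see the Usage section below); the prompt is an illustrative assumption, and `response_format={"type": "json_object"}` is vLLM's JSON-constrained output option:

```python
from openai import OpenAI

# Assumes a vLLM server exposing the OpenAI-compatible API locally (see Usage).
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="mistralai/Mistral-Large-3-675B-Instruct-2512",
    messages=[
        {
            "role": "user",
            "content": "Extract the city and country from: 'The Louvre is in Paris, France.' Answer in JSON.",
        },
    ],
    response_format={"type": "json_object"},  # constrain the reply to valid JSON
    temperature=0.1,
)
print(response.choices[0].message.content)
```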
Use Cases
With powerful long-context performance and stable, consistent cross-domain behavior, Mistral Large 3 is well suited for:
- Long Document Understanding
- Powerful Daily-Driver AI Assistants
- State-of-the-Art Agentic and Tool-Use Capabilities
- Enterprise Knowledge Work
- General Coding Assistant
as well as other enterprise-grade use cases requiring frontier capabilities.
Recommended Settings
We recommend deploying Mistral Large 3 in a client-server configuration with the following best practices:
- System Prompt: Define a clear environment and use case, including guidance on how to effectively leverage tools in agentic systems.
- Sampling Parameters: Use a temperature below 0.1 for daily-driver and production environments; higher temperatures may be explored for creative use cases, and developers are encouraged to experiment with alternative settings (a minimal request sketch follows this list).
- Tools: Keep the set of tools well defined and limit their number to the minimum required for the use case; avoid overloading the model with an excessive number of tools.
- Vision: When deploying with vision capabilities, we recommend maintaining an aspect ratio close to 1:1 (width-to-height) for images. Avoid overly thin or wide images; crop them as needed to ensure optimal performance.
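
Here is a minimal sketch of a request applying these recommendations (clear system prompt, low temperature); the endpoint and prompts are illustrative assumptions matching the vLLM setup described under Usage below:

```python
from openai import OpenAI

# Assumes a vLLM server exposing the OpenAI-compatible API locally (see Usage).
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

response = client.chat.completions.create(
    model="mistralai/Mistral-Large-3-675B-Instruct-2512",
    messages=[
        # A clear system prompt defining the environment and the use case.
        {
            "role": "system",
            "content": "You are a concise assistant for an internal documentation portal.",
        },
        {"role": "user", "content": "Summarize why image aspect ratios close to 1:1 are preferred."},
    ],
    temperature=0.05,  # below 0.1, as recommended for production use
)
print(response.choices[0].message.content)
```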
Known Issues / Limitations
- Not a dedicated reasoning model: Dedicated reasoning models can outperform Mistral Large 3 in strict reasoning use cases.
- Behind vision-first models in multimodal tasks: Mistral Large 3 can lag behind models optimized for vision tasks and use cases.
- Complex deployment: Due to its large size and architecture, the model can be challenging to deploy efficiently with constrained resources or at scale.
Benchmark Results
We compare Mistral Large 3 to similarly sized models.
Text and Vision benchmark tables (not reproduced here).
Usage
The model can be used with the following frameworks:
vLLM
We recommend using this model with vLLM in FP8 or NVFP4.
#### Installation
Make sure to install vLLM >= 0.12.0:
```sh
pip install vllm --upgrade
```
Doing so should automatically install mistral-common >= 1.8.6.
To check:
```sh
python -c "import mistral_common; print(mistral_common.__version__)"
```
You can also use a ready-to-go Docker image available on Docker Hub.
#### Serve
The Mistral Large 3 Instruct FP8 format can be served on a single 8xH200 node. We recommend using this format if you plan to fine-tune, as it can be more precise than NVFP4 in some situations.
A simple launch command is:
```sh
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice --tool-call-parser mistral
```
Key parameter notes:
- `--enable-auto-tool-choice`: Required when enabling tool usage.
- `--tool-call-parser mistral`: Required when enabling tool usage.
Additional flags:
- You can set `--max-model-len` to preserve memory. By default it is set to 262144, which is quite large but not necessary for most scenarios.
- You can set `--max-num-batched-tokens` to balance throughput and latency; higher values mean higher throughput but also higher latency.
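
For example, a more memory-conscious launch might combine these flags as follows; the values shown are illustrative assumptions to adapt to your hardware, not tuned recommendations:

```sh
vllm serve mistralai/Mistral-Large-3-675B-Instruct-2512 \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice --tool-call-parser mistral \
  --max-model-len 65536 \
  --max-num-batched-tokens 8192
```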
#### Usage of the model
Here we assume that the model mistralai/Mistral-Large-3-675B-Instruct-2512 is being served and reachable at localhost on port 8000, the default for vLLM.
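
Before running the examples, you can verify the server is reachable with a quick model listing; a minimal sketch using the OpenAI client under the same assumptions:

```python
from openai import OpenAI

# Point the client at the local vLLM server described above.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# Should print the served model ID(s), including Mistral Large 3.
print([m.id for m in client.models.list().data])
```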
Vision Reasoning
Let's see if Mistral Large 3 knows when to pick a fight!
```python
from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
Function Calling
Let's solve some equations using our simple Python calculator tool.
```python
import json

from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
    # eval is used for brevity in this demo; avoid evaluating untrusted input.
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical equation and compute its results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Thanks to your calculator, compute the results for the equations that involve numbers displayed in the image.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls

# Execute each requested tool call locally.
results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
        results.append(result)

# Append the assistant's tool calls and the tool results to the conversation.
messages.append({"role": "assistant", "tool_calls": tool_calls})

for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
```
Text-Only Request
Mistral Large 3 can follow your instructions down to the letter.
```python
from openai import OpenAI
from huggingface_hub import hf_hub_download

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 262144

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
```
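
If you want tokens as they are generated rather than a single final message, the same OpenAI-compatible endpoint also supports streaming; a minimal sketch reusing the client, messages, and constants from the example above:

```python
stream = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    stream=True,  # receive incremental deltas instead of one final message
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```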
License
This model is licensed under the Apache 2.0 License.
You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party's rights, including intellectual property rights.