πŸ“‹ Model Description


base_model: Qwen/Qwen3-Next-80B-A3B-Instruct
license: apache-2.0
pipeline_tag: text-generation

Recent update:

Added IQ4_XS.

Qwen3-Next-80B-A3B-Instruct ❀️ llama.cpp

The qwen_next PR (Pull Request #16095) has been merged into the main branch and is included in llama.cpp release b7186.

Homebrew is updated, so you can simply run:

brew upgrade llama.cpp
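
To confirm the installed build includes qwen_next support (release b7186 or later), check the version string:

# prints the llama.cpp build number and commit
llama-cli --version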

You can also build from source:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
time cmake -B build
time cmake --build build --config Release --parallel $(nproc --all)

Speed in tokens per second is decent and should improve over time.

For the Q4_0 quant:

On a MacBook M4 Max:

prompt: 54 t/s, gen: 11 t/s (CPU only, i.e. -ngl 0)
prompt: 41 t/s, gen: 7 t/s (GPU only, i.e. -ngl 99)

On an NVIDIA L40S:

prompt: 127 t/s, gen: 42 t/s (GPU)
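
These figures can be reproduced with llama-bench, which is built alongside llama-cli; the prompt and generation lengths below are illustrative assumptions, not the exact settings used above:

# CPU only (-ngl 0) vs. full GPU offload (-ngl 99)
llama-bench -m Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -p 512 -n 128 -ngl 0
llama-bench -m Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -p 512 -n 128 -ngl 99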

Recent update:

Added IQ4_NL, Q4_1, Q5_0.

Added Q3_K_S, Q3_K_L, Q5_K_S.

Update:

I have tested some of the smaller quants on an NVIDIA L40S GPU, using a default CUDA compile of the excellent release from @cturan.

Since the L40S has 48 GB of VRAM, I was able to fully offload Q2_K, Q3_K_M, Q4_K_S, Q4_0, and MXFP4_MOE.

Q4_K_M was too big to offload entirely. It still works with -ngl 45, but it slowed down quite a bit.
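
A sketch of that partial offload (45 layers on the GPU, the rest on the CPU; the prompt is an arbitrary example):

build/bin/llama-cli -m Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_K_M.gguf -ngl 45 --prompt 'What is the capital of France?' --no-mmap -st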

There may be a better way, but I did not have time to test.

I got a good speed of 53 tokens per second for generation
and 800 tokens per second for prompt processing.

wget https://github.com/cturan/llama.cpp/archive/refs/tags/test.tar.gz
tar xf test.tar.gz
cd llama.cpp-test

export PATH=/usr/local/cuda/bin:$PATH

time cmake -B build -DGGML_CUDA=ON
time cmake --build build --config Release --parallel $(nproc --all)

You may need to add /usr/local/cuda/bin to your PATH
so the build can find nvcc (the NVIDIA CUDA compiler).
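
A quick check that nvcc is visible before configuring:

# should print the CUDA compiler version; if it fails, adjust PATH as above
nvcc --version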

Building from source took about 7 minutes.

For more detail on CUDA build see:
https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cuda
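
The same build also produces llama-server, which serves the model over an OpenAI-compatible HTTP API. A minimal sketch (the port and context size are arbitrary choices):

build/bin/llama-server -m Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf -ngl 99 -c 8192 --port 8080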

Quantized Models:

These quantized models were generated using the excellent pull request from @pwilkin
#16095
on 2025-10-19 with commit 2fdbf16eb.

NOTE: these quants were generated while PR #16095 was still in development. The PR has since been merged into the main branch (see above).
Speed and quality should improve over time.

How to build and run on macOS

PR=16095
git clone https://github.com/ggml-org/llama.cpp llama.cpp-PR-$PR
cd llama.cpp-PR-$PR

git fetch origin pull/$PR/head:pr-$PR
git checkout pr-$PR

time cmake -B build
time cmake --build build --config Release --parallel $(nproc --all)
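
To double-check which commit you ended up on (the hash will change as the PR evolves; the quants here were made at 2fdbf16eb):

git rev-parse --short HEAD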

Run examples

Run with Hugging Face model:

build/bin/llama-cli -hf lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF --prompt 'What is the capital of France?' --no-mmap -st
By default this will download lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF:Q4_K_M.
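
To fetch a different quant, append its tag to the -hf argument; this assumes the repo's quant tags match the suffixes in the file list below:

build/bin/llama-cli -hf lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF:Q4_0 --prompt 'What is the capital of France?' --no-mmap -st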

To download:

wget https://huggingface.co/lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF/resolve/main/Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf

or
pip install hf_transfer 'huggingface_hub[cli]'
hf download lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf
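
Note that hf_transfer is only used when explicitly enabled through the environment (per the huggingface_hub documentation):

HF_HUB_ENABLE_HF_TRANSFER=1 hf download lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf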

Run with local model file:

build/bin/llama-cli -m Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf --prompt 'Write a paragraph about quantum computing' --no-mmap -st

Example prompt and output

User prompt:

Write a paragraph about quantum computing

Assistant output:

Quantum computing represents a revolutionary leap in computational power by harnessing the principles of quantum mechanics, such as superposition and entanglement, to process information in fundamentally new ways. Unlike classical computers, which use bits that are either 0 or 1, quantum computers use quantum bits, or qubits, which can exist in a combination of both states simultaneously. This allows quantum computers to explore vast solution spaces in parallel, making them potentially exponentially faster for certain problemsβ€”like factoring large numbers, optimizing complex systems, or simulating molecular structures for drug discovery. While still in its early stages, with challenges including qubit stability, error correction, and scalability, quantum computing holds transformative promise for fields ranging from cryptography to artificial intelligence. As researchers and tech companies invest heavily in hardware and algorithmic development, the race to achieve practical, fault-tolerant quantum machines is accelerating, heralding a new era in computing technology.

[end of text]

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Qwen__Qwen3-Next-80B-A3B-Instruct-IQ4_NL.gguf
LFS Q4
41.99 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-IQ4_XS.gguf
LFS Q4
39.68 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4.gguf
LFS
40.74 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-MXFP4_MOE.gguf
LFS
40.74 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q2_K.gguf
LFS Q2
27.13 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q3_K_L.gguf
LFS Q3
38.4 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q3_K_M.gguf
LFS Q3
35.57 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q3_K_S.gguf
LFS Q3
32.17 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_0.gguf
Recommended LFS Q4
41.98 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_1.gguf
LFS Q4
46.6 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_K_M.gguf
LFS Q4
45.09 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q4_K_S.gguf
LFS Q4
42.36 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q5_0.gguf
LFS Q5
51.22 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q5_K_M.gguf
LFS Q5
52.82 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q5_K_S.gguf
LFS Q5
51.22 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q6_K.gguf
LFS Q6
61.03 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-Q8_0.gguf
LFS Q8
78.99 GB Download
Qwen__Qwen3-Next-80B-A3B-Instruct-f16.gguf
LFS FP16
148.51 GB Download