πŸ“‹ Model Description


license: llama3
base_model:
  • Nanbeige/Nanbeige4.1-3B
language:
  • en
pipeline_tag: text-generation
library_name: transformers
tags:
  • text-generation-inference

Nanbeige4.1-3B-f32-GGUF

Nanbeige4.1-3B from Nanbeige is a compact 3B-parameter decoder-only Transformer language model, released in Base and Thinking variants. It was pre-trained on 23T high-quality tokens using hybrid filtering and a Warmup-Stable-Decay (WSD) schedule, then post-trained in multiple stages: 30M+ SFT samples, thought refinement, dual-level distillation from larger Nanbeige models, and reinforcement learning. The result is state-of-the-art small-model reasoning: it outperforms Qwen3-8B/30B/32B-class models on AIME2024/2025 (SOTA averages), GPQA-Diamond, LiveCodeBench-Pro, and IMO-Answer-Bench, scores 53.8 on BFCL-V4 tool use (+5.2 over Qwen3-30B-A3B), and reaches 60.0/41.8 on Arena-Hard-V2/Multi-Challenge alignment, with a 64K context window RoPE-extended via ABF. The model is designed for deep single-pass multi-step reasoning on math, science, coding, and puzzles without agentic loops. Its Fine-Grained Warmup-Stable-Decay schedule (0.1T warmup tokens plus 18.9T stable-phase tokens that shift toward top-quality data) yields strong token- and sequence-level performance, matching models roughly 10x its size on demanding tasks while remaining deployable on consumer-grade hardware under an open license.
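The 64K context is worth budgeting for separately from the weights, since the KV cache grows linearly with context length. A minimal sketch of the arithmetic, using hypothetical layer/head dimensions for a 3B-class model (these are illustrative assumptions, not the published Nanbeige4.1-3B config):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Memory for the K and V caches: 2 tensors per layer, each holding
    n_kv_heads * head_dim values per token, for ctx_len tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Hypothetical 3B-class dimensions (NOT the real Nanbeige4.1-3B config):
gib = kv_cache_bytes(n_layers=28, n_kv_heads=8, head_dim=128, ctx_len=64 * 1024) / 2**30
print(f"{gib:.1f} GiB f16 KV cache at the full 64K context")
```

Under these assumed dimensions the full-context f16 KV cache lands around 7 GiB, which is why shorter contexts or a quantized KV cache are common on consumer hardware.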

Nanbeige4.1-3B [GGUF]

| File Name | Quant Type | File Size | File Link |
|---|---|---|---|
| Nanbeige4.1-3B.BF16.gguf | BF16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F16.gguf | F16 | 7.87 GB | Download |
| Nanbeige4.1-3B.F32.gguf | F32 | 15.7 GB | Download |
| Nanbeige4.1-3B.Q8_0.gguf | Q8_0 | 4.18 GB | Download |
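The sizes above are internally consistent, which makes a quick sanity check possible: F32 stores ~4 bytes per weight, so the F32 file implies the parameter count, and the other quants follow from their bytes-per-weight (Q8_0 stores roughly 8.5 bits per weight: 8-bit values plus per-block scales). A sketch of that arithmetic:

```python
# Derive the parameter count from the F32 file, then predict the other sizes.
F32_GB = 15.7                      # from the table (decimal GB)
params = F32_GB * 1e9 / 4          # ~3.9e9 weights at 4 bytes each
f16_gb = params * 2 / 1e9          # F16/BF16: 2 bytes per weight
q8_gb = params * 8.5 / 8 / 1e9     # Q8_0: ~8.5 bits per weight

print(f"params ~ {params / 1e9:.2f}B")
print(f"F16 predicted ~ {f16_gb:.2f} GB (table: 7.87 GB)")
print(f"Q8_0 predicted ~ {q8_gb:.2f} GB (table: 4.18 GB)")
```

Both predictions land within a few hundredths of a GB of the listed sizes, so the table is self-consistent.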

Quants Usage

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant
types (lower is better):

(image: ikawrakow's quant-type comparison graph)
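One practical way to use the size column when choosing a quant: take the largest file that fits your memory budget with some headroom for the KV cache and runtime overhead. A hypothetical helper sketching that rule (`pick_quant` is illustrative, not part of any library, and the 1.5 GB headroom is a placeholder, not a measured figure):

```python
def pick_quant(files, ram_gb, headroom_gb=1.5):
    """Pick the largest quant that fits in ram_gb, leaving headroom_gb
    for the KV cache and runtime overhead. files maps name -> size in GB."""
    fitting = {name: size for name, size in files.items()
               if size + headroom_gb <= ram_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)

# Sizes from the table above (decimal GB):
files = {
    "Nanbeige4.1-3B.F32.gguf": 15.7,
    "Nanbeige4.1-3B.F16.gguf": 7.87,
    "Nanbeige4.1-3B.BF16.gguf": 7.87,
    "Nanbeige4.1-3B.Q8_0.gguf": 4.18,
}
print(pick_quant(files, ram_gb=8))   # only Q8_0 fits in 8 GB with headroom
print(pick_quant(files, ram_gb=16))  # F16/BF16 fit; F32 needs ~17+ GB
```

With 8 GB of RAM only Q8_0 fits; 16 GB admits the half-precision files, while F32 needs more than 17 GB.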

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Nanbeige4.1-3B.BF16.gguf
Recommended LFS FP16
7.33 GB Download
Nanbeige4.1-3B.F16.gguf
LFS FP16
7.33 GB Download
Nanbeige4.1-3B.F32.gguf
LFS
14.66 GB Download
Nanbeige4.1-3B.Q8_0.gguf
LFS Q8
3.9 GB Download