---
base_model: Nanbeige/Nanbeige4.1-3B
tags:
- gguf
- llama
- nanbeige
- quantized
license: apache-2.0
language:
- en
- zh
---

# Nanbeige4.1-3B-GGUF

GGUF quantizations of Nanbeige/Nanbeige4.1-3B for use with llama.cpp, Ollama, and other GGUF-compatible tools.

## Available Quantizations

| File | Quant | Size | Description |
|------|-------|------|-------------|
| nanbeige4.1-3b-f16.gguf | F16 | 7.4 GB | Full precision (no quantization) |
| nanbeige4.1-3b-Q8_0.gguf | Q8_0 | 3.9 GB | Best quality, largest quantized size |
| nanbeige4.1-3b-Q6_K.gguf | Q6_K | 3.1 GB | Very high quality |
| nanbeige4.1-3b-Q5_K_M.gguf | Q5_K_M | 2.7 GB | High quality |
| nanbeige4.1-3b-Q4_K_M.gguf | Q4_K_M | 2.3 GB | Good quality, recommended for most users |
| nanbeige4.1-3b-Q3_K_M.gguf | Q3_K_M | 1.9 GB | Medium quality |
| nanbeige4.1-3b-Q2_K.gguf | Q2_K | 1.6 GB | Smallest size, lower quality (a user has reported it getting stuck in repetition loops) |
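As a rough guide to the size/quality tradeoff, the effective bits per weight of each file can be estimated from the file sizes and the 3.93 B total parameter count reported by llama-bench. A minimal sketch (sizes in GiB, taken from the benchmark tables below):

```python
# Estimate effective bits per weight: file size in bits / parameter count.
GIB = 1024 ** 3
N_PARAMS = 3.93e9  # total params as reported by llama-bench

sizes_gib = {
    "Q2_K": 1.51, "Q3_K_M": 1.87, "Q4_K_M": 2.27,
    "Q5_K_M": 2.63, "Q6_K": 3.01, "Q8_0": 3.89, "F16": 7.33,
}

for quant, size in sizes_gib.items():
    bits = size * GIB * 8 / N_PARAMS
    print(f"{quant}: {bits:.2f} bits/weight")
```

Q4_K_M works out to roughly 5 bits per weight, which is why it is usually the sweet spot between quality and size.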

## Usage

### Ollama

```shell
# Run a specific quantization directly from Hugging Face (e.g. Q4_K_M)
ollama run hf.co/tantk/Nanbeige4.1-3B-GGUF:Q4_K_M

# Or create a model from a downloaded file
ollama create nanbeige4.1-3b -f Modelfile
```
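For the local-file route, a Modelfile along these lines should work. This is a sketch, not the repository's official Modelfile: the chat template is embedded in the GGUF metadata and recent Ollama versions can pick it up automatically, but an explicit ChatML template is shown here for completeness.

```
FROM ./nanbeige4.1-3b-Q4_K_M.gguf
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER stop <|im_end|>
```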

### llama.cpp

```shell
llama-cli -m nanbeige4.1-3b-Q4_K_M.gguf -p "Your prompt here" --temp 0.6 --top-p 0.95
```

## Model Details

- **Base Model:** Nanbeige/Nanbeige4.1-3B
- **Architecture:** LlamaForCausalLM
- **Parameters:** 3B-class (3.93 B total, as reported by llama.cpp)
- **Context Length:** 131,072 tokens
- **Chat Template:** ChatML (`<|im_start|>` / `<|im_end|>`)
- **License:** Apache 2.0
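llama.cpp and Ollama apply the ChatML template automatically from the GGUF metadata, but for clients that assemble prompts by hand, the general shape of ChatML looks like this (an illustrative sketch, not the model's exact template string):

```python
# Render a list of {role, content} messages into a ChatML prompt string.
def format_chatml(messages):
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    # Open an assistant turn to cue the model to respond.
    prompt += "<|im_start|>assistant\n"
    return prompt

print(format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```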

## Recommended Settings

- Temperature: 0.6
- Top-p: 0.95
- Repeat penalty: 1.0

## Benchmark Results

### Test Hardware

| Component | Spec |
|-----------|------|
| CPU | AMD Ryzen 5 5600G (6 cores / 12 threads, 3.9 GHz) |
| RAM | 32 GB DDR4-3200 (4× 8 GB Kingston) |
| GPU | NVIDIA GeForce RTX 4070 Ti (12 GB VRAM) |
| OS | Windows 11 Pro |

### CPU Benchmark (llama-bench)

- Backend: CPU
- Threads: 6
- Prompt tokens: 512 (pp512)
- Generation tokens: 128 (tg128)
- Repetitions: 3
- Tool: llama-bench (llama.cpp build 0c1f39a)

| Quant | Size | Params | Prompt (t/s) | Generation (t/s) |
|-------|------|--------|--------------|------------------|
| Q2_K | 1.51 GiB | 3.93 B | 47.14 ± 0.71 | 20.99 ± 1.04 |
| Q3_K_M | 1.87 GiB | 3.93 B | 40.23 ± 1.01 | 17.65 ± 0.25 |
| Q4_K_M | 2.27 GiB | 3.93 B | 67.80 ± 1.14 | 14.35 ± 0.52 |
| Q5_K_M | 2.63 GiB | 3.93 B | 29.68 ± 0.24 | 13.75 ± 0.17 |
| Q6_K | 3.01 GiB | 3.93 B | 33.76 ± 2.41 | 12.28 ± 0.06 |
| Q8_0 | 3.89 GiB | 3.93 B | 45.07 ± 0.41 | 9.07 ± 0.47 |
| F16 | 7.33 GiB | 3.93 B | 31.08 ± 0.75 | 5.22 ± 0.05 |

### GPU Benchmark (llama-bench)

- Backend: CUDA (RTX 4070 Ti, 100% GPU offload, ngl=99)
- Prompt tokens: 512 (pp512)
- Generation tokens: 128 (tg128)
- Repetitions: 3
- Tool: llama-bench (llama.cpp build 0c1f39a)

| Quant | Size | Params | Prompt (t/s) | Generation (t/s) |
|-------|------|--------|--------------|------------------|
| Q2_K | 1.51 GiB | 3.93 B | 7,904.89 ± 44.44 | 194.47 ± 1.68 |
| Q3_K_M | 1.87 GiB | 3.93 B | 9,233.97 ± 132.75 | 162.72 ± 1.04 |
| Q4_K_M | 2.27 GiB | 3.93 B | 9,977.17 ± 123.83 | 155.27 ± 0.21 |
| Q5_K_M | 2.63 GiB | 3.93 B | 8,060.71 ± 1484.42 | 139.18 ± 0.44 |
| Q6_K | 3.01 GiB | 3.93 B | 7,794.85 ± 1023.17 | 126.49 ± 0.83 |
| Q8_0 | 3.89 GiB | 3.93 B | 6,349.76 ± 698.63 | 102.88 ± 0.32 |
| F16 | 7.33 GiB | 3.93 B | 8,946.09 ± 230.61 | 60.75 ± 0.20 |
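One way to sanity-check the generation numbers: during token generation every weight is read roughly once per token, so generation rate × model size approximates the effective memory-read bandwidth. A quick sketch using two rows from the tables above (the resulting ~35 GiB/s CPU and ~400 GiB/s GPU figures are plausible for dual-channel DDR4-3200 and an RTX 4070 Ti, respectively, which is why generation speed scales almost inversely with file size):

```python
# (size_gib, cpu_tg, gpu_tg) per quant, taken from the benchmark tables.
results = {
    "Q4_K_M": (2.27, 14.35, 155.27),
    "Q8_0":   (3.89,  9.07, 102.88),
}

for quant, (size, cpu_tg, gpu_tg) in results.items():
    cpu_bw = size * cpu_tg  # approx. GiB/s read on the Ryzen 5 5600G
    gpu_bw = size * gpu_tg  # approx. GiB/s read on the RTX 4070 Ti
    print(f"{quant}: ~{cpu_bw:.0f} GiB/s CPU, ~{gpu_bw:.0f} GiB/s GPU")
```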

## Credits

Original model by Nanbeige. Quantized with llama.cpp.
