Model Description
license: other
license_name: tencent-hunyuan
license_link: https://github.com/Tencent-Hunyuan/Hunyuan-7B/blob/main/LICENSE.txt
base_model: tencent/Hunyuan-7B-Instruct
base_model_relation: quantized
tags:
- imatrix
- hunyuan_v1_dense
- zh
- en
- ja
- es
- pt
- fr
- ko
Output + Embedding | 2-bit | 3-bit | 4-bit | 5-bit | 6-bit | 8-bit | 16-bit | 32-bit |
---|---|---|---|---|---|---|---|---|
Suffix | AXL | BXL | CXL | DXL | EXL | FXL | GXL | HXL |
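The suffix scheme above can be decoded mechanically. The following is a minimal sketch, assuming variant names follow the `<base quant>_<suffix>` pattern used in this card (for example `Q4_K_M_FXL`). The helper name `split_variant` is hypothetical and not part of any library.

```python
from typing import Optional, Tuple

# Bit width of the output and embedding tensors for each suffix,
# copied from the table above.
SUFFIX_BITS = {
    "AXL": 2, "BXL": 3, "CXL": 4, "DXL": 5,
    "EXL": 6, "FXL": 8, "GXL": 16, "HXL": 32,
}

def split_variant(variant: str) -> Tuple[str, Optional[int]]:
    """Split a variant name into (base quant, output/embedding bits).

    The bit width is None when the name carries no known suffix,
    as with plain "BF16".
    """
    base, _, suffix = variant.rpartition("_")
    if base and suffix in SUFFIX_BITS:
        return base, SUFFIX_BITS[suffix]
    return variant, None

print(split_variant("Q4_K_M_FXL"))
print(split_variant("BF16"))
```

Read this way, `Q4_K_M_FXL` is a Q4_K_M quant whose output and embedding tensors are kept at 8 bits.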
Master table
Bits | Variant | Size (GB) | BPW | PPL | PPL error |
---|---|---|---|---|---|
1 | IQ1_M_FXL | 2.19 | 2.32 | 1.8967 | 0.01174 |
1 | IQ1_M_GXL | 2.68 | 2.85 | 1.8969 | 0.01175 |
1 | IQ1_M_HXL | 3.73 | 3.97 | 1.8965 | 0.01174 |
2 | Q2_K_FXL | 3.13 | 3.33 | 1.6234 | 0.00922 |
2 | Q2_K_GXL | 3.63 | 3.86 | 1.6234 | 0.00922 |
2 | Q2_K_HXL | 4.68 | 4.98 | 1.6234 | 0.00922 |
3 | Q3_K_M_FXL | 3.92 | 4.17 | 1.5674 | 0.00864 |
3 | Q3_K_M_GXL | 4.41 | 4.70 | 1.5674 | 0.00864 |
3 | Q3_K_M_HXL | 5.46 | 5.81 | 1.5672 | 0.00864 |
4 | Q4_K_M_FXL | 4.75 | 5.06 | 1.5567 | 0.00852 |
4 | Q4_K_M_GXL | 5.24 | 5.58 | 1.5570 | 0.00853 |
4 | Q4_K_M_HXL | 6.29 | 6.70 | 1.5566 | 0.00852 |
5 | Q5_K_M_FXL | 5.50 | 5.85 | 1.5572 | 0.00855 |
5 | Q5_K_M_GXL | 5.99 | 6.38 | 1.5574 | 0.00856 |
5 | Q5_K_M_HXL | 7.04 | 7.50 | 1.5570 | 0.00855 |
6 | Q6_K_FXL | 6.29 | 6.70 | 1.5525 | 0.00848 |
6 | Q6_K_GXL | 6.78 | 7.22 | 1.5524 | 0.00848 |
6 | Q6_K_HXL | 7.83 | 8.34 | 1.5523 | 0.00848 |
8 | Q8_0_GXL | 8.47 | 9.03 | 1.5515 | 0.00847 |
8 | Q8_0_HXL | 9.52 | 10.14 | 1.5514 | 0.00847 |
16 | BF16 | 15.00 | 16.00 | 1.5523 | 0.00848 |
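The BPW column can be sanity-checked from file size and parameter count. The sketch below makes two assumptions: sizes in the master table are decimal gigabytes, and the parameter count is about 7.5 billion, a figure inferred here from the BF16 row (15.00 GB at 16 bits per weight) rather than stated in this card.

```python
# Assumption: master-table sizes are decimal GB (1e9 bytes).
# Assumption: parameter count inferred from the BF16 row (15.00 GB at 16 BPW).
N_PARAMS = 15.00e9 * 8 / 16  # 7.5e9 weights

def bits_per_weight(size_gb: float, n_params: float = N_PARAMS) -> float:
    """Average bits stored per weight for a file of size_gb decimal GB."""
    return size_gb * 1e9 * 8 / n_params

def gb_to_gib(size_gb: float) -> float:
    """Convert decimal GB (1e9 bytes) to binary GiB (2**30 bytes)."""
    return size_gb * 1e9 / 2**30

for name, size_gb in [("Q4_K_M_FXL", 4.75), ("Q6_K_FXL", 6.29), ("Q8_0_GXL", 8.47)]:
    print(f"{name}: {bits_per_weight(size_gb):.2f} BPW, {gb_to_gib(size_gb):.2f} GiB")
```

Under these assumptions the computed values land within about 0.01 of the BPW column. The GiB figures are also close to the sizes shown in the GGUF file list, which suggests that list reports binary units even though it is labeled GB.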
Variant chooser, prefer FXL first
Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long context headroom |
---|---|---|---|---|
IQ1_M_FXL | 2.19 | Low | Fastest | Excellent |
Q2_K_FXL | 3.13 | Fair | Very fast | Excellent |
Q3_K_M_FXL | 3.92 | Good | Fast | Very good |
Q4_K_M_FXL | 4.75 | Excellent | Fast | Good |
Q5_K_M_FXL | 5.50 | Excellent | Medium | Good |
Q6_K_FXL | 6.29 | Excellent | Medium | OK |
Q8_0_GXL | 8.47 | Excellent | Slower | Tight |
BF16 | 15.00 | Reference | Slowest | Very tight |
Quick picks by GPU VRAM
GPU VRAM | Pick | Why |
---|---|---|
16 GB | Q6_K_FXL or Q4_K_M_FXL | Perplexity matches BF16 within the reported error, with plenty of room for long context or batching. |
12 GB | Q4_K_M_FXL | Best balance on 12 GB: strong quality with good headroom. |
8 GB | Q4_K_M_FXL (default), or Q3_K_M_FXL for longer context | Q4 runs well; drop to Q3 if you need more KV cache. |
6 GB | Q3_K_M_FXL | Usually fits with comfortable headroom. Use Q4_K_M_FXL only for short contexts. |
4 GB | Q2_K_FXL first, IQ1_M_FXL as a last resort | Fits strict limits. Accept the quality hit as needed. |
- Preference order for size at equal quality: FXL first, then GXL, then HXL.
- If you need more context headroom, drop one quant level rather than pushing to heavier weights.
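The quick-picks table and the two rules above can be expressed as a small lookup. This is a rough sketch, not a measured fit test: `quick_pick` is a hypothetical helper, VRAM values between tiers fall back to the next lower tier (an assumption), and the `long_context` flag simply applies the "drop one quant level" rule.

```python
# FXL variants ordered from highest to lowest quality
# (from the variant chooser table).
LADDER = [
    "Q6_K_FXL", "Q5_K_M_FXL", "Q4_K_M_FXL",
    "Q3_K_M_FXL", "Q2_K_FXL", "IQ1_M_FXL",
]

# (minimum VRAM in GB, first pick) taken from the quick-picks table.
TIERS = [
    (16, "Q6_K_FXL"),
    (12, "Q4_K_M_FXL"),
    (8, "Q4_K_M_FXL"),
    (6, "Q3_K_M_FXL"),
    (4, "Q2_K_FXL"),
]

def quick_pick(vram_gb, long_context=False):
    """Return the suggested variant for a VRAM budget, or None below 4 GB."""
    for min_vram, variant in TIERS:
        if vram_gb >= min_vram:
            if long_context:
                # Drop one quant level to free room for KV cache.
                step = min(LADDER.index(variant) + 1, len(LADDER) - 1)
                variant = LADDER[step]
            return variant
    return None

print(quick_pick(8))
print(quick_pick(8, long_context=True))
```

Actual headroom depends on context length, batch size, and the runtime, so treat the result as a starting point and verify on the target GPU.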
GGUF File List
Filename | Size | Notes |
---|---|---|
Hunyuan-7B-Instruct-IQ1_M_FXL.gguf | 2.04 GB | |
Hunyuan-7B-Instruct-IQ1_M_GXL.gguf | 2.49 GB | |
Hunyuan-7B-Instruct-IQ1_M_HXL.gguf | 3.47 GB | |
Hunyuan-7B-Instruct-Q2_K_FXL.gguf | 2.92 GB | |
Hunyuan-7B-Instruct-Q2_K_GXL.gguf | 3.38 GB | |
Hunyuan-7B-Instruct-Q2_K_HXL.gguf | 4.35 GB | |
Hunyuan-7B-Instruct-Q3_K_M_FXL.gguf | 3.65 GB | |
Hunyuan-7B-Instruct-Q3_K_M_GXL.gguf | 4.11 GB | |
Hunyuan-7B-Instruct-Q3_K_M_HXL.gguf | 5.09 GB | |
Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf | 4.43 GB | Recommended |
Hunyuan-7B-Instruct-Q4_K_M_GXL.gguf | 4.88 GB | |
Hunyuan-7B-Instruct-Q4_K_M_HXL.gguf | 5.86 GB | |
Hunyuan-7B-Instruct-Q5_K_M_FXL.gguf | 5.12 GB | |
Hunyuan-7B-Instruct-Q5_K_M_GXL.gguf | 5.58 GB | |
Hunyuan-7B-Instruct-Q5_K_M_HXL.gguf | 6.56 GB | |
Hunyuan-7B-Instruct-Q6_K_FXL.gguf | 5.86 GB | |
Hunyuan-7B-Instruct-Q6_K_GXL.gguf | 6.32 GB | |
Hunyuan-7B-Instruct-Q6_K_HXL.gguf | 7.3 GB | |
Hunyuan-7B-Instruct-Q8_0_GXL.gguf | 7.89 GB | |
Hunyuan-7B-Instruct-Q8_0_HXL.gguf | 8.87 GB | |
Hunyuan-7B-Instruct-bf16.gguf | 13.99 GB | |
imatrix.gguf | 4.78 MB | |
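Files in the list above can be fetched programmatically with `huggingface_hub`. The sketch below is illustrative only: the repository id is a placeholder because this card does not state it, and `gguf_filename` and `download` are hypothetical helpers that follow the naming pattern visible in the list.

```python
def gguf_filename(variant: str) -> str:
    """Build a file name following the pattern used in the file list."""
    # The BF16 file is listed with a lowercase tag.
    tag = "bf16" if variant.upper() == "BF16" else variant
    return f"Hunyuan-7B-Instruct-{tag}.gguf"

def download(variant: str, repo_id: str) -> str:
    """Download one GGUF file and return its local cache path."""
    # Imported lazily so gguf_filename works without the package installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=repo_id, filename=gguf_filename(variant))

if __name__ == "__main__":
    # Placeholder: replace with the repository id that hosts these files.
    REPO_ID = "<user>/<repo>"
    print(gguf_filename("Q4_K_M_FXL"))
    # path = download("Q4_K_M_FXL", REPO_ID)  # uncomment once REPO_ID is set
```

The returned path can then be handed to any GGUF-capable runtime, such as llama.cpp.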