---
license: other
license_name: tencent-hunyuan
license_link: https://github.com/Tencent-Hunyuan/Hunyuan-7B/blob/main/LICENSE.txt
base_model: tencent/Hunyuan-7B-Instruct
base_model_relation: quantized
tags:
  - imatrix
  - hunyuanv1dense
language:
  - zh
  - en
  - ja
  - es
  - pt
  - fr
  - ko
pipeline_tag: text-generation
---

## πŸ“‹ Model Description

Imatrix GGUF quantizations of [tencent/Hunyuan-7B-Instruct](https://huggingface.co/tencent/Hunyuan-7B-Instruct), provided in several bit widths and with different precisions for the output and embedding tensors (see the suffix legend below).

### Output + Embedding

The three-letter suffix on each variant encodes the precision used for the output and embedding tensors:

| Suffix | Output + embedding precision |
|---|---|
| AXL | 2-bit |
| BXL | 3-bit |
| CXL | 4-bit |
| DXL | 5-bit |
| EXL | 6-bit |
| FXL | 8-bit |
| GXL | 16-bit |
| HXL | 32-bit |
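
For scripting against the file list below, the naming convention in this legend can be parsed mechanically. The sketch below covers only that convention; the helper and its name are illustrative, not part of any published tooling.

```python
# Map the variant suffix to the bit width of the output and embedding tensors
# (per the legend above).
SUFFIX_BITS = {
    "AXL": 2, "BXL": 3, "CXL": 4, "DXL": 5,
    "EXL": 6, "FXL": 8, "GXL": 16, "HXL": 32,
}

def parse_variant(filename: str) -> tuple[str, int]:
    """Split e.g. 'Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf' into the base quant
    ('Q4_K_M') and the output/embedding bit width (8)."""
    variant = filename.removesuffix(".gguf").split("Instruct-")[-1]
    quant, _, suffix = variant.rpartition("_")
    return quant, SUFFIX_BITS[suffix]

print(parse_variant("Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf"))  # ('Q4_K_M', 8)
```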

### Master table

| Bits | Variant | Size (GB) | BPW | PPL | PPL error |
|---:|---|---:|---:|---:|---:|
| 1 | IQ1_M_FXL | 2.19 | 2.32 | 1.8967 | 0.01174 |
| 1 | IQ1_M_GXL | 2.68 | 2.85 | 1.8969 | 0.01175 |
| 1 | IQ1_M_HXL | 3.73 | 3.97 | 1.8965 | 0.01174 |
| 2 | Q2_K_FXL | 3.13 | 3.33 | 1.6234 | 0.00922 |
| 2 | Q2_K_GXL | 3.63 | 3.86 | 1.6234 | 0.00922 |
| 2 | Q2_K_HXL | 4.68 | 4.98 | 1.6234 | 0.00922 |
| 3 | Q3_K_M_FXL | 3.92 | 4.17 | 1.5674 | 0.00864 |
| 3 | Q3_K_M_GXL | 4.41 | 4.70 | 1.5674 | 0.00864 |
| 3 | Q3_K_M_HXL | 5.46 | 5.81 | 1.5672 | 0.00864 |
| 4 | Q4_K_M_FXL | 4.75 | 5.06 | 1.5567 | 0.00852 |
| 4 | Q4_K_M_GXL | 5.24 | 5.58 | 1.5570 | 0.00853 |
| 4 | Q4_K_M_HXL | 6.29 | 6.70 | 1.5566 | 0.00852 |
| 5 | Q5_K_M_FXL | 5.50 | 5.85 | 1.5572 | 0.00855 |
| 5 | Q5_K_M_GXL | 5.99 | 6.38 | 1.5574 | 0.00856 |
| 5 | Q5_K_M_HXL | 7.04 | 7.50 | 1.5570 | 0.00855 |
| 6 | Q6_K_FXL | 6.29 | 6.70 | 1.5525 | 0.00848 |
| 6 | Q6_K_GXL | 6.78 | 7.22 | 1.5524 | 0.00848 |
| 6 | Q6_K_HXL | 7.83 | 8.34 | 1.5523 | 0.00848 |
| 8 | Q8_0_GXL | 8.47 | 9.03 | 1.5515 | 0.00847 |
| 8 | Q8_0_HXL | 9.52 | 10.14 | 1.5514 | 0.00847 |
| 16 | BF16 | 15.00 | 16.00 | 1.5523 | 0.00848 |
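
As a sanity check, the BPW column can be reproduced from the size column alone. The sketch below assumes that "GB" in this table means 10^9 bytes and that the BF16 row (15.00 GB at 16.00 BPW) implies roughly 7.5e9 quantized weights; both are inferences from the table, not stated facts.

```python
# Rough bits-per-weight check against the master table.
# ASSUMPTIONS: "GB" = 1e9 bytes, and the BF16 row implies ~7.5e9 weights.
N_WEIGHTS = 15.00e9 * 8 / 16.00  # ~7.5e9, derived from the BF16 row

def bpw(size_gb: float, n_weights: float = N_WEIGHTS) -> float:
    """Bits per weight implied by a file size in decimal gigabytes."""
    return size_gb * 1e9 * 8 / n_weights

for name, size_gb in [("IQ1_M_FXL", 2.19), ("Q4_K_M_FXL", 4.75), ("Q8_0_HXL", 9.52)]:
    print(f"{name}: ~{bpw(size_gb):.2f} BPW")
# Should land close to the table's BPW column (2.32, 5.06, 10.14).
```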

### Variant chooser (prefer FXL first)

| Variant (preferred) | Size (GB) | Quality vs BF16 | Inference speed | Long-context headroom |
|---|---:|---|---|---|
| IQ1_M_FXL | 2.19 | Low | Fastest | Excellent |
| Q2_K_FXL | 3.13 | Fair | Very fast | Excellent |
| Q3_K_M_FXL | 3.92 | Good | Fast | Very good |
| Q4_K_M_FXL | 4.75 | Excellent | Fast | Good |
| Q5_K_M_FXL | 5.50 | Excellent | Medium | Good |
| Q6_K_FXL | 6.29 | Excellent | Medium | OK |
| Q8_0_GXL | 8.47 | Excellent | Slower | Tight |
| BF16 | 15.00 | Reference | Slowest | Very tight |
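
The "long context headroom" column comes down to how much VRAM is left for the KV cache once the weights are resident. A rough way to check a specific combination is sketched below; the layer, head, and head-dimension values are placeholders to be replaced with the real Hunyuan-7B config (from the GGUF metadata or config.json), since they are not stated on this card.

```python
# Rough VRAM budget: model file + KV cache + a little working space.
# PLACEHOLDER architecture values - read the real ones from the model config;
# they are not taken from this card.
N_LAYERS   = 32    # placeholder
N_KV_HEADS = 8     # placeholder (GQA)
HEAD_DIM   = 128   # placeholder
KV_BYTES   = 2     # f16 K/V cache entries

def kv_cache_gib(n_ctx: int) -> float:
    """Approximate K+V cache size for a given context length, in GiB."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES  # K and V
    return n_ctx * per_token / 1024**3

def fits(model_file_gb: float, n_ctx: int, vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """Very rough go/no-go check for a quant + context length on a given GPU."""
    return model_file_gb + kv_cache_gib(n_ctx) + overhead_gb <= vram_gb

# e.g. Q4_K_M_FXL (~4.43 GB file) with a 32k context on a 12 GB card:
print(round(kv_cache_gib(32_768), 2), fits(4.43, 32_768, 12.0))
```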

### Quick picks by GPU VRAM

| GPU VRAM | Pick | Why |
|---|---|---|
| 16 GB | Q6_K_FXL or Q4_K_M_FXL | Matches BF16 quality within the measured PPL error, with plenty of room for long context or batching. |
| 12 GB | Q4_K_M_FXL | Best balance on 12 GB: strong quality with good headroom. |
| 8 GB | Q4_K_M_FXL (default), or Q3_K_M_FXL for longer context | Q4 runs well; drop to Q3 if you need more room for the KV cache. |
| 6 GB | Q3_K_M_FXL | Usually fits with comfortable headroom. Use Q4_K_M_FXL only for short contexts. |
| 4 GB | Q2_K_FXL first, IQ1_M_FXL as a last resort | Fits strict limits; accept the quality hit as needed. |

Notes:

- Preference order for size at equal quality: FXL first, then GXL, then HXL.
- If you need more context headroom, drop one quant level rather than pushing up to a heavier file.
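
A minimal way to act on these picks, assuming llama-cpp-python as the runtime (any llama.cpp-based server takes the same file and similar settings): point it at the downloaded GGUF and size `n_ctx` to the headroom you actually have. The path and settings below are examples, not fixed requirements.

```python
# Minimal llama-cpp-python sketch; assumes `pip install llama-cpp-python`
# and a locally downloaded GGUF (where supported, the chat template is read
# from the GGUF metadata).
from llama_cpp import Llama

llm = Llama(
    model_path="Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",  # pick from the tables above
    n_ctx=8192,        # raise this if your VRAM headroom allows
    n_gpu_layers=-1,   # offload all layers to the GPU if they fit
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give a one-line summary of GGUF."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```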

## πŸ“‚ GGUF File List

| πŸ“ Filename | Quant | πŸ“¦ Size |
|---|---|---:|
| Hunyuan-7B-Instruct-IQ1_M_FXL.gguf | IQ1_M | 2.04 GB |
| Hunyuan-7B-Instruct-IQ1_M_GXL.gguf | IQ1_M | 2.49 GB |
| Hunyuan-7B-Instruct-IQ1_M_HXL.gguf | IQ1_M | 3.47 GB |
| Hunyuan-7B-Instruct-Q2_K_FXL.gguf | Q2_K | 2.92 GB |
| Hunyuan-7B-Instruct-Q2_K_GXL.gguf | Q2_K | 3.38 GB |
| Hunyuan-7B-Instruct-Q2_K_HXL.gguf | Q2_K | 4.35 GB |
| Hunyuan-7B-Instruct-Q3_K_M_FXL.gguf | Q3_K_M | 3.65 GB |
| Hunyuan-7B-Instruct-Q3_K_M_GXL.gguf | Q3_K_M | 4.11 GB |
| Hunyuan-7B-Instruct-Q3_K_M_HXL.gguf | Q3_K_M | 5.09 GB |
| Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf (recommended) | Q4_K_M | 4.43 GB |
| Hunyuan-7B-Instruct-Q4_K_M_GXL.gguf | Q4_K_M | 4.88 GB |
| Hunyuan-7B-Instruct-Q4_K_M_HXL.gguf | Q4_K_M | 5.86 GB |
| Hunyuan-7B-Instruct-Q5_K_M_FXL.gguf | Q5_K_M | 5.12 GB |
| Hunyuan-7B-Instruct-Q5_K_M_GXL.gguf | Q5_K_M | 5.58 GB |
| Hunyuan-7B-Instruct-Q5_K_M_HXL.gguf | Q5_K_M | 6.56 GB |
| Hunyuan-7B-Instruct-Q6_K_FXL.gguf | Q6_K | 5.86 GB |
| Hunyuan-7B-Instruct-Q6_K_GXL.gguf | Q6_K | 6.32 GB |
| Hunyuan-7B-Instruct-Q6_K_HXL.gguf | Q6_K | 7.30 GB |
| Hunyuan-7B-Instruct-Q8_0_GXL.gguf | Q8_0 | 7.89 GB |
| Hunyuan-7B-Instruct-Q8_0_HXL.gguf | Q8_0 | 8.87 GB |
| Hunyuan-7B-Instruct-bf16.gguf | BF16 | 13.99 GB |
| imatrix.gguf | n/a | 4.78 MB |

All files are stored with Git LFS.
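
To fetch a single variant rather than cloning the whole repository, `hf_hub_download` from `huggingface_hub` is enough. The repo id below is a placeholder; substitute this repository's actual id.

```python
# Download one quant from the Hub; assumes `pip install huggingface_hub`.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="<this-repo-owner>/Hunyuan-7B-Instruct-GGUF",  # placeholder id
    filename="Hunyuan-7B-Instruct-Q4_K_M_FXL.gguf",        # recommended default
)
print(path)  # local path to pass as model_path
```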