---
base_model:
- Qwen/Qwen3.5-122B-A10B
---

# Model Description
This repo contains specialized MoE quants of Qwen3.5-122B-A10B. Because the FFN expert tensors account for the overwhelming majority of the model's weights, it is possible to achieve better quality at a smaller overall size than a comparable naive quantization: the default quantization type for all other tensors is kept at high quality, while the FFN up and FFN gate tensors are quantized more aggressively, along with the FFN down tensors.
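The size argument above can be checked with back-of-the-envelope arithmetic. The parameter split below is a hypothetical illustration (roughly 90% of weights assumed to sit in the MoE FFN experts), not a measurement of the real model:

```python
# Sketch: why keeping a high-quality default type for non-FFN tensors costs
# little, while the FFN expert tensors dominate the size budget.
# The 110B/12B parameter split is an ASSUMPTION for illustration only.

GIB = 1024**3

def model_size_gib(params_ffn, params_rest, bpw_ffn, bpw_rest):
    """Model size in GiB given per-group parameter counts and bits per weight."""
    bits = params_ffn * bpw_ffn + params_rest * bpw_rest
    return bits / 8 / GIB

ffn, rest = 110e9, 12e9  # assumed split of the ~122B total parameters

uniform_q4 = model_size_gib(ffn, rest, 4.5, 4.5)  # naive ~Q4_K everywhere
mixed      = model_size_gib(ffn, rest, 4.5, 8.5)  # FFN at ~Q4_K, rest at ~Q8_0

print(f"uniform ~4.5 BPW : {uniform_q4:.1f} GiB")
print(f"mixed 4.5 / 8.5  : {mixed:.1f} GiB")
```

Under these assumptions, bumping every non-FFN tensor from ~4.5 to ~8.5 BPW adds only a few GiB, which is why the high-quality default is affordable.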
| Quant | Size | Mixture (default / FFN up / FFN gate / FFN down) | PPL | Mean PPL(Q)/PPL(base) − 1 | KLD |
|---|---|---|---|---|---|
| Q8_0 | 120.94 GiB (8.51 BPW) | Q8_0 | 5.733978 ± 0.075548 | -0.0146% | 0.002545 ± 0.000078 |
| Q5_K_M | 85.22 GiB (6.00 BPW) | Q8_0 / Q5_K / Q5_K / Q6_K | 5.740017 ± 0.075671 | +0.0907% | 0.003674 ± 0.000078 |
| Q4_K_M | 71.44 GiB (5.03 BPW) | Q8_0 / Q4_K / Q4_K / Q5_K | 5.742536 ± 0.075656 | +0.1347% | 0.006429 ± 0.000197 |
| IQ4_XS | 56.25 GiB (3.96 BPW) | Q8_0 / IQ3_S / IQ3_S / IQ4_XS | 5.799691 ± 0.076499 | +1.1313% | 0.016301 ± 0.000344 |
| IQ3_S | 43.35 GiB (3.05 BPW) | Q6_K / IQ2_S / IQ2_S / IQ3_S | 5.928605 ± 0.078470 | +3.3792% | 0.040833 ± 0.000741 |
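A mixture like the Q4_K_M row can be produced with per-tensor type overrides. This is a sketch assuming a recent llama.cpp build whose `llama-quantize` supports the `--tensor-type PATTERN=TYPE` flag and the usual GGUF MoE expert tensor names; file names are placeholders:

```shell
# Sketch: mixed-precision quant with per-tensor overrides (recent llama.cpp).
# Tensor name patterns and flag syntax assumed from newer llama-quantize builds.
./llama-quantize \
  --tensor-type ffn_up_exps=q4_k \
  --tensor-type ffn_gate_exps=q4_k \
  --tensor-type ffn_down_exps=q5_k \
  Qwen3.5-122B-A10B-BF16.gguf \
  Qwen3.5-122B-A10B-Q4_K_M-mix.gguf \
  q8_0  # default type for all remaining tensors
```

Every tensor not matched by an override is quantized at the trailing default type, which here keeps attention and embedding weights at Q8_0.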