# Model Description
```yaml
base_model: intervitens/mini-magnum-12b-v1.1
library_name: transformers
quantized_by: InferenceIllusionist
tags:
- iMat
- gguf
- Mistral
```
# mini-magnum-12b-v1.1-iMat-GGUF
> [!WARNING]
> **Important Note:** Inferencing in llama.cpp has now been merged in PR #8604. Please ensure you are on release b3438 or newer. Text-generation-web-ui (Ooba) is also working as of 7/23. Kobold.cpp is working as of v1.71.
Quantized from mini-magnum-12b-v1.1 fp16
- Weighted quantizations were created using the fp16 GGUF and groupsmerged.txt in 92 chunks with n_ctx=512
- The static fp16 is also included in this repo
- For a brief rundown of iMatrix quant performance, please see this PR
- All quants are verified working prior to uploading to the repo, for your safety and convenience
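For reference, the weighted-quantization workflow described above can be sketched with llama.cpp's `llama-imatrix` and `llama-quantize` tools. This is an illustrative reconstruction, not the exact commands used: the output filename `imatrix.dat` is an assumption, and the chunk count (92) simply falls out of the calibration file's length at n_ctx=512.

```shell
# Sketch: generate an importance matrix from the fp16 GGUF and the
# calibration text, then apply it when quantizing. Paths are illustrative.
./llama-imatrix -m mini-magnum-12B-v1.1-F16.gguf \
    -f groupsmerged.txt -c 512 -o imatrix.dat

# Use the resulting imatrix for a weighted quant, e.g. Q4_K_M:
./llama-quantize --imatrix imatrix.dat \
    mini-magnum-12B-v1.1-F16.gguf \
    mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf Q4_K_M
```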
KL-Divergence Reference Chart
> [!TIP]
> **Quant-specific Tips:**
> * If you are getting a `cudaMalloc failed: out of memory` error, try passing an argument for lower context in llama.cpp, e.g. for 8k: `-c 8192`
> * If you have all Ampere generation or newer cards, you can use flash attention like so: `-fa`
> * Provided flash attention is enabled, you can also use quantized cache to save on VRAM, e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
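Putting the tips above together, a typical llama.cpp invocation might look like the following. The binary name, model path, and context size are illustrative; adjust to your setup.

```shell
# Sketch: run the Q4_K_M quant with 8k context, flash attention,
# and an 8-bit quantized KV cache (requires -fa). Paths illustrative.
./llama-cli -m mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf \
    -c 8192 -fa -ctk q8_0 -ctv q8_0
```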
The original model card can be found here.
## GGUF File List
| Filename | Quant | Size |
|---|---|---|
| mini-magnum-12B-v1.1-F16.gguf | FP16 | 22.82 GB |
| mini-magnum-12b-v1.1-iMat-IQ2_M.gguf | Q2 | 4.13 GB |
| mini-magnum-12b-v1.1-iMat-IQ2_S.gguf | Q2 | 3.85 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_M.gguf | Q3 | 5.33 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_S.gguf | Q3 | 5.18 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_XS.gguf | Q3 | 4.94 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_XXS.gguf | Q3 | 4.61 GB |
| mini-magnum-12b-v1.1-iMat-IQ4_NL.gguf | Q4 | 6.61 GB |
| mini-magnum-12b-v1.1-iMat-IQ4_XS.gguf | Q4 | 6.28 GB |
| mini-magnum-12b-v1.1-iMat-Q2_K.gguf | Q2 | 4.46 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_L.gguf | Q3 | 6.11 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_M.gguf | Q3 | 5.67 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_S.gguf | Q3 | 5.15 GB |
| mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf (recommended) | Q4 | 6.96 GB |
| mini-magnum-12b-v1.1-iMat-Q4_K_S.gguf | Q4 | 6.63 GB |
| mini-magnum-12b-v1.1-iMat-Q5_K_M.gguf | Q5 | 8.13 GB |
| mini-magnum-12b-v1.1-iMat-Q6_K.gguf | Q6 | 9.37 GB |
| mini-magnum-12b-v1.1-iMat-Q8_0.gguf | Q8 | 12.13 GB |
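As a rough sanity check on the sizes above, effective bits-per-weight can be estimated from file size alone. Two assumptions in this sketch: a parameter count of ~12.25B (typical for a Mistral-Nemo-class 12B model) and that the listed sizes are binary GiB; with those, the F16 file works out to ~16 bits/weight and Q8_0 to ~8.5, matching their nominal formats.

```python
# Estimate effective bits-per-weight from GGUF file size.
# Assumptions: ~12.25e9 parameters, table sizes in GiB.
PARAMS = 12.25e9
GIB = 2**30

sizes_gib = {
    "F16": 22.82,
    "Q8_0": 12.13,
    "Q6_K": 9.37,
    "Q5_K_M": 8.13,
    "Q4_K_M": 6.96,
    "IQ3_M": 5.33,
    "IQ2_M": 4.13,
}

for name, gib in sizes_gib.items():
    bpw = gib * GIB * 8 / PARAMS
    print(f"{name:8s} ~{bpw:.2f} bits/weight")
```

Under these assumptions the Q4_K_M quant lands a little under 5 bits/weight, which is consistent with its nominal size class.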