πŸ“‹ Model Description


---
base_model: intervitens/mini-magnum-12b-v1.1
library_name: transformers
quantized_by: InferenceIllusionist
tags:
  - iMat
  - gguf
  - Mistral
license: apache-2.0
---

mini-magnum-12b-v1.1-iMat-GGUF

> [!WARNING]
> **Important Note:** Inferencing in llama.cpp has now been merged in PR #8604. Please ensure you are on release b3438 or newer. Text-generation-web-ui (Ooba) is also working as of 7/23. Kobold.cpp is working as of v1.71.

Quantized from mini-magnum-12b-v1.1 fp16

  • Weighted quantizations were creating using fp16 GGUF and groupsmerged.txt in 92 chunks and nctx=512
  • Static fp16 will also be included in repo
  • For a brief rundown of iMatrix quant performance please see this PR
  • All quants are verified working prior to uploading to repo for your safety and convenience

KL-Divergence Reference Chart

> [!TIP]
> **Quant-specific Tips:**
> * If you are getting a `cudaMalloc failed: out of memory` error, try passing an argument for a lower context size in llama.cpp, e.g. for 8k: `-c 8192`
> * If all of your cards are Ampere generation or newer, you can enable flash attention with `-fa`
> * Provided flash attention is enabled, you can also use a quantized KV cache to save VRAM, e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
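To see why the quantized-cache flag helps, here is a back-of-the-envelope sizing of the KV cache at 8k context. The architecture numbers are assumptions for a Mistral-Nemo-style 12B (40 layers, 8 KV heads, head dim 128); check your model's GGUF metadata before relying on them:

```python
# Rough KV-cache sizing: f16 cache vs. q8_0 quantized cache.
# Layer/head counts below are assumed, not read from the model.
N_LAYERS = 40     # assumed transformer layer count
N_KV_HEADS = 8    # assumed grouped-query KV heads
HEAD_DIM = 128    # assumed per-head dimension
N_CTX = 8192      # context length, matching `-c 8192`

def kv_cache_bytes(bytes_per_elem: float) -> float:
    # 2x for the K and V tensors, one entry per layer/head/position.
    return 2 * N_LAYERS * N_CTX * N_KV_HEADS * HEAD_DIM * bytes_per_elem

fp16 = kv_cache_bytes(2.0)      # default f16 cache
q8 = kv_cache_bytes(1.0625)     # q8_0: ~8.5 bits/element incl. scales

print(f"f16 cache:  {fp16 / 2**30:.2f} GiB")
print(f"q8_0 cache: {q8 / 2**30:.2f} GiB")
```

At these assumed dimensions the f16 cache is about 1.25 GiB and q8_0 roughly halves it, which is the saving the tip refers to.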

The original model card can be found here.

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
mini-magnum-12B-v1.1-F16.gguf
LFS FP16
22.82 GB Download
mini-magnum-12b-v1.1-iMat-IQ2_M.gguf
LFS Q2
4.13 GB Download
mini-magnum-12b-v1.1-iMat-IQ2_S.gguf
LFS Q2
3.85 GB Download
mini-magnum-12b-v1.1-iMat-IQ3_M.gguf
LFS Q3
5.33 GB Download
mini-magnum-12b-v1.1-iMat-IQ3_S.gguf
LFS Q3
5.18 GB Download
mini-magnum-12b-v1.1-iMat-IQ3_XS.gguf
LFS Q3
4.94 GB Download
mini-magnum-12b-v1.1-iMat-IQ3_XXS.gguf
LFS Q3
4.61 GB Download
mini-magnum-12b-v1.1-iMat-IQ4_NL.gguf
LFS Q4
6.61 GB Download
mini-magnum-12b-v1.1-iMat-IQ4_XS.gguf
LFS Q4
6.28 GB Download
mini-magnum-12b-v1.1-iMat-Q2_K.gguf
LFS Q2
4.46 GB Download
mini-magnum-12b-v1.1-iMat-Q3_K_L.gguf
LFS Q3
6.11 GB Download
mini-magnum-12b-v1.1-iMat-Q3_K_M.gguf
LFS Q3
5.67 GB Download
mini-magnum-12b-v1.1-iMat-Q3_K_S.gguf
LFS Q3
5.15 GB Download
mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf
Recommended LFS Q4
6.96 GB Download
mini-magnum-12b-v1.1-iMat-Q4_K_S.gguf
LFS Q4
6.63 GB Download
mini-magnum-12b-v1.1-iMat-Q5_K_M.gguf
LFS Q5
8.13 GB Download
mini-magnum-12b-v1.1-iMat-Q6_K.gguf
LFS Q6
9.37 GB Download
mini-magnum-12b-v1.1-iMat-Q8_0.gguf
LFS Q8
12.13 GB Download