# Model Description
```yaml
base_model: intervitens/mini-magnum-12b-v1.1
library_name: transformers
quantized_by: InferenceIllusionist
tags:
- iMat
- gguf
- Mistral
```
# mini-magnum-12b-v1.1-iMat-GGUF
> [!WARNING]
> **Important Note:** Inferencing in llama.cpp has now been merged in PR #8604. Please ensure you are on release b3438 or newer. Text-generation-web-ui (Ooba) is also working as of 7/23. Kobold.cpp is working as of v1.71.
Quantized from mini-magnum-12b-v1.1 fp16
- Weighted quantizations were created using the fp16 GGUF and groupsmerged.txt in 92 chunks with n_ctx=512
- The static fp16 is also included in this repo
- For a brief rundown of iMatrix quant performance, please see this PR
- All quants are verified working prior to uploading to the repo, for your safety and convenience
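For reference, the weighted-quantization workflow described above can be sketched with llama.cpp's `llama-imatrix` and `llama-quantize` tools. This is an illustrative reconstruction, not the exact commands used: the output filename `imatrix.dat` is an assumption, and the chunk count (92) simply falls out of the calibration file's length at n_ctx=512.

```shell
# Sketch: generate an importance matrix from the fp16 GGUF and the
# calibration text, then apply it when quantizing. Paths are illustrative.
./llama-imatrix -m mini-magnum-12B-v1.1-F16.gguf \
    -f groupsmerged.txt -c 512 -o imatrix.dat

# Use the resulting imatrix for a weighted quant, e.g. Q4_K_M:
./llama-quantize --imatrix imatrix.dat \
    mini-magnum-12B-v1.1-F16.gguf \
    mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf Q4_K_M
```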
KL-Divergence Reference Chart
> [!TIP]
> **Quant-specific Tips:**
> * If you are getting a `cudaMalloc failed: out of memory` error, try passing an argument for lower context in llama.cpp, e.g. for 8k: `-c 8192`
> * If you have all Ampere generation or newer cards, you can use flash attention like so: `-fa`
> * Provided flash attention is enabled, you can also use quantized cache to save on VRAM, e.g. for 8-bit: `-ctk q8_0 -ctv q8_0`
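Putting the tips above together, a typical llama.cpp invocation might look like the following. The binary name, model path, and context size are illustrative; adjust to your setup.

```shell
# Sketch: run the Q4_K_M quant with 8k context, flash attention,
# and an 8-bit quantized KV cache (requires -fa). Paths illustrative.
./llama-cli -m mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf \
    -c 8192 -fa -ctk q8_0 -ctv q8_0
```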
The original model card can be found here.
## GGUF File List
| Filename | Quant | Size |
|---|---|---|
| mini-magnum-12B-v1.1-F16.gguf | FP16 | 22.82 GB |
| mini-magnum-12b-v1.1-iMat-IQ2_M.gguf | Q2 | 4.13 GB |
| mini-magnum-12b-v1.1-iMat-IQ2_S.gguf | Q2 | 3.85 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_M.gguf | Q3 | 5.33 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_S.gguf | Q3 | 5.18 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_XS.gguf | Q3 | 4.94 GB |
| mini-magnum-12b-v1.1-iMat-IQ3_XXS.gguf | Q3 | 4.61 GB |
| mini-magnum-12b-v1.1-iMat-IQ4_NL.gguf | Q4 | 6.61 GB |
| mini-magnum-12b-v1.1-iMat-IQ4_XS.gguf | Q4 | 6.28 GB |
| mini-magnum-12b-v1.1-iMat-Q2_K.gguf | Q2 | 4.46 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_L.gguf | Q3 | 6.11 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_M.gguf | Q3 | 5.67 GB |
| mini-magnum-12b-v1.1-iMat-Q3_K_S.gguf | Q3 | 5.15 GB |
| mini-magnum-12b-v1.1-iMat-Q4_K_M.gguf (recommended) | Q4 | 6.96 GB |
| mini-magnum-12b-v1.1-iMat-Q4_K_S.gguf | Q4 | 6.63 GB |
| mini-magnum-12b-v1.1-iMat-Q5_K_M.gguf | Q5 | 8.13 GB |
| mini-magnum-12b-v1.1-iMat-Q6_K.gguf | Q6 | 9.37 GB |
| mini-magnum-12b-v1.1-iMat-Q8_0.gguf | Q8 | 12.13 GB |
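As a rough sanity check on the sizes above, effective bits-per-weight can be estimated from file size alone. Two assumptions in this sketch: a parameter count of ~12.25B (typical for a Mistral-Nemo-class 12B model) and that the listed sizes are binary GiB; with those, the F16 file works out to ~16 bits/weight and Q8_0 to ~8.5, matching their nominal formats.

```python
# Estimate effective bits-per-weight from GGUF file size.
# Assumptions: ~12.25e9 parameters, table sizes in GiB.
PARAMS = 12.25e9
GIB = 2**30

sizes_gib = {
    "F16": 22.82,
    "Q8_0": 12.13,
    "Q6_K": 9.37,
    "Q5_K_M": 8.13,
    "Q4_K_M": 6.96,
    "IQ3_M": 5.33,
    "IQ2_M": 4.13,
}

for name, gib in sizes_gib.items():
    bpw = gib * GIB * 8 / PARAMS
    print(f"{name:8s} ~{bpw:.2f} bits/weight")
```

Under these assumptions the Q4_K_M quant lands a little under 5 bits/weight, which is consistent with its nominal size class.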