πŸ“‹ Model Description

Custom GGUF quants of arcee-ai/Llama-3.1-SuperNova-Lite, where the output tensors are quantized to Q8_0 (or kept at F32, in the OF32 variants) while the embeddings are kept at F32. Enjoy! 🧠πŸ”₯πŸš€

UPDATE: This repo now contains updated O.E.IQuants, re-quantized with a new F32 imatrix using llama.cpp version 4067 (54ef9cfc). That version made all KQ matmul computations run in F32 instead of BF16 when Flash Attention (FA) is enabled. Combined with the earlier, equally impactful change that made all KQ matmuls compute in F32 on CUDA-enabled devices, this has meaningfully improved the O.E.IQuants and warranted pushing this update. Cheers!
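For reference, a quantization flow like the one described above can be sketched with the standard llama.cpp tools. This is a hedged sketch, not the exact commands used for this repo: the model and calibration file names below are placeholders, and paths assume a local llama.cpp build.

```shell
# 1. Build an F32 importance matrix from the full-precision base model
#    and a calibration text file (both file names are placeholders).
./llama-imatrix -m SuperNova-Lite-F32.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize with the imatrix, keeping token embeddings at F32 and the
#    output tensor at Q8_0, as in the OQ8_0.EF32 variants in this repo.
./llama-quantize --imatrix imatrix.dat \
    --token-embedding-type f32 \
    --output-tensor-type q8_0 \
    SuperNova-Lite-F32.gguf \
    Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf Q4_K_M
```

For the OF32 variants, `--output-tensor-type f32` would be used instead.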

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf
Recommended LFS Q4
7.82 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ6_K.gguf
LFS Q6
9.25 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ8_0.gguf
LFS Q8
10.83 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf
LFS Q4
6.38 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ6_K.gguf
LFS Q6
7.82 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ8_0.gguf
LFS Q8
9.39 GB Download
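Since the update above specifically benefits Flash Attention, running these quants with FA enabled is the natural way to use them. A minimal example with llama.cpp's CLI (assuming a local build; the prompt and layer-offload count are arbitrary):

```shell
# Run the recommended Q4 quant with Flash Attention (-fa) enabled and
# all layers offloaded to the GPU (-ngl 99); adjust paths as needed.
./llama-cli -m Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf \
    -p "Hello" -n 128 -fa -ngl 99
```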