πŸ“‹ Model Description

Custom GGUF quants of arcee-ai/Llama-3.1-SuperNova-Lite, where the output tensors are quantized to Q8_0 (or kept at F32, in the OF32 variants) while the embeddings are kept at F32. Enjoy! 🧠πŸ”₯πŸš€

UPDATE: This repo now contains updated O.E.IQuants, re-quantized with a new F32 imatrix using llama.cpp version 4067 (54ef9cfc). That version made all KQ matmul computations run in F32 instead of BF16 when Flash Attention (FA) is enabled. Combined with the earlier, equally impactful change that made all KQ matmuls compute in F32 on CUDA-enabled devices, this has meaningfully improved the O.E.IQuants and warranted pushing this update. Cheers!
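For reference, a quantization flow like the one described above can be sketched with the standard llama.cpp tools. This is a hedged sketch, not the exact commands used for this repo: the model and calibration file names below are placeholders, and paths assume a local llama.cpp build.

```shell
# 1. Build an F32 importance matrix from the full-precision base model
#    and a calibration text file (both file names are placeholders).
./llama-imatrix -m SuperNova-Lite-F32.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize with the imatrix, keeping token embeddings at F32 and the
#    output tensor at Q8_0, as in the OQ8_0.EF32 variants in this repo.
./llama-quantize --imatrix imatrix.dat \
    --token-embedding-type f32 \
    --output-tensor-type q8_0 \
    SuperNova-Lite-F32.gguf \
    Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf Q4_K_M
```

For the OF32 variants, `--output-tensor-type f32` would be used instead.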

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf
Recommended LFS Q4
7.82 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ6_K.gguf
LFS Q6
9.25 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ8_0.gguf
LFS Q8
10.83 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf
LFS Q4
6.38 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ6_K.gguf
LFS Q6
7.82 GB Download
Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ8_0.gguf
LFS Q8
9.39 GB Download
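Since the update above specifically benefits Flash Attention, running these quants with FA enabled is the natural way to use them. A minimal example with llama.cpp's CLI (assuming a local build; the prompt and layer-offload count are arbitrary):

```shell
# Run the recommended Q4 quant with Flash Attention (-fa) enabled and
# all layers offloaded to the GPU (-ngl 99); adjust paths as needed.
./llama-cli -m Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf \
    -p "Hello" -n 128 -fa -ngl 99
```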