Model Description
Custom GGUF quants of arcee-ai/Llama-3.1-SuperNova-Lite, where the output tensors are either kept at F32 or quantized to Q8_0 (see the file list below), while the embeddings are always kept at F32. Enjoy!
UPDATE: This repo now contains updated O.E.IQuants, re-quantized with a new F32 imatrix using llama.cpp version 4067 (54ef9cfc). That version makes all KQ matmul computations run in F32 rather than BF16 when Flash Attention (FA) is used. Combined with the earlier, very impactful change that computed all KQ matmuls in F32 on CUDA-enabled devices, this has compounded to noticeably improve the O.E.IQuants, making this update well worth pushing. Cheers!
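For reference, quants in this style can be produced with llama.cpp's own tools. A minimal sketch, assuming a local F32 GGUF of the base model and a placeholder calibration file (the actual calibration corpus used for these quants is not specified here):

```shell
# Build an F32 importance matrix from a calibration text file
# (calibration.txt is a placeholder; use a representative corpus).
./llama-imatrix -m Llama-3.1-SuperNova-Lite-F32.gguf \
    -f calibration.txt -o imatrix.dat

# Quantize with the imatrix, keeping token embeddings at F32 and
# the output tensor at Q8_0 (matching the OQ8_0.EF32 naming scheme).
./llama-quantize --imatrix imatrix.dat \
    --token-embedding-type f32 \
    --output-tensor-type q8_0 \
    Llama-3.1-SuperNova-Lite-F32.gguf \
    Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf Q4_K_M
```

For the OF32 variants, the same command with `--output-tensor-type f32` keeps the output tensor unquantized.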
GGUF File List
All files are stored via Git LFS.

| Filename | Quant | Size | Notes |
|---|---|---|---|
| Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf | Q4 | 7.82 GB | Recommended |
| Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ6_K.gguf | Q6 | 9.25 GB | |
| Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ8_0.gguf | Q8 | 10.83 GB | |
| Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ4_K_M.gguf | Q4 | 6.38 GB | |
| Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ6_K.gguf | Q6 | 7.82 GB | |
| Llama-3.1-SuperNova-Lite-8.0B-OQ8_0.EF32.IQ8_0.gguf | Q8 | 9.39 GB | |
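Since the update described above targets KQ precision when Flash Attention is used, these files benefit from running with FA enabled. A minimal llama.cpp sketch (the prompt is a placeholder):

```shell
# Run the recommended quant with Flash Attention enabled (-fa);
# on the updated llama.cpp, KQ matmuls are then computed in F32.
./llama-cli -m Llama-3.1-SuperNova-Lite-8.0B-OF32.EF32.IQ4_K_M.gguf \
    -fa -p "Hello"
```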