---
base_model:
- black-forest-labs/FLUX.1-dev
tags:
- gguf
- flux
- text-to-image
- imatrix
---

# Model Description
## Supported?
Expect broken or faulty items for the time being. Use at your own discretion.
- ComfyUI-GGUF: all? (CPU/CUDA)
- Forge: TBC
- stable-diffusion.cpp: llama.cpp Feature-matrix
## Disco
Dynamic quantization:
- time_in.in_layer: Q8_0/Q6_K
- final_layer, vector_in.in_layer, guidance_in: Q8_0
- vector_in.out_layer, time_in.out_layer, txt_in, img_in: F16
- single_blocks.[> 10 && < 37].modulation.lin: one quant level down?
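The per-tensor overrides above can be sketched as a name-to-quant-type lookup. This is an illustrative helper, not the actual lcpp.patch logic; `pick_quant`, `one_down`, and the matching rules are assumptions, with the underscored tensor names following the standard FLUX checkpoint layout:

```python
import re

# Sketch of the dynamic-quantization overrides listed above.
# pick_quant() is an assumed helper, not the real lcpp.patch code.
LADDER = ["Q2_K", "Q3_K", "Q4_K", "Q5_K", "Q6_K", "Q8_0"]

def one_down(quant: str) -> str:
    """Step one quant level down the ladder, clamped at the bottom."""
    return LADDER[max(0, LADDER.index(quant) - 1)]

def pick_quant(tensor_name: str, default: str = "Q4_K") -> str:
    """Map a FLUX tensor name to the quant type listed in the notes above."""
    # Input/projection layers kept at full precision.
    if any(k in tensor_name for k in
           ("vector_in.out_layer", "time_in.out_layer", "txt_in", "img_in")):
        return "F16"
    if "time_in.in_layer" in tensor_name:
        return "Q6_K"  # Q8_0/Q6_K in the notes; the smaller option shown here
    if any(k in tensor_name for k in
           ("final_layer", "vector_in.in_layer", "guidance_in")):
        return "Q8_0"
    # "one down?" for single_blocks 11..36 modulation.lin
    m = re.match(r"single_blocks\.(\d+)\.modulation\.lin", tensor_name)
    if m and 10 < int(m.group(1)) < 37:
        return one_down(default)
    return default
```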
| Filename | Quant type | File Size | Description / L2 Loss Step 25 | Example Image |
|---|---|---|---|---|
## Caesar
Combined imatrix: multiple images, 512x512 and 768x768; 25, 30 and 50 steps; city96/flux1-dev-Q8_0; euler.
`load_imatrix: loaded 314 importance matrix entries from imatrix_caesar.dat computed on 475 chunks`
Using llama.cpp quantize cae9fb4 with modified lcpp.patch.
Dynamic quantization:
- img_in, guidance_in.in_layer, final_layer.linear: F32/BF16/F16
- guidance_in, final_layer: BF16/F16
- img_attn.qkv, linear1: some layers two bits up
- txt_mod.lin, txt_mlp, txt_attn.proj: some layers one bit down
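The "bits up / bits down" adjustments can be sketched as offsets on a bit ladder. This is an assumed helper for illustration; the real per-layer choice lives inside the modified lcpp.patch:

```python
# Illustrative bit ladder for the per-layer adjustments described above.
BITS = [1, 2, 3, 4, 5, 6, 8]  # bit widths appearing in the quant tables

def adjust_bits(base: int, steps: int) -> int:
    """Move `steps` rungs up (+) or down (-) the ladder, clamped at both ends."""
    i = BITS.index(base)
    return BITS[max(0, min(len(BITS) - 1, i + steps))]

# A 3-bit base quant: img_attn.qkv/linear1 two levels up, txt_mlp one down.
qkv_bits = adjust_bits(3, +2)
txt_mlp_bits = adjust_bits(3, -1)
```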
Experimental, quantized from F16
| Filename | Bits img_attn.qkv & linear1 |
|---|---|
| flux1-dev-IQ1_S.gguf | 333M MMMM M111 ... 11MM MM11 |
| flux1-dev-TQ1_0.gguf | 3332 2222 2111 ... 1122 2211 |
| flux1-dev-IQ1_M.gguf | 3332 2222 2111 ... 1122 2211 |
| flux1-dev-IQ2_XXS.gguf | 4433 3333 3222 ... 2222 |
| flux1-dev-TQ2_0.gguf | 3332 2222 2111 ... 1122 2211 |
| flux1-dev-IQ2_XS.gguf | 4443 3333 3222 ... 2233 3322 |
| flux1-dev-IQ2_S.gguf | 4444 4444 4444 4444 4433 3222 ... 2233 3322 |
| flux1-dev-IQ2_M.gguf | 4444 4444 4444 4444 4433 3222 ... 2223 3333 3322 |
| flux1-dev-Q2_K_S.gguf | 4443 3333 3222 ... 2222 |
| flux1-dev-Q2_K.gguf | 4443 3333 3222 ... 2233 3322 |
| flux1-dev-IQ3_XXS.gguf | 444S SSSS S333 ... 3333 |
| flux1-dev-IQ3_XS.gguf | 444S SSSS S333 ... 33SS SS33 |
| flux1-dev-Q3_K_S.gguf | 5554 4444 4333 ... 3333 |
| flux1-dev-IQ3_S.gguf | 5554 4444 4333 ... 3344 4433 |
| flux1-dev-Q3_K_M.gguf | 5554 4444 4333 ... 3344 4433 |
| flux1-dev-IQ3_M.gguf | 5554 4444 4444 4444 4433 ... 3344 4433 |
| flux1-dev-Q3_K_L.gguf | 5554 4444 4444 4444 4433 ... 3344 4433 |
| flux1-dev-IQ4_XS.gguf | 8885 5555 5444 ... 4444 |
| flux1-dev-Q4_K_S.gguf | 8885 5555 5444 ... 4444 |
| flux1-dev-Q4_K_M.gguf | 8885 5555 5555 5555 5544 ... 4444 |
| flux1-dev-IQ4_NL.gguf | 8885 5555 5555 5555 5544 ... 4444 |
| flux1-dev-Q4_0.gguf | 8885 5555 5444 ... 4444 |
| flux1-dev-Q4_1.gguf | 8885 5555 5444 ... 4444 |
| flux1-dev-Q5_K_S.gguf | FFF6 6666 6666 6666 6655 ... 5555 |
| flux1-dev-Q5_0.gguf | FFF8 8888 8555 ... 5555 |
| flux1-dev-Q5_K_M.gguf | FFF8 8888 8666 6666 6655 ... 5555 |
| flux1-dev-Q5_1.gguf | FFF8 8888 8555 ... 5555 |
| flux1-dev-Q6_K.gguf | FFF8 8888 8666 ... 6666 |
| flux1-dev-Q8_0.gguf | FFF8 8888 ... 8888 |
### Observations
- More imatrix data doesn't necessarily result in better quants
- I-quants worse than k-quants at the same bit width?
- Quant-dequant loss
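The quant-dequant loss noted above can be measured directly as round-trip error. A toy sketch with a plain symmetric round-to-nearest quantizer (not the actual GGUF codecs), just to show how error shrinks with bit width:

```python
import numpy as np

# Toy round-trip measurement: quantize to `bits`, dequantize, compare.
# Symmetric round-to-nearest only; real GGUF quants use block scales etc.
def roundtrip_rmse(w: np.ndarray, bits: int) -> float:
    """RMSE after a quantize->dequantize round trip at a given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    deq = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
    return float(np.sqrt(np.mean((w - deq) ** 2)))

w = np.random.default_rng(0).normal(size=4096)  # stand-in weight tensor
errs = {b: roundtrip_rmse(w, b) for b in (2, 3, 4, 5, 6, 8)}
```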
## Bravo
Combined imatrix: multiple images, 512x512; 25 and 50 steps; city96/flux1-dev-Q8_0; euler.
Using llama.cpp quantize cae9fb4 with modified lcpp.patch.
Experimental, quantized from F16
| Filename | Quant type | File Size | Description / L2 Loss Step 25 | Example Image |
|---|---|---|---|---|
| flux1-dev-IQ1_S.gguf | IQ1_S | 2.45GB | worst / 156 | Example |
| flux1-dev-IQ1_M.gguf | IQ1_M | 2.72GB | worst / 141 | Example |
| flux1-dev-IQ2_XXS.gguf | IQ2_XXS | 3.19GB | worst / 131 | Example |
| flux1-dev-IQ2_XS.gguf | IQ2_XS | 3.56GB | worst / 125 | - |
| flux1-dev-IQ2_S.gguf | IQ2_S | 3.56GB | worst / 125 | - |
| flux1-dev-IQ2_M.gguf | IQ2_M | 3.93GB | worst / 120 | Example |
| flux1-dev-Q2_K_S.gguf | Q2_K_S | 4.02GB | ok / 56 | Example |
| flux1-dev-IQ3_XXS.gguf | IQ3_XXS | 4.66GB | TBC / 68 | Example |
| flux1-dev-IQ3_XS.gguf | IQ3_XS | 5.22GB | worse than IQ3_XXS / 115 | Example |
| flux1-dev-IQ3_S.gguf | IQ3_S | TBC | TBC | - |
| flux1-dev-IQ3_M.gguf | IQ3_M | TBC | TBC | - |
| flux1-dev-Q3_K_S.gguf | Q3_K_S | 5.22GB | TBC / 34 | Example |
| flux1-dev-IQ4_XS.gguf | IQ4_XS | 6.42GB | TBC / 25 | - |
| flux1-dev-Q4_0.gguf | Q4_0 | 6.79GB | TBC / 31 | - |
| flux1-dev-IQ4_NL.gguf | IQ4_NL | 6.79GB | TBC / 21 | Example |
| flux1-dev-Q4_K_S.gguf | Q4_K_S | 6.79GB | TBC / 29 | Example |
| flux1-dev-Q4_1.gguf | Q4_1 | 7.53GB | TBC / 24 | - |
| flux1-dev-Q5_0.gguf | Q5_0 | 8.27GB | TBC / 25 | - |
| flux1-dev-Q5_1.gguf | Q5_1 | TBC | TBC / 24 | - |
| flux1-dev-Q5_K_S.gguf | Q5_K_S | 8.27GB | TBC / 20 | Example |
| flux1-dev-Q6_K.gguf | Q6_K | 9.84GB | TBC / 19 | Example |
| flux1-dev-Q8_0.gguf | Q8_0 | - | TBC / 10 | - |
| - | F16 | 23.8GB | reference | Example |
### Observations
- Bravo IQ1_S worse than Alpha?
- Latent loss
- Per layer quantization cost from chrisgoringe/castingcost
- Per layer quantization cost 2 from Freepik/flux.1-lite-8B: double blocks and single blocks
- Ablation latent loss per weight type
- Pareto front loss vs. size
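The Pareto-front item above can be computed directly from the size and loss columns. A sketch using the Bravo rows that have both numbers (rows with TBC or missing sizes are omitted), treating file size and L2 loss as two costs to minimize; `pareto_front` is an illustrative helper:

```python
# (file size GB, L2 loss at step 25) per quant, from the Bravo table.
quants = {
    "IQ1_S": (2.45, 156), "IQ1_M": (2.72, 141), "IQ2_XXS": (3.19, 131),
    "IQ2_XS": (3.56, 125), "IQ2_S": (3.56, 125), "IQ2_M": (3.93, 120),
    "Q2_K_S": (4.02, 56), "IQ3_XXS": (4.66, 68), "IQ3_XS": (5.22, 115),
    "Q3_K_S": (5.22, 34), "IQ4_XS": (6.42, 25), "Q4_0": (6.79, 31),
    "IQ4_NL": (6.79, 21), "Q4_K_S": (6.79, 29), "Q4_1": (7.53, 24),
    "Q5_0": (8.27, 25), "Q5_K_S": (8.27, 20), "Q6_K": (9.84, 19),
}

def pareto_front(points):
    """Keep entries not dominated (another entry <= in both costs, < in one)."""
    front = []
    for name, (size, loss) in points.items():
        dominated = any(
            s <= size and l <= loss and (s < size or l < loss)
            for other, (s, l) in points.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front

front = pareto_front(quants)
```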
## Alpha
Simple imatrix: 512x512, single image, 8/20 steps, city96/flux1-dev-Q3_K_S, euler.
`load_imatrix: loaded 314 importance matrix entries from imatrix.dat computed on 7 chunks.`
Using llama.cpp quantize cae9fb4 with modified lcpp.patch.
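As a toy illustration of the imatrix's role (a sketch only, not llama.cpp's actual quantization code; `weighted_err` and the scale search are assumptions): the importance entries weight per-channel quantization error, so the best scale under that weighting can differ from the naive max-abs scale.

```python
import numpy as np

# Toy importance-weighted scale search; illustrative, not llama.cpp internals.
rng = np.random.default_rng(0)
w = rng.normal(size=256)                  # one row of weights
importance = rng.uniform(0.1, 10.0, 256)  # stand-in for the imatrix entries

def weighted_err(scale: float) -> float:
    """Importance-weighted squared error of a 4-bit symmetric round trip."""
    q = np.clip(np.round(w / scale), -8, 7)
    return float(np.sum(importance * (w - q * scale) ** 2))

naive = float(np.abs(w).max() / 7)  # plain max-abs scale, no imatrix
best = min((naive * f for f in np.linspace(0.8, 1.2, 41)), key=weighted_err)
```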
Experimental, quantized from Q8_0
| Filename | Quant type | File Size | Description / L2 Loss Step 25 | Example Image |
|---|---|---|---|---|
| flux1-dev-IQ1_S.gguf | IQ1_S | 2.45GB | worst / 152 | Example |
| - | IQ1_M | - | broken | - |
| flux1-dev-TQ1_0.gguf | TQ1_0 | 2.63GB | TBC / 220 | - |
| flux1-dev-TQ2_0.gguf | TQ2_0 | 3.19GB | TBC / 220 | - |
| flux1-dev-IQ2_XXS.gguf | IQ2_XXS | 3.19GB | worst / 130 | Example |
| flux1-dev-IQ2_XS.gguf | IQ2_XS | 3.56GB | worst / 129 | Example |
| flux1-dev-IQ2_S.gguf | IQ2_S | 3.56GB | worst / 129 | - |
| flux1-dev-IQ2_M.gguf | IQ2_M | 3.93GB | worst / 121 | - |
| flux1-dev-Q2_K.gguf | Q2_K | 4.02GB | TBC / 77 | - |
| flux1-dev-Q2_K_S.gguf | Q2_K_S | 4.02GB | ok / 77 | Example |
| flux1-dev-IQ3_XXS.gguf | IQ3_XXS | 4.66GB | TBC / 130 | Example |
| flux1-dev-IQ3_XS.gguf | IQ3_XS | 5.22GB | TBC / 114 | - |
| flux1-dev-IQ3_S.gguf | IQ3_S | 5.22GB | TBC / 114 | - |
| flux1-dev-IQ3_M.gguf | IQ3_M | 5.22GB | TBC / 114 | - |
| flux1-dev-Q3_K_S.gguf | Q3_K_S | 5.22GB | TBC / 36 | Example |
| flux1-dev-Q3_K_M.gguf | Q3_K_M | 5.36GB | TBC / 42 | - |
| flux1-dev-Q3_K_L.gguf | Q3_K_L | 5.36GB | TBC / 42 | - |
| flux1-dev-IQ4_XS.gguf | IQ4_XS | 6.42GB | TBC / 30 | Example |
| flux1-dev-IQ4_NL.gguf | IQ4_NL | 6.79GB | TBC / 23 | Example |
| flux1-dev-Q4_0.gguf | Q4_0 | 6.79GB | TBC / 27 | - |
| - | Q4_K | TBC | TBC / 27 | - |
| flux1-dev-Q4_K_S.gguf | Q4_K_S | 6.79GB | TBC / 26 | Example |
| flux1-dev-Q4_K_M.gguf | Q4_K_M | 6.93GB | TBC / 27 | - |
| flux1-dev-Q4_1.gguf | Q4_1 | 7.53GB | TBC / 23 | - |
| flux1-dev-Q5_K_S.gguf | Q5_K_S | 8.27GB | TBC / 19 | Example |
| flux1-dev-Q5_K.gguf | Q5_K | 8.41GB | TBC / 20 | - |
| - | Q5_K_M | TBC | TBC | - |
| flux1-dev-Q6_K.gguf | Q6_K | 9.84GB | TBC / 22 | - |
| - | Q8_0 | 12.7GB | near perfect / 10 | Example |
| - | F16 | 23.8GB | reference | Example |
### Observations
Sub-quants not differentiated as expected: IQ2_XS == IQ2_S, IQ3_XS == IQ3_S == IQ3_M, Q3_K_M == Q3_K_L.
- Check if lcpp_sd3.patch includes more specific quant level logic
- Extrapolate the existing level logic