---
base_model:
- Tongyi-MAI/Z-Image-Turbo
tags:
- text-to-image
- image-generation
- gguf
---

## Model Description
Quantized GGUF versions of the Z-Image Turbo model by Tongyi-MAI.
## Available Models
| Model | Download |
|---|---|
| Z-Image Turbo GGUF | Download |
| Qwen3-4B (Text Encoder) | Download |
## Example Comparison

*(Comparison images: z_image_comparison_1, z_image_comparison_2, z_image_comparison_3)*

## Model Information
Check out the original [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) model card for detailed information about the model.
## Usage
The model can be used with:
- [ComfyUI-GGUF](https://github.com/city96/ComfyUI-GGUF) by city96 (see the download sketch after this list)
- Diffusers
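For ComfyUI, the sketch below downloads the GGUF file into the folder the ComfyUI-GGUF loader nodes conventionally read from (`models/unet` for the diffusion model; the text-encoder GGUF would go under `models/clip`). The folder layout and `COMFY_DIR` are assumptions for illustration, not from this card:

```python
# Sketch only: fetch the quantized transformer into a ComfyUI install.
# COMFY_DIR and the target subfolder are assumptions; adjust to your setup.
from pathlib import Path
from huggingface_hub import hf_hub_download

COMFY_DIR = Path("path/to/ComfyUI")  # hypothetical install location

hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z_image_turbo-Q4_K_M.gguf",  # recommended quant from the list below
    local_dir=COMFY_DIR / "models" / "unet",
)
```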
#### Example Usage
**Diffusers**

Install the latest Diffusers from source:

```bash
pip install git+https://github.com/huggingface/diffusers
```
```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel, GGUFQuantizationConfig

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
height = 1024
width = 1024
seed = 42

# hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"
local_path = "path/to/local/model/z_image_turbo-Q3_K_M.gguf"

# Load the GGUF-quantized transformer from a single file
transformer = ZImageTransformer2DModel.from_single_file(
    local_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    dtype=torch.bfloat16,
)

pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    transformer=transformer,
    dtype=torch.bfloat16,
).to("cuda")
```
**[Optional] Attention Backend**

Diffusers uses SDPA by default. Switch to a custom attention backend for better efficiency if supported:

```python
# pipeline.transformer.set_attention_backend("_sage_qk_int8_pv_fp16_triton")  # Enable Sage Attention
# pipeline.transformer.set_attention_backend("flash")     # Enable Flash-Attention-2
# pipeline.transformer.set_attention_backend("_flash_3")  # Enable Flash-Attention-3
```
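Whether a given backend works depends on the installed kernels (e.g. flash-attn). A small sketch, assuming `set_attention_backend` raises when the backend is unavailable (the exact exception types are an assumption, not documented here), that falls back to the default SDPA:

```python
# Assumption: an unavailable backend raises when set; fall back to SDPA.
try:
    pipeline.transformer.set_attention_backend("flash")  # requires flash-attn
except (ImportError, RuntimeError, ValueError) as err:  # assumed exception types
    print(f"Flash attention unavailable ({err}); keeping default SDPA.")
```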
**[Optional] Model Compilation**

Compiling the DiT model accelerates inference, but the first run takes longer while compilation happens:

```python
# pipeline.transformer.compile()
```
**[Optional] CPU Offloading**

Enable CPU offloading on memory-constrained devices:

```python
# pipeline.enable_model_cpu_offload()
```
```python
image = pipeline(
    prompt=prompt,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    height=height,
    width=width,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]
image.save("zimage.png")
```
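Because Turbo runs at `guidance_scale=0.0` with few steps, output varies noticeably with the seed. A small usage sketch (not part of the original example) that renders the same prompt under a few seeds for comparison:

```python
# Illustrative seed sweep reusing the pipeline and prompt defined above.
for s in (0, 42, 1234):
    img = pipeline(
        prompt=prompt,
        num_inference_steps=9,
        guidance_scale=0.0,
        height=height,
        width=width,
        generator=torch.Generator("cuda").manual_seed(s),
    ).images[0]
    img.save(f"zimage_seed{s}.png")
```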
## Credits
- Original Model: Z-Image Turbo by Tongyi-MAI
- Quantization Tools & Guide: llama.cpp & city96
## License

This repository follows the same license as the original Z-Image Turbo model.

## GGUF File List
| Filename | Quant | Size | Download |
|---|---|---|---|
| z_image_turbo-Q3_K_M.gguf | Q3 | 3.84 GB | Download |
| z_image_turbo-Q3_K_S.gguf | Q3 | 3.53 GB | Download |
| z_image_turbo-Q4_K_M.gguf (Recommended) | Q4 | 4.64 GB | Download |
| z_image_turbo-Q4_K_S.gguf | Q4 | 4.34 GB | Download |
| z_image_turbo-Q5_K_M.gguf | Q5 | 5.14 GB | Download |
| z_image_turbo-Q5_K_S.gguf | Q5 | 4.83 GB | Download |
| z_image_turbo-Q6_K.gguf | Q6 | 5.5 GB | Download |
| z_image_turbo-Q8_0.gguf | Q8 | 6.73 GB | Download |
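To fetch one of these quants programmatically, a minimal sketch using `huggingface_hub` (the repo id is taken from the commented URL in the usage example above):

```python
# Download a chosen quant from the Hub; the returned path can be passed to
# ZImageTransformer2DModel.from_single_file as local_path in the example above.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z_image_turbo-Q4_K_M.gguf",  # recommended quant from the table
)
print(local_path)
```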