## 📋 Model Description


```yaml
base_model:
  - Tongyi-MAI/Z-Image-Turbo
tags:
  - text-to-image
  - image-generation
  - gguf
license: apache-2.0
```

Quantized GGUF versions of Z-Image Turbo by Tongyi-MAI.

## 📂 Available Models

| Model | Download |
|-------|----------|
| Z-Image Turbo GGUF | this repository |
| Qwen3-4B (Text Encoder) | [unsloth/Qwen3-4B-GGUF](https://huggingface.co/unsloth/Qwen3-4B-GGUF) |
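
Both files can also be fetched programmatically with `huggingface_hub`. A minimal sketch: the first repo id matches the download URL commented out in the usage example below, while the text-encoder filename is an assumption, so check that repo's file list:

```python
from huggingface_hub import hf_hub_download

# Fetch the recommended Q4_K_M quant of the Z-Image Turbo DiT.
transformer_path = hf_hub_download(
    repo_id="jayn7/Z-Image-Turbo-GGUF",
    filename="z_image_turbo-Q4_K_M.gguf",
)

# Fetch a matching quant of the Qwen3-4B text encoder.
# NOTE: this filename is an assumption; verify it against the repo's file list.
text_encoder_path = hf_hub_download(
    repo_id="unsloth/Qwen3-4B-GGUF",
    filename="Qwen3-4B-Q4_K_M.gguf",
)

print(transformer_path)
print(text_encoder_path)
```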

## 📷 Example Comparison

*(side-by-side comparison images: zimagecomparison1, zimagecomparison2, zimagecomparison3)*

## Model Information

Check out the original model card, [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo), for detailed information about the model.

## Usage

The model can be used with the 🧨 Diffusers library:

#### Example Usage


**Diffusers**

Install Diffusers from source, plus the `gguf` package that Diffusers needs to load GGUF checkpoints:

```bash
pip install git+https://github.com/huggingface/diffusers
pip install -U gguf
```
```python
import torch
from diffusers import ZImagePipeline, ZImageTransformer2DModel, GGUFQuantizationConfig

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
height = 1024
width = 1024
seed = 42

# Either load directly from the Hub ...
# hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"
# ... or point to a local copy of the GGUF file.
local_path = "path/to/local/model/z_image_turbo-Q3_K_M.gguf"

transformer = ZImageTransformer2DModel.from_single_file(
    local_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    dtype=torch.bfloat16,
)

pipeline = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    transformer=transformer,
    dtype=torch.bfloat16,
).to("cuda")
```

**[Optional] Attention Backend**

Diffusers uses SDPA by default. Switch to a custom attention backend for better efficiency if your hardware supports it:

```python
# pipeline.transformer.set_attention_backend("_sage_qk_int8_pv_fp16_triton")  # Enable Sage Attention
# pipeline.transformer.set_attention_backend("flash")                         # Enable Flash-Attention-2
# pipeline.transformer.set_attention_backend("_flash_3")                      # Enable Flash-Attention-3
```
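
Each backend also needs its kernel package installed (for example `flash-attn` for the Flash-Attention backends, or `sageattention` for Sage Attention); `set_attention_backend` will raise an error if the corresponding library is missing.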

**[Optional] Model Compilation**

Compiling the DiT model accelerates inference, but the first run takes longer while the model compiles.

```python
# pipeline.transformer.compile()
```

**[Optional] CPU Offloading**

Enable CPU offloading for memory-constrained devices.

```python
# pipeline.enable_model_cpu_offload()
```

```python
image = pipeline(
    prompt=prompt,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    height=height,
    width=width,
    generator=torch.Generator("cuda").manual_seed(seed),
).images[0]

image.save("zimage.png")
```
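
`from_single_file` also accepts the Hub URL that is commented out above, so the manual download can be skipped entirely. A minimal sketch using the same repo and filename as that comment:

```python
import torch
from diffusers import ZImageTransformer2DModel, GGUFQuantizationConfig

# Same checkpoint as the commented-out hf_path above, fetched from the Hub.
hf_path = "https://huggingface.co/jayn7/Z-Image-Turbo-GGUF/blob/main/z_image_turbo-Q3_K_M.gguf"

transformer = ZImageTransformer2DModel.from_single_file(
    hf_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    dtype=torch.bfloat16,
)
```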

## Credits

All credit for the original Z-Image Turbo model goes to [Tongyi-MAI](https://huggingface.co/Tongyi-MAI).

## License

This repository follows the same license as the original Z-Image Turbo model (Apache-2.0).

## 📂 GGUF File List

| 📁 Filename | 📦 Size | Notes |
|-------------|---------|-------|
| z_image_turbo-Q3_K_M.gguf | 3.84 GB | |
| z_image_turbo-Q3_K_S.gguf | 3.53 GB | |
| z_image_turbo-Q4_K_M.gguf | 4.64 GB | **Recommended** |
| z_image_turbo-Q4_K_S.gguf | 4.34 GB | |
| z_image_turbo-Q5_K_M.gguf | 5.14 GB | |
| z_image_turbo-Q5_K_S.gguf | 4.83 GB | |
| z_image_turbo-Q6_K.gguf | 5.5 GB | |
| z_image_turbo-Q8_0.gguf | 6.73 GB | |