πŸ“‹ Model Description


license: apache-2.0 base_model: Tongyi-MAI/Z-Image-Turbo tags:
  • image-generation
  • prompt-engineering
  • qwen
  • photography
new_version: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4

πŸš€ Z-Engineer V2.5 (4B)

The "Z-Engineer" is back β€” longer, deeper, and smarter.

This is Z-Engineer V2.5, a specialized 4B parameter model fine-tuned on the Qwen 3 architecture. It serves as a dedicated Creative Director for your image generation workflow, capable of extrapolating complex, cohesive visual narratives from minimal seed concepts. It doesn't just describe a scene; it engineers the light, lens, and atmosphere necessary to render it.

🧠 What is this?

Z-Engineer V2.5 is a merged LoRA fine-tuned version of high-performance text encoder from Tongyi-MAI/Z-Image-Turbo. It has been trained to specifically understand the nuances of AI Image Generation (Z-Image-Turbo, Flux2 Klein). It excels at:
  • Expanding Concepts: Turn "dog on a bike" into a cinematic narrative.
  • Technical Precision: It understands lenses (35mm vs 85mm), lighting (rembrandt, volumetric), and film stocks.
  • Stylistic Consistency: It avoids the robotic "AI feel" and writes with a distinct, creative voice.

πŸ”‘ Key Use Cases

  • ✨ Prompt Enhancement: A lightweight, low-VRAM solution to create, edit, and enrich simple image ideas into detailed narratives.
  • πŸ”Œ Z-Image Turbo Encoder: Fully backwards compatible as a drop-in CLIP text encoder for Z-Image Turbo workflows, producing varied and unique results from the same seed.
  • πŸ›‘οΈ Local & Private: Runs entirely on your machine. No API fees, no data logging, no censorship.
  • ⚑ Hybrid Power: Use it to expand a prompt, then use the model itself as the encoder for the generation stage.

πŸ“‰ Key Improvements

  • Base Model Upgrade: Switched from standard Qwen3 Instruct to the native text encoder from Z-Image-Turbo for perfect alignment.
  • All-Layer Training: Unlike typical lightweight LoRAs, I trained adapters on all 36 layers of the model, ensuring deep behavioral alignment.
  • Massive Iteration Count: Trained for 10,000 iterations to fully saturate the weights with the dataset concepts.

πŸ“Š CLIP Model Comparison

Z-Engineer V2.5 can be used as a drop-in CLIP text encoder for Z-Image-Turbo workflows. Here's how it compares to previous versions and the base model:
ModelResult
Z-Engineer V2.5βœ… Clean, natural output with excellent detail and coherence.
Z-Engineer V2βœ… Good quality, but V2.5 shows improved texture and lighting.
Z-Engineer V1❌ Broken: Produces severe visual artifacts and distortions.
Base Qwen3 4B⚠️ Functional but generic; lacks the specialized prompt understanding.

Visual Comparison

!CLIP Comparison 1

!CLIP Comparison 2

Note: V1 exhibits catastrophic artifacts (bottom-left in each grid) due to training instabilities. V2.5 (top-left) consistently produces the cleanest, most natural results.

πŸ”Œ ComfyUI Integration (Recommended)

I have released a custom node for seamless integration with ComfyUI!
  • Features: Optimized for local OpenAI API compatible backends (LM Studio, Ollama, etc.).
  • Get it here: ComfyUI-Z-Engineer

πŸ’» Training Facts

I believe in open science. Here is exactly how this was built:
  • Hardware: Trained locally on a Mac with 48GB Unified Memory (Apple Silicon).
  • Framework: MLX (Apple's native machine learning framework).
  • Dataset: Generated locally using Qwen3 VL 30B A3B Instruct
* Size: ~34,678 high-quality examples. * Content: A curated mix of "Prompt Enhancement" pairs, teaching the model how to take a seed idea and "engineer" it into a final prompt.
  • Hyperparameters:
* Iterations: 10,000 * Batch Size: 4 * LoRA Layers: 36 (All Linear Layers) * Learning Rate: 1e-5

πŸ“¦ GGUF & Quantization

I provide a full suite of GGUF quantizations for use with llama.cpp, Ollama, and LM Studio.
QuantizationSizeUse Case
Q4KS2.2 GBπŸ”» Max Compression
Q4KM2.3 GB⚑️ Fast / Mobile / Edge
Q5KM2.7 GBβš–οΈ Recommended Balance
Q6_K3.1 GBπŸ’Ž High Quality
Q8_04.0 GB🎬 Near-Lossless
F167.5 GBπŸ§ͺ Reference / Conversion

⚠️ Disclaimer

This model generates text for image prompts. While I have filtered the dataset, users should use their best judgment. I am not responsible for the content you generate.

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
Z-Engineer-2.5-Q4_K_M.gguf
Recommended LFS Q4
2.33 GB Download
Z-Engineer-2.5-Q4_K_S.gguf
LFS Q4
2.22 GB Download
Z-Engineer-2.5-Q5_K_M.gguf
LFS Q5
2.69 GB Download
Z-Engineer-2.5-Q6_K.gguf
LFS Q6
3.08 GB Download
Z-Engineer-2.5-Q8_0.gguf
LFS Q8
3.99 GB Download
Z-Engineer-2.5.gguf
LFS
7.5 GB Download