BennyDaBall/Qwen3-4b-Z-Image-Engineer-V2.5

Name: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V2.5
Author: BennyDaBall

High-quality GGUF model

5.4K 📥 Downloads

52 ❤️ Likes

6 📁 GGUF Files

21.8 GB 💾 Total Size

3 weeks ago 🔄 Last Updated

📋 Model Description

license: apache-2.0 base_model: Tongyi-MAI/Z-Image-Turbo tags:

image-generation
prompt-engineering
qwen
photography

new_version: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V4

🚀 Z-Engineer V2.5 (4B)

The "Z-Engineer" is back — longer, deeper, and smarter.

This is Z-Engineer V2.5, a specialized 4B parameter model fine-tuned on the Qwen 3 architecture. It serves as a dedicated Creative Director for your image generation workflow, capable of extrapolating complex, cohesive visual narratives from minimal seed concepts. It doesn't just describe a scene; it engineers the light, lens, and atmosphere necessary to render it.

🧠 What is this?

Z-Engineer V2.5 is a merged LoRA fine-tuned version of high-performance text encoder from Tongyi-MAI/Z-Image-Turbo. It has been trained to specifically understand the nuances of AI Image Generation (Z-Image-Turbo, Flux2 Klein). It excels at:

Expanding Concepts: Turn "dog on a bike" into a cinematic narrative.
Technical Precision: It understands lenses (35mm vs 85mm), lighting (rembrandt, volumetric), and film stocks.
Stylistic Consistency: It avoids the robotic "AI feel" and writes with a distinct, creative voice.

🔑 Key Use Cases

✨ Prompt Enhancement: A lightweight, low-VRAM solution to create, edit, and enrich simple image ideas into detailed narratives.
🔌 Z-Image Turbo Encoder: Fully backwards compatible as a drop-in CLIP text encoder for Z-Image Turbo workflows, producing varied and unique results from the same seed.
🛡️ Local & Private: Runs entirely on your machine. No API fees, no data logging, no censorship.
⚡ Hybrid Power: Use it to expand a prompt, then use the model itself as the encoder for the generation stage.

📉 Key Improvements

Base Model Upgrade: Switched from standard Qwen3 Instruct to the native text encoder from Z-Image-Turbo for perfect alignment.
All-Layer Training: Unlike typical lightweight LoRAs, I trained adapters on all 36 layers of the model, ensuring deep behavioral alignment.
Massive Iteration Count: Trained for 10,000 iterations to fully saturate the weights with the dataset concepts.

📊 CLIP Model Comparison

Z-Engineer V2.5 can be used as a drop-in CLIP text encoder for Z-Image-Turbo workflows. Here's how it compares to previous versions and the base model:

Model	Result
Z-Engineer V2.5	✅ Clean, natural output with excellent detail and coherence.
Z-Engineer V2	✅ Good quality, but V2.5 shows improved texture and lighting.
Z-Engineer V1	❌ Broken: Produces severe visual artifacts and distortions.
Base Qwen3 4B	⚠️ Functional but generic; lacks the specialized prompt understanding.

Visual Comparison

!CLIP Comparison 1

!CLIP Comparison 2

Note: V1 exhibits catastrophic artifacts (bottom-left in each grid) due to training instabilities. V2.5 (top-left) consistently produces the cleanest, most natural results.

🔌 ComfyUI Integration (Recommended)

I have released a custom node for seamless integration with ComfyUI!

Features: Optimized for local OpenAI API compatible backends (LM Studio, Ollama, etc.).
Get it here: ComfyUI-Z-Engineer

💻 Training Facts

I believe in open science. Here is exactly how this was built:

Hardware: Trained locally on a Mac with 48GB Unified Memory (Apple Silicon).
Framework: MLX (Apple's native machine learning framework).
Dataset: Generated locally using Qwen3 VL 30B A3B Instruct

* Size: ~34,678 high-quality examples. * Content: A curated mix of "Prompt Enhancement" pairs, teaching the model how to take a seed idea and "engineer" it into a final prompt.

Hyperparameters:

* Iterations: 10,000 * Batch Size: 4 * LoRA Layers: 36 (All Linear Layers) * Learning Rate: 1e-5

📦 GGUF & Quantization

I provide a full suite of GGUF quantizations for use with llama.cpp, Ollama, and LM Studio.

Quantization	Size	Use Case
Q4KS	2.2 GB	🔻 Max Compression
Q4KM	2.3 GB	⚡️ Fast / Mobile / Edge
Q5KM	2.7 GB	⚖️ Recommended Balance
Q6_K	3.1 GB	💎 High Quality
Q8_0	4.0 GB	🎬 Near-Lossless
F16	7.5 GB	🧪 Reference / Conversion

⚠️ Disclaimer

This model generates text for image prompts. While I have filtered the dataset, users should use their best judgment. I am not responsible for the content you generate.

📂 GGUF File List

📁 Filename	📦 Size	⚡ Download
Z-Engineer-2.5-Q4_K_M.gguf Recommended LFS Q4	2.33 GB	Download
Z-Engineer-2.5-Q4_K_S.gguf LFS Q4	2.22 GB	Download
Z-Engineer-2.5-Q5_K_M.gguf LFS Q5	2.69 GB	Download
Z-Engineer-2.5-Q6_K.gguf LFS Q6	3.08 GB	Download
Z-Engineer-2.5-Q8_0.gguf LFS Q8	3.99 GB	Download
Z-Engineer-2.5.gguf LFS	7.5 GB	Download

📊 Model Information

🆔 Model ID: BennyDaBall/Qwen3-4b-Z-Image-Engineer-V2.5

📅 Created: 2 months ago

🔄 Last Updated: 3 weeks ago

📥 Downloads: 5.4K

❤️ Likes: 52

🎯 Difficulty: Intermediate

⚙️ Quantization: Q4, Q5, Q6, Q8

🏷️ Tags

safetensorsggufimage-generationprompt-engineeringqwenphotographybase_model:Tongyi-MAI/Z-Image-Turbobase_model:quantized:Tongyi-MAI/Z-Image-Turbolicense:apache-2.0endpoints_compatibleregion:usconversational

🔗 Related Links

🤗 Visit HuggingFace ⚡ Quick Download