

---
license: apache-2.0
base_model: Qwen/Qwen3-VL-Embedding-8B
tags:
  - multimodal
  - embedding
  - gguf
  - llama.cpp
  - quantized
library_name: llama.cpp
language:
  - en
  - zh
  - multilingual
---

# Qwen3-VL-Embedding-8B GGUF

GGUF quantizations of Qwen/Qwen3-VL-Embedding-8B for efficient CPU inference with llama.cpp.

## Model Description

Qwen3-VL-Embedding-8B is a multimodal embedding model for information retrieval and cross-modal understanding. It supports text, images, screenshots, videos, and mixed multimodal inputs.

Original model specs:

  • Parameters: 8B
  • Context Length: 32K tokens
  • Embedding Dimension: 64-4096 (configurable)
  • Languages: 30+
  • Input Modalities: Text, Images, Videos
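
The configurable embedding dimension (64-4096) is commonly exposed by truncating the full vector and re-normalizing (Matryoshka-style). A minimal sketch of that convention, assuming the leading components carry the coarse-grained signal:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Keep the first `dim` components of an embedding and re-normalize.

    Illustrative sketch of Matryoshka-style dimension reduction; the
    exact convention used by the model may differ.
    """
    v = np.asarray(vec, dtype=np.float32)[:dim]
    return v / np.linalg.norm(v)
```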

## Available Quantizations

| File | Size | Use Case |
|------|------|----------|
| Qwen3-VL-Embedding-8B-F16.gguf | 14.1 GB | Maximum quality, baseline reference |
| Qwen3-VL-Embedding-8B-Q8_0.gguf | 7.5 GB | **Recommended**: minimal quality loss |
| Qwen3-VL-Embedding-8B-Q6_K.gguf | 5.8 GB | High quality, good balance |
| Qwen3-VL-Embedding-8B-Q5_K_M.gguf | 5.1 GB | Good quality, balanced size |
| Qwen3-VL-Embedding-8B-Q5_K_S.gguf | 4.9 GB | Good quality, smaller variant |
| Qwen3-VL-Embedding-8B-Q4_K_M.gguf | 4.4 GB | Decent quality, smaller size |
| Qwen3-VL-Embedding-8B-Q4_K_S.gguf | 4.2 GB | Decent quality, more compressed |
| Qwen3-VL-Embedding-8B-Q3_K_M.gguf | 3.6 GB | Lower quality, significant compression |
| Qwen3-VL-Embedding-8B-Q2_K.gguf | 2.9 GB | Lowest quality, maximum compression |

Recommendation: Start with Q8_0 for production use. Use Q4_K_M or Q5_K_M for resource-constrained environments.
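
One way to act on this recommendation programmatically is to pick the largest quantization that fits the available RAM. The thresholds below are illustrative, derived from the file sizes above plus rough headroom for context buffers:

```python
# Illustrative mapping from available RAM (GiB) to a quantization choice.
# Thresholds are assumptions (file size + headroom), not official guidance.
QUANTS = [  # (min_ram_gib, filename), largest first
    (18.0, "Qwen3-VL-Embedding-8B-F16.gguf"),
    (10.0, "Qwen3-VL-Embedding-8B-Q8_0.gguf"),
    (7.0, "Qwen3-VL-Embedding-8B-Q5_K_M.gguf"),
    (6.0, "Qwen3-VL-Embedding-8B-Q4_K_M.gguf"),
    (4.0, "Qwen3-VL-Embedding-8B-Q2_K.gguf"),
]

def pick_quant(ram_gib):
    """Return the highest-quality quantization that fits in `ram_gib`."""
    for threshold, name in QUANTS:
        if ram_gib >= threshold:
            return name
    raise ValueError("Not enough RAM for any quantization")
```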

## Usage with llama.cpp

### Installation

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```

### Download Model

```bash
huggingface-cli download dam2452/Qwen3-VL-Embedding-8B-GGUF \
  Qwen3-VL-Embedding-8B-Q8_0.gguf \
  --local-dir ./models
```

### Run Embedding Server

```bash
./build/bin/llama-server \
  -m models/Qwen3-VL-Embedding-8B-Q8_0.gguf \
  --embedding \
  --port 8080 \
  --host 0.0.0.0
```
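
Before sending requests, it can help to wait until the server is ready (llama-server exposes a `/health` endpoint). A small polling sketch, parameterized by a probe callable so the HTTP check itself (e.g. a GET to `http://localhost:8080/health`) is supplied by the caller:

```python
import time

def wait_for_server(probe, timeout=30.0, interval=0.5):
    """Poll `probe()` until it returns True or `timeout` seconds elapse.

    `probe` is any zero-argument callable, e.g. one that GETs the
    server's /health endpoint and returns True on HTTP 200.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```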

### Generate Embeddings (API)

```bash
curl http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your text or image data here"
  }'
```

### Generate Embeddings (Python)

```python
import requests

response = requests.post(
    "http://localhost:8080/embedding",
    json={"content": "A woman playing with her dog on a beach"},
)

embedding = response.json()["embedding"]
print(f"Embedding dimension: {len(embedding)}")
```
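
For retrieval, embeddings returned by the server are typically compared with cosine similarity. A self-contained helper (no server required):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Rank candidate documents by `cosine_similarity(query_embedding, doc_embedding)`, highest first.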

## Performance

Original model performance on benchmarks:

  • MMEB-V2: 77.9 overall score
  • MMTEB: 67.88 mean task score
  • Retrieval: 81.08

Note: Quantized models may show slightly reduced performance, with Q8_0 typically having less than 1% degradation.

## License

Apache 2.0 (inherited from original model)

## Citation

```bibtex
@article{qwen3vlembedding,
  title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
  author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
  journal={arXiv},
  year={2026}
}
```
