jinaai/jina-embeddings-v5-text-small-retrieval-GGUF

Name: jinaai/jina-embeddings-v5-text-small-retrieval-GGUF
Author: jinaai


4.8K 📥 Downloads

3 ❤️ Likes

14 📁 GGUF Files

5.51 GB 💾 Total Size

6 days ago 🔄 Last Updated

📋 Model Description

pipeline_tag: sentence-similarity
tags:
- gguf
- embedding
- qwen3
- llama-cpp
- jina-embeddings-v5
language:
- multilingual
base_model: jinaai/jina-embeddings-v5-text-small
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp

jina-embeddings-v5-text-small-retrieval-GGUF

GGUF quantizations of jina-embeddings-v5-text-small-retrieval, produced with llama.cpp. It is a 677M-parameter multilingual embedding model, quantized here for efficient inference.

Elastic Inference Service | arXiv | Blog

Important: We highly recommend reading the blog post first for more technical details and for the customized llama.cpp build.

Overview

Figure: jina-embeddings-v5-text architecture

jina-embeddings-v5-text-small-retrieval is a task-specific embedding model for retrieval, part of the jina-embeddings-v5-text model family.

Feature	Value
Parameters	677M
Task	`retrieval`
Embedding Dimension	1024
Matryoshka Dimensions	32, 64, 128, 256, 512, 768, 1024
Pooling Strategy	Last-token pooling
Base Model	jina-embeddings-v5-text-small
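The last two technical rows of the table, last-token pooling and Matryoshka dimensions, can be illustrated with a minimal Python sketch. This is not the model's actual implementation: pooling takes the hidden state of the last non-padding token (assuming right padding), and Matryoshka truncation keeps a prefix of the 1024-dimensional vector and re-normalizes it so cosine similarity reduces to a dot product. The function names and the toy vector are hypothetical; when running through llama.cpp, `--pooling last` performs the pooling step internally.

```python
import math
from typing import List, Sequence

MATRYOSHKA_DIMS = {32, 64, 128, 256, 512, 768, 1024}  # from the table above


def last_token_pool(hidden_states: Sequence[Sequence[float]],
                    attention_mask: Sequence[int]) -> List[float]:
    """Return the hidden state of the last real (non-padding) token.

    Assumes right padding: the mask is a run of 1s followed by 0s.
    """
    last_index = sum(attention_mask) - 1
    if last_index < 0:
        raise ValueError("attention_mask contains no real tokens")
    return list(hidden_states[last_index])


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` values (Matryoshka truncation), then L2-normalize."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {sorted(MATRYOSHKA_DIMS)}")
    if len(embedding) < dim:
        raise ValueError("embedding is shorter than the requested dimension")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


# Hypothetical stand-in for a real 1024-d pooled embedding.
toy = [float(i % 7) - 3.0 for i in range(1024)]
small = truncate_and_normalize(toy, 256)
print(len(small))  # 256
```

Smaller dimensions shrink index storage and speed up similarity search; the trade-off is some loss of retrieval quality, so the right size depends on your workload.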

Figure: MMTEB multilingual benchmark results

Figure: MTEB English benchmark results

Figure: Retrieval benchmark results

Usage

via Elastic Inference Service

The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.

PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}

See the Elastic Inference Service documentation for setup details.

via llama.cpp

# Build llama.cpp (upstream)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

# Run embedding
./build/bin/llama-embedding -m jina-embeddings-v5-text-small-retrieval-Q8_0.gguf \
  --pooling last -p "Your text here"
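For retrieval, the query and each candidate document are embedded separately, and documents are then ranked by cosine similarity to the query. Below is a minimal Python sketch of that ranking step. It assumes you have already parsed the vectors printed by llama-embedding into lists of floats (parsing is not shown). The three-dimensional vectors and document IDs are hypothetical toy values, far smaller than the model's real output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank(query: Sequence[float],
         docs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings standing in for real llama-embedding output.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc-a": [0.8, 0.2, 0.1],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [-0.5, 0.0, 0.5],
}
for doc_id, score in rank(query_vec, doc_vecs):
    print(f"{doc_id}\t{score:.3f}")
```

At production scale this linear scan would normally be replaced by a vector index, but the scoring function stays the same.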

License

CC-BY-NC-4.0. For commercial use, please contact us.

📂 GGUF File List

📁 Filename	📦 Size	Notes
v5-small-retrieval-F16.gguf	1.12 GB	FP16
v5-small-retrieval-IQ1_M.gguf	206.04 MB
v5-small-retrieval-IQ1_S.gguf	198.38 MB
v5-small-retrieval-IQ2_M.gguf	252.64 MB
v5-small-retrieval-IQ2_XXS.gguf	218.82 MB
v5-small-retrieval-IQ4_NL.gguf	363.89 MB
v5-small-retrieval-IQ4_XS.gguf	350.76 MB
v5-small-retrieval-Q2_K.gguf	282.51 MB
v5-small-retrieval-Q3_K_M.gguf	331.05 MB
v5-small-retrieval-Q4_K_M.gguf	378.33 MB	Recommended
v5-small-retrieval-Q5_K_M.gguf	423.83 MB
v5-small-retrieval-Q5_K_S.gguf	416.39 MB
v5-small-retrieval-Q6_K.gguf	472.17 MB
v5-small-retrieval-Q8_0.gguf	609.82 MB
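When choosing among these files, one simple heuristic is to take the largest quantization that fits a memory budget, since lower-bit quantizations trade accuracy for size. The helper below is hypothetical and uses the sizes from the file list. Converting the F16 entry with 1 GB = 1024 MB is an assumption. Actual memory use at runtime will exceed the file size, because llama.cpp also allocates context and compute buffers.

```python
from typing import Dict, Optional

# File sizes in MB, copied from the GGUF file list.
# Assumption: 1 GB = 1024 MB when converting the F16 entry.
GGUF_SIZES_MB: Dict[str, float] = {
    "v5-small-retrieval-F16.gguf": 1.12 * 1024,
    "v5-small-retrieval-IQ1_M.gguf": 206.04,
    "v5-small-retrieval-IQ1_S.gguf": 198.38,
    "v5-small-retrieval-IQ2_M.gguf": 252.64,
    "v5-small-retrieval-IQ2_XXS.gguf": 218.82,
    "v5-small-retrieval-IQ4_NL.gguf": 363.89,
    "v5-small-retrieval-IQ4_XS.gguf": 350.76,
    "v5-small-retrieval-Q2_K.gguf": 282.51,
    "v5-small-retrieval-Q3_K_M.gguf": 331.05,
    "v5-small-retrieval-Q4_K_M.gguf": 378.33,
    "v5-small-retrieval-Q5_K_M.gguf": 423.83,
    "v5-small-retrieval-Q5_K_S.gguf": 416.39,
    "v5-small-retrieval-Q6_K.gguf": 472.17,
    "v5-small-retrieval-Q8_0.gguf": 609.82,
}


def largest_fitting(budget_mb: float,
                    sizes: Dict[str, float] = GGUF_SIZES_MB) -> Optional[str]:
    """Return the largest file whose size is within budget_mb, or None."""
    candidates = [(size, name) for name, size in sizes.items() if size <= budget_mb]
    if not candidates:
        return None
    return max(candidates)[1]


print(largest_fitting(400))  # v5-small-retrieval-Q4_K_M.gguf
```

With a 400 MB budget this returns the Q4_K_M file, which the file list also marks as recommended. File size is only a rough proxy for quality, so benchmark on your own data before committing to a very low-bit file.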

📊 Model Information

🆔 Model ID: jinaai/jina-embeddings-v5-text-small-retrieval-GGUF

📅 Created: 1 week ago

🎯 Difficulty: Beginner

⚙️ Quantization: FP16, Q2, Q3, Q4, Q5, Q6, Q8

🏷️ Tags

llama.cpp, gguf, embedding, qwen3, llama-cpp, jina-embeddings-v5, sentence-similarity, multilingual, arxiv:2602.15547, base_model:jinaai/jina-embeddings-v5-text-small, base_model:quantized:jinaai/jina-embeddings-v5-text-small, license:cc-by-nc-4.0, region:eu, conversational
