📋 Model Description


pipeline_tag: sentence-similarity
tags:
  • gguf
  • embedding
  • qwen3
  • llama-cpp
  • jina-embeddings-v5
language:
  • multilingual
base_model: jinaai/jina-embeddings-v5-text-small
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp

jina-embeddings-v5-text-small-retrieval-GGUF

GGUF quantizations of jina-embeddings-v5-text-small-retrieval, produced with llama.cpp. The source model is a 677M-parameter multilingual embedding model; these files are quantized for efficient inference.

Elastic Inference Service | ArXiv | Blog

[!IMPORTANT]

We highly recommend reading this blog post first for more technical details and instructions for the customized llama.cpp build.

Overview


jina-embeddings-v5-text Architecture

jina-embeddings-v5-text-small-retrieval is a task-specific embedding model for retrieval, part of the jina-embeddings-v5-text model family.




| Feature | Value |
|---|---|
| Parameters | 677M |
| Task | retrieval |
| Embedding Dimension | 1024 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768, 1024 |
| Pooling Strategy | Last-token pooling |
| Base Model | jina-embeddings-v5-text-small |
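
The Matryoshka dimensions listed above mean that a full 1024-dimensional embedding can be shortened to one of the smaller sizes. The following is a minimal sketch of that truncation, assuming the usual Matryoshka convention of keeping the leading components and re-normalizing. The helper names `truncate_embedding` and `cosine` are illustrative and not part of any Jina or llama.cpp API, and the vectors are toy data rather than real model output.

```python
import math

# Supported sizes, copied from the feature list above.
MATRYOSHKA_DIMS = (32, 64, 128, 256, 512, 768, 1024)


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and L2-normalize them."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported dimension: {dim}")
    if len(vec) < dim:
        raise ValueError("embedding is shorter than the requested dimension")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero vector untouched instead of dividing by zero.
    return head if norm == 0.0 else [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 1024-dimensional vectors standing in for real embeddings.
    query = [1.0 if i % 2 == 0 else 0.5 for i in range(1024)]
    doc = [1.0 if i % 3 == 0 else 0.25 for i in range(1024)]
    q256 = truncate_embedding(query, 256)
    d256 = truncate_embedding(doc, 256)
    print(len(q256), cosine(q256, d256))
```

Smaller dimensions reduce storage and speed up similarity search, usually at some cost in retrieval quality; query and document vectors must be truncated to the same size before they are compared.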


MMTEB Multilingual Benchmark


MTEB English Benchmark


Retrieval Benchmark Results

Usage with llama.cpp


via Elastic Inference Service

The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.

PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}

See the Elastic Inference Service documentation for setup details.
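
Once the endpoint exists, embeddings are requested by POSTing text to it. The sketch below builds and parses that HTTP exchange in Python. It assumes the standard Elasticsearch inference API shape (a POST to `_inference/text_embedding/<inference_id>` with an `input` field, and a `text_embedding` list in the response); the host URL and API key are placeholders, and the function names are hypothetical.

```python
import json
import urllib.request

ES_URL = "https://localhost:9200"  # placeholder: your deployment URL
API_KEY = "<your-api-key>"         # placeholder: an Elasticsearch API key
INFERENCE_ID = "jina-v5"           # the ID used in the PUT request above


def build_embedding_request(texts, es_url=ES_URL, api_key=API_KEY,
                            inference_id=INFERENCE_ID):
    """Build (but do not send) a text_embedding inference request."""
    url = f"{es_url.rstrip('/')}/_inference/text_embedding/{inference_id}"
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"ApiKey {api_key}",
        },
    )


def parse_embeddings(response_json):
    """Extract one vector per input text from the assumed response shape."""
    return [item["embedding"] for item in response_json["text_embedding"]]


def embed(texts):
    """Send the request; this needs a reachable Elastic deployment."""
    with urllib.request.urlopen(build_embedding_request(texts)) as resp:
        return parse_embeddings(json.load(resp))
```

Calling `embed(["Your text here"])` against a live deployment should return one vector per input string, provided the response follows the shape assumed here.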

# Build llama.cpp (upstream)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && cmake -B build && cmake --build build --config Release

Run embedding

./build/bin/llama-embedding -m jina-embeddings-v5-text-small-retrieval-Q8_0.gguf \
  --pooling last -p "Your text here"
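
The `--pooling last` flag corresponds to the model's last-token pooling: the sequence embedding is the hidden state of the final real token. A minimal pure-Python sketch of that idea follows, assuming per-token hidden states and a 0/1 attention mask are available. `last_token_pool` is an illustrative name, not a llama.cpp function, and the numbers are toy values.

```python
def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last token whose mask value is 1.

    hidden_states: list of per-token vectors, shape [seq_len][dim]
    attention_mask: list of 0/1 flags of length seq_len (0 marks padding)
    """
    if len(hidden_states) != len(attention_mask):
        raise ValueError("hidden_states and attention_mask differ in length")
    # Scan from the end so that trailing padding positions are skipped.
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i]:
            return list(hidden_states[i])
    raise ValueError("attention_mask contains no real tokens")


if __name__ == "__main__":
    # Three real tokens followed by one padding position.
    states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]
    mask = [1, 1, 1, 0]
    print(last_token_pool(states, mask))
```

Because a decoder-style model attends left to right, only the final token has seen the whole input, which is why its hidden state is used as the summary vector.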

License

CC-BY-NC-4.0. For commercial use, please contact us.

📂 GGUF File List

| Filename | Type | Size | Note |
|---|---|---|---|
| v5-small-retrieval-F16.gguf | FP16 | 1.12 GB | |
| v5-small-retrieval-IQ1_M.gguf | | 206.04 MB | |
| v5-small-retrieval-IQ1_S.gguf | | 198.38 MB | |
| v5-small-retrieval-IQ2_M.gguf | Q2 | 252.64 MB | |
| v5-small-retrieval-IQ2_XXS.gguf | Q2 | 218.82 MB | |
| v5-small-retrieval-IQ4_NL.gguf | Q4 | 363.89 MB | |
| v5-small-retrieval-IQ4_XS.gguf | Q4 | 350.76 MB | |
| v5-small-retrieval-Q2_K.gguf | Q2 | 282.51 MB | |
| v5-small-retrieval-Q3_K_M.gguf | Q3 | 331.05 MB | |
| v5-small-retrieval-Q4_K_M.gguf | Q4 | 378.33 MB | Recommended |
| v5-small-retrieval-Q5_K_M.gguf | Q5 | 423.83 MB | |
| v5-small-retrieval-Q5_K_S.gguf | Q5 | 416.39 MB | |
| v5-small-retrieval-Q6_K.gguf | Q6 | 472.17 MB | |
| v5-small-retrieval-Q8_0.gguf | Q8 | 609.82 MB | |
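
When disk or memory is limited, one simple way to choose among these files is to take the largest one that fits a size budget. The sketch below does only that, using the sizes from the list above. Treating file size as a rough proxy for fidelity is an assumption, not a benchmark result, and `pick_quantization` is a hypothetical helper.

```python
# (filename, size in MB) pairs copied from the file list above.
GGUF_FILES = [
    # Listed as 1.12 GB; converted with 1 GB = 1024 MB as a conservative assumption.
    ("v5-small-retrieval-F16.gguf", 1.12 * 1024),
    ("v5-small-retrieval-IQ1_M.gguf", 206.04),
    ("v5-small-retrieval-IQ1_S.gguf", 198.38),
    ("v5-small-retrieval-IQ2_M.gguf", 252.64),
    ("v5-small-retrieval-IQ2_XXS.gguf", 218.82),
    ("v5-small-retrieval-IQ4_NL.gguf", 363.89),
    ("v5-small-retrieval-IQ4_XS.gguf", 350.76),
    ("v5-small-retrieval-Q2_K.gguf", 282.51),
    ("v5-small-retrieval-Q3_K_M.gguf", 331.05),
    ("v5-small-retrieval-Q4_K_M.gguf", 378.33),
    ("v5-small-retrieval-Q5_K_M.gguf", 423.83),
    ("v5-small-retrieval-Q5_K_S.gguf", 416.39),
    ("v5-small-retrieval-Q6_K.gguf", 472.17),
    ("v5-small-retrieval-Q8_0.gguf", 609.82),
]


def pick_quantization(budget_mb, files=GGUF_FILES):
    """Return the (filename, size_mb) of the largest file within budget, or None."""
    fitting = [entry for entry in files if entry[1] <= budget_mb]
    return max(fitting, key=lambda entry: entry[1]) if fitting else None


if __name__ == "__main__":
    for budget in (100, 400, 700):
        print(budget, pick_quantization(budget))
```

With a 400 MB budget this selects v5-small-retrieval-Q4_K_M.gguf, which is also the file marked as recommended in the list.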