πŸ“‹ Model Description


pipeline_tag: feature-extraction
tags:
  • gguf
  • embedding
  • eurobert
  • llama-cpp
  • jina-embeddings-v5
  • feature-extraction
  • mteb
  • vllm
  • sentence-transformers
language:
  • multilingual
base_model: jinaai/jina-embeddings-v5-text-nano
base_model_relation: quantized
inference: false
license: cc-by-nc-4.0
library_name: llama.cpp



Jina AI: Your Search Foundation, Supercharged!

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Blog | Elastic Inference Service | ArXiv

Model Overview


jina-embeddings-v5-text Architecture


jina-embeddings-v5-text-nano-clustering is a compact, high-performance text embedding model designed for clustering.

It is part of the jina-embeddings-v5-text model family, which also includes jina-embeddings-v5-text-small, a larger variant that delivers stronger performance.

Trained using a novel approach that combines distillation with task-specific contrastive losses, jina-embeddings-v5-text-nano-clustering outperforms existing state-of-the-art models of similar size across diverse embedding benchmarks.




| Feature | Value |
| --- | --- |
| Parameters | 239M |
| Supported Tasks | clustering |
| Max Sequence Length | 8192 tokens |
| Embedding Dimension | 768 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768 |
| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-nano |
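
Matryoshka dimensions mean an embedding can be truncated to a prefix of its components and re-normalized, trading accuracy for storage without retraining. A minimal numpy sketch of that truncation step (synthetic vectors, not actual model output):

```python
import numpy as np

def truncate_matryoshka(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-renormalize."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

rng = np.random.default_rng(0)
full = rng.normal(size=(5, 768)).astype(np.float32)  # stand-in for 5 embeddings
small = truncate_matryoshka(full, 128)               # any listed dim works
print(small.shape)  # (5, 128)
```

After renormalization the truncated vectors remain unit-length, so cosine similarity is still a plain dot product at the smaller dimension.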


MMTEB Multilingual Benchmark


MTEB English Benchmark


Retrieval Benchmark Results

Training and Evaluation

For training details and evaluation results, see our technical report.

Usage


Requirements

The following Python packages are required:

  • transformers>=5.1.0
  • torch>=2.8.0
  • peft>=0.15.2
  • vllm==0.15.1

Optional / Recommended

  • flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
  • sentence-transformers: If you want to use the model via the sentence-transformers interface, install this package as well.


via Elastic Inference Service

The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.

PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-nano"
  }
}

See the Elastic Inference Service documentation for setup details.


via sentence-transformers

from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano-clustering",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"attn_implementation": "flash_attention_2"},  # Recommended but optional
)

Optional: set truncate_dim in encode() to control embedding size

texts = [
"We propose a novel neural network architecture for image segmentation.",
"This paper analyzes the effects of monetary policy on inflation.",
"Our method achieves state-of-the-art results on object detection benchmarks.",
"We study the relationship between interest rates and housing prices.",
"A new attention mechanism is introduced for visual recognition tasks.",
]

Encode texts

embeddings = model.encode(texts)
print(embeddings.shape)

(5, 768)

similarity = model.similarity(embeddings, embeddings)
print(similarity)

tensor([[1.0000, 0.2933, 0.9304, 0.2928, 0.8635],
        [0.2933, 1.0000, 0.3062, 0.8083, 0.3035],
        [0.9304, 0.3062, 1.0000, 0.2943, 0.8651],
        [0.2928, 0.8083, 0.2943, 1.0000, 0.2827],
        [0.8635, 0.3035, 0.8651, 0.2827, 1.0000]])
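
Since this variant is tuned for clustering, the embeddings would typically be fed to a clustering algorithm next. A self-contained sketch using a tiny numpy k-means (synthetic stand-in vectors, not real model output; in practice you would pass `model.encode(texts)` to scikit-learn's `KMeans` or similar):

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Tiny k-means with farthest-point initialization; one label per row."""
    centers = [x[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[dists.argmax()])
    centers = np.stack(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# Synthetic stand-ins for the five example texts above: rows 0, 2, 4 drawn
# near one direction (the vision papers), rows 1, 3 near another (the
# economics papers).
rng = np.random.default_rng(1)
a, b = rng.normal(size=768), rng.normal(size=768)
emb = np.stack([a, b, a, b, a]) + 0.05 * rng.normal(size=(5, 768))
labels = kmeans(emb, k=2)
print(labels)  # [0 1 0 1 0]
```

The label assignment mirrors the block structure visible in the similarity matrix: texts 0, 2, 4 group together, as do texts 1 and 3.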


via vLLM

from vllm import LLM
from vllm.config import PoolerConfig

Initialize model

name = "jinaai/jina-embeddings-v5-text-nano-clustering"
model = LLM(
    model=name,
    dtype="float16",
    runner="pooling",
    trust_remote_code=True,
    pooler_config=PoolerConfig(seq_pooling_type="LAST", normalize=True),
)

Create text prompts

query = "Overview of climate change impacts on coastal cities"
query_prompt = f"Query: {query}"

document = "The impacts of climate change on coastal cities are significant."
document_prompt = f"Document: {document}"

Encode all prompts

prompts = [query_prompt, document_prompt]
outputs = model.encode(prompts, pooling_task="embed")

embed_query = outputs[0].outputs.data
embed_document = outputs[1].outputs.data
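
Because the pooler is configured with normalize=True, the returned embeddings are unit vectors, so cosine similarity between query and document reduces to a dot product. A sketch with synthetic stand-in vectors (not real model output):

```python
import numpy as np

# Stand-ins for embed_query / embed_document, normalized as the pooler would.
rng = np.random.default_rng(0)
embed_query = rng.normal(size=768)
embed_query /= np.linalg.norm(embed_query)
embed_document = rng.normal(size=768)
embed_document /= np.linalg.norm(embed_document)

score = float(np.dot(embed_query, embed_document))  # cosine similarity in [-1, 1]
print(round(score, 4))
```

Higher scores indicate a closer query-document match; with many documents, a single matrix-vector product ranks them all at once.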


via llama.cpp (GGUF)

Our nano model is based on jinaai/jina-embeddings-v5-text-nano, which is not yet supported by upstream llama.cpp, so for now we provide our own llama.cpp branch implementing the necessary changes.

To start the OpenAI API compatible HTTP server, run with the respective model version:

llama-server \
  -hf jinaai/jina-embeddings-v5-text-nano-clustering:F16 \
  --embedding \
  --pooling last \
  --batch-size 8192 \
  --ubatch-size 8192 \
  --ctx-size 8192

Client:

curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Document: A beautiful sunset over the beach",
      "Document: Un beau coucher de soleil sur la plage",
      "Document: ζ΅·ζ»©δΈŠηΎŽδΈ½ηš„ζ—₯落",
      "Document: ζ΅œθΎΊγ«ζ²ˆγ‚€ηΎŽγ—γ„ε€•ζ—₯",
      "Document: Golden sunlight melts into the horizon, painting waves in warm amber and rose, while the sky whispers goodnight to the quiet, endless sea."
    ]
  }'

Note: For the clustering variant, always prepend the Document: prefix to your input, as shown above.
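
The --pooling last flag mirrors the model's last-token pooling strategy: the embedding is the hidden state at each sequence's final non-padding token. A shape-level numpy sketch of that operation (assumed right-padded batch layout, illustrative only, not llama.cpp internals):

```python
import numpy as np

def last_token_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """hidden: (batch, seq, dim); attention_mask: (batch, seq) of 0/1,
    right-padded. Returns the hidden state at each last real token."""
    last_idx = attention_mask.sum(axis=1) - 1  # position of last real token
    return hidden[np.arange(hidden.shape[0]), last_idx]

batch, seq, dim = 2, 6, 8
hidden = np.arange(batch * seq * dim, dtype=np.float32).reshape(batch, seq, dim)
mask = np.array([[1, 1, 1, 0, 0, 0],   # 3 real tokens, then padding
                 [1, 1, 1, 1, 1, 1]])  # full-length sequence
pooled = last_token_pool(hidden, mask)
print(pooled.shape)  # (2, 8)
```

Last-token pooling suits causal-attention encoders, where only the final token has attended to the entire sequence.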

License

The model is licensed under CC BY-NC 4.0. For commercial use, please contact us.

Citation

If you find jina-embeddings-v5-text-nano-clustering useful in your research, please cite the following paper:

@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
      title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation}, 
      author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael GΓΌnther and Maximilian Werk and Han Xiao},
      year={2026},
      eprint={2602.15547},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.15547}, 
}

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
v5-nano-clustering-F16.gguf
LFS FP16
411.41 MB Download
v5-nano-clustering-IQ1_M.gguf
LFS
96.99 MB Download
v5-nano-clustering-IQ1_S.gguf
LFS
94.83 MB Download
v5-nano-clustering-IQ2_M.gguf
LFS Q2
108.43 MB Download
v5-nano-clustering-IQ2_XXS.gguf
LFS Q2
100.6 MB Download
v5-nano-clustering-IQ4_NL.gguf
LFS Q4
145.34 MB Download
v5-nano-clustering-IQ4_XS.gguf
LFS Q4
141.97 MB Download
v5-nano-clustering-Q2_K.gguf
LFS Q2
124.15 MB Download
v5-nano-clustering-Q3_K_M.gguf
LFS Q3
136.52 MB Download
v5-nano-clustering-Q4_K_M.gguf
Recommended LFS Q4
149.7 MB Download
v5-nano-clustering-Q5_K_M.gguf
LFS Q5
161.09 MB Download
v5-nano-clustering-Q5_K_S.gguf
LFS Q5
158.84 MB Download
v5-nano-clustering-Q6_K.gguf
LFS Q6
173.19 MB Download
v5-nano-clustering-Q8_0.gguf
LFS Q8
222.1 MB Download