---
pipeline_tag: sentence-similarity
tags:
- gguf
- embedding
- eurobert
- llama-cpp
- jina-embeddings-v5
- feature-extraction
- mteb
- vllm
- sentence-transformers
- multilingual
---

jina-embeddings-v5-text: Task-Targeted Embedding Distillation
Blog | Elastic Inference Service | ArXiv
Model Overview

jina-embeddings-v5-text-nano-retrieval is a compact, high-performance text embedding model designed for information retrieval.
It is part of the jina-embeddings-v5-text model family, which also includes jina-embeddings-v5-text-small, a larger variant that offers better performance at a higher parameter count.
Trained using a novel approach that combines distillation with task-specific contrastive losses, jina-embeddings-v5-text-nano-retrieval outperforms existing state-of-the-art models of similar size across diverse embedding benchmarks.
| Feature | Value |
|---|---|
| Parameters | 239M |
| Supported Tasks | retrieval |
| Max Sequence Length | 8192 |
| Embedding Dimension | 768 |
| Matryoshka Dimensions | 32, 64, 128, 256, 512, 768 |
| Pooling Strategy | Last-token pooling |
| Base Model | jinaai/jina-embeddings-v5-text-nano |
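
The Matryoshka dimensions above mean a full 768-dimensional embedding can be shortened to any listed prefix length and re-normalized, trading a little accuracy for smaller storage. A minimal sketch of that post-processing in NumPy (the vector here is random, standing in for a real model output):

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = embedding[:dim]
    return truncated / np.linalg.norm(truncated)

full = np.random.default_rng(0).standard_normal(768)  # stand-in for a real embedding
small = truncate_matryoshka(full, 256)

print(small.shape)                              # (256,)
print(round(float(np.linalg.norm(small)), 6))   # 1.0
```

Because prefixes of a Matryoshka embedding are trained to be meaningful on their own, the truncated vector can be used for cosine similarity directly after re-normalization.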



Training and Evaluation
For training details and evaluation results, see our technical report.
Usage
Requirements
The following Python packages are required:
```
transformers>=5.1.0
torch>=2.8.0
peft>=0.15.2
vllm==0.15.1
```
Optional / Recommended
- flash-attention: Installing flash-attention is recommended for improved inference speed and efficiency, but not mandatory.
- sentence-transformers: If you want to use the model via the `sentence-transformers` interface, install this package as well.
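
A typical install of the optional packages might look like the following; note that flash-attn compiles CUDA extensions against your local toolkit, so the exact flags depend on your environment:

```shell
pip install sentence-transformers
# flash-attn builds native extensions; disabling build isolation is commonly required.
pip install flash-attn --no-build-isolation
```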
via Elastic Inference Service
The fastest way to use v5-text in production: Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
```
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-nano"
  }
}
```
See the Elastic Inference Service documentation for setup details.
via sentence-transformers
```python
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "jinaai/jina-embeddings-v5-text-nano-retrieval",
    trust_remote_code=True,
    model_kwargs={"dtype": torch.bfloat16},  # Recommended for GPUs
    config_kwargs={"attn_implementation": "flash_attention_2"},  # Recommended but optional
)

# Optional: set truncate_dim in encode() to control embedding size
query = "Which planet is known as the Red Planet?"
documents = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

# Encode query and documents
query_embeddings = model.encode(sentences=query, prompt_name="query")
document_embeddings = model.encode(sentences=documents, prompt_name="document")

print(query_embeddings.shape, document_embeddings.shape)
# (768,) (4, 768)

similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.5013, 0.7914, 0.6133, 0.5736]])
```
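
`model.similarity` here is plain cosine similarity between the query vector and each document vector; the same scores can be reproduced in a few lines of NumPy. The vectors below are random stand-ins, not real model outputs:

```python
import numpy as np

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of document vectors."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q

rng = np.random.default_rng(1)
query_emb = rng.standard_normal(768)      # stand-in for model.encode(query)
doc_embs = rng.standard_normal((4, 768))  # stand-in for model.encode(documents)

scores = cosine_similarity(query_emb, doc_embs)
print(scores.shape)  # (4,)
```

Ranking the documents then reduces to sorting by these scores in descending order.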
via vLLM
```python
from vllm import LLM
from vllm.config.pooler import PoolerConfig

# Initialize model
name = "jinaai/jina-embeddings-v5-text-nano-retrieval"
model = LLM(
    model=name,
    dtype="float16",
    runner="pooling",
    trust_remote_code=True,
    pooler_config=PoolerConfig(seq_pooling_type="LAST", normalize=True),
)

# Create text prompts
query = "Overview of climate change impacts on coastal cities"
query_prompt = f"Query: {query}"

document = "The impacts of climate change on coastal cities are significant."
document_prompt = f"Document: {document}"

# Encode all prompts
prompts = [query_prompt, document_prompt]
outputs = model.encode(prompts, pooling_task="embed")
```
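
Each element of `outputs` carries the pooled vector for one prompt (in vLLM's pooling runner this is typically exposed as `outputs[i].outputs.embedding`, a plain list of floats; treat that attribute path as an assumption and check your vLLM version). Since the pooler is configured with `normalize=True`, scoring reduces to a dot product. A sketch with hard-coded stand-in vectors instead of real outputs:

```python
# Stand-ins for outputs[i].outputs.embedding (already L2-normalized by the pooler).
query_vec = [1.0, 0.0]
doc_vec = [0.6, 0.8]

# Dot product of unit vectors == cosine similarity.
score = sum(q * d for q, d in zip(query_vec, doc_vec))
print(score)  # 0.6
```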
via llama.cpp (GGUF)
Our nano model is based on jinaai/jina-embeddings-v5-text-nano, which is not yet supported by upstream llama.cpp, so for now we provide our own branch of llama.cpp that implements the necessary changes.
To start the OpenAI API compatible HTTP server, run with the respective model version:
```shell
llama-server \
  -hf jinaai/jina-embeddings-v5-text-nano-retrieval:F16 \
  --embedding \
  --pooling last \
  --batch-size 8192 \
  --ubatch-size 8192 \
  --ctx-size 8192
```
Client:
```shell
curl -X POST "http://127.0.0.1:8080/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "Query: A beautiful sunset over the beach",
      "Query: Un beau coucher de soleil sur la plage",
      "Document: 海滩上美丽的日落",
      "Document: 浜辺に沈む美しい夕日",
      "Document: Golden sunlight melts into the horizon, painting waves in warm amber and rose, while the sky whispers goodnight to the quiet, endless sea."
    ]
  }'
```
Note: For the retrieval variant, add a `Query: ` or `Document: ` prefix in front of each input, as shown above.
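
On the client side, the server replies in the OpenAI embeddings schema, with one vector per input under `data[i].embedding`. A sketch of parsing and scoring that response, using a hard-coded sample payload in place of a live server (a real call would be `requests.post("http://127.0.0.1:8080/v1/embeddings", json=...).json()`):

```python
import json

# Sample response in the OpenAI embeddings schema (toy 2-dim vectors for brevity).
response = json.loads("""
{"data": [
  {"index": 0, "embedding": [1.0, 0.0]},
  {"index": 1, "embedding": [0.8, 0.6]}
]}
""")

vectors = [item["embedding"] for item in response["data"]]

# With normalized embeddings, the dot product is the cosine similarity.
score = sum(a * b for a, b in zip(vectors[0], vectors[1]))
print(score)  # 0.8
```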
License
The model is licensed under CC BY-NC 4.0. For commercial use, please contact us.
Citation
If you find jina-embeddings-v5-text-nano-retrieval useful in your research, please cite the following paper:
```bibtex
@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
  title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation},
  author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael GΓΌnther and Maximilian Werk and Han Xiao},
  year={2026},
  eprint={2602.15547},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.15547},
}
```
GGUF File List
| Filename | Quantization | Size |
|---|---|---|
| v5-nano-retrieval-F16.gguf | FP16 | 411.41 MB |
| v5-nano-retrieval-IQ1_M.gguf | Q1 | 96.99 MB |
| v5-nano-retrieval-IQ1_S.gguf | Q1 | 94.83 MB |
| v5-nano-retrieval-IQ2_M.gguf | Q2 | 108.43 MB |
| v5-nano-retrieval-IQ2_XXS.gguf | Q2 | 100.6 MB |
| v5-nano-retrieval-IQ4_NL.gguf | Q4 | 145.34 MB |
| v5-nano-retrieval-IQ4_XS.gguf | Q4 | 141.97 MB |
| v5-nano-retrieval-Q2_K.gguf | Q2 | 124.15 MB |
| v5-nano-retrieval-Q3_K_M.gguf | Q3 | 136.52 MB |
| v5-nano-retrieval-Q4_K_M.gguf (recommended) | Q4 | 149.7 MB |
| v5-nano-retrieval-Q5_K_M.gguf | Q5 | 161.09 MB |
| v5-nano-retrieval-Q5_K_S.gguf | Q5 | 158.84 MB |
| v5-nano-retrieval-Q6_K.gguf | Q6 | 173.19 MB |
| v5-nano-retrieval-Q8_0.gguf | Q8 | 222.1 MB |