---
license: apache-2.0
pipeline_tag: text-classification
tags:
  - transformers
  - sentence-transformers
  - text-embeddings-inference
language:
  - multilingual
---

# bge-reranker-v2-m3-GGUF

## 📋 Model Description

- **Model creator:** BAAI
- **Original model:** bge-reranker-v2-m3
- **GGUF quantization:** based on llama.cpp release f4d2b


## Reranker

For more details, please refer to our GitHub repository: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding).

Unlike an embedding model, a reranker takes the query and the document as input and directly outputs a similarity score rather than an embedding.
You can obtain a relevance score by feeding a query and a passage to the reranker,
and the score can be mapped to a float value in [0, 1] with a sigmoid function.
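
For instance, the raw score of -5.65234375 printed in the usage examples below corresponds to a normalized score of roughly 0.0035; a minimal sketch of that mapping:

```python
import math

def sigmoid(x: float) -> float:
    # Map a raw reranker score (a logit) into [0, 1]
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(-5.65234375))  # ~0.003497, matching the normalize=True output shown below
```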

### Model List

| Model | Base model | Language | layerwise | feature |
| --- | --- | --- | --- | --- |
| BAAI/bge-reranker-base | xlm-roberta-base | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
| BAAI/bge-reranker-large | xlm-roberta-large | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
| BAAI/bge-reranker-v2-m3 | bge-m3 | Multilingual | - | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. |
| BAAI/bge-reranker-v2-gemma | gemma-2b | Multilingual | - | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. |
| BAAI/bge-reranker-v2-minicpm-layerwise | MiniCPM-2B-dpo-bf16 | Multilingual | 8-40 | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. |

You can select a model according to your scenario and resources.

## Usage

### Using FlagEmbedding

```bash
pip install -U FlagEmbedding
```

#### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 )

Get relevance scores (higher scores indicate more relevance):

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)  # -5.65234375
```

You can map the scores into 0-1 by setting `normalize=True`, which applies a sigmoid function to the score:

```python
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)  # 0.003497010252573502

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)  # [-8.1875, 5.26171875]

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores)  # [0.00027803096387751553, 0.9948403768236574]
```

#### For LLM-based reranker

```python
from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation
# reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', use_bf16=True)  # You can also set use_bf16=True to speed up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```

#### For LLM-based layerwise reranker

```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation
# reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', use_bf16=True)  # You can also set use_bf16=True to speed up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])  # Adjust 'cutoff_layers' to pick which layers are used for computing the score
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28])
print(scores)
```

### Using Huggingface transformers

#### For normal reranker (bge-reranker-base / bge-reranker-large / bge-reranker-v2-m3 )

Get relevance scores (higher scores indicate more relevance):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```
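
If you want these raw logits mapped into [0, 1], comparable to FlagEmbedding's `normalize=True`, you can apply a sigmoid as an extra step, continuing directly from the snippet above:

```python
normalized_scores = torch.sigmoid(scores)  # map the raw logits into [0, 1]
print(normalized_scores)
```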

#### For LLM-based reranker

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    )

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-gemma')
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-gemma')
yes_loc = tokenizer('Yes', add_special_tokens=False)['input_ids'][0]
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = get_inputs(pairs, tokenizer)
    # The relevance score is the logit of the 'Yes' token at the final position
    scores = model(**inputs, return_dict=True).logits[:, -1, yes_loc].view(-1, ).float()
    print(scores)
```

#### For LLM-based layerwise reranker

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    )

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True, torch_dtype=torch.bfloat16)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = get_inputs(pairs, tokenizer).to(model.device)
    all_scores = model(**inputs, return_dict=True, cutoff_layers=[28])
    all_scores = [scores[:, -1].view(-1, ).float() for scores in all_scores[0]]
    print(all_scores)
```

## Fine-tune

### Data Format

Train data should be a JSON lines file, where each line is a dict like this:

```python
{"query": str, "pos": List[str], "neg": List[str], "prompt": str}
```

`query` is the query, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` indicates the relationship between the query and the texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus as negatives.

See toy_finetune_data.jsonl for a toy data file.
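
As a minimal sketch of producing such a file (the specific query, passages, and output filename here are illustrative; the `prompt` string is the same instruction used in the inference examples above):

```python
import json

# Illustrative training examples in the expected format: one JSON object per line
examples = [
    {
        "query": "what is panda?",
        "pos": ["The giant panda (Ailuropoda melanoleuca) is a bear species endemic to China."],
        "neg": ["hi", "pandas is a Python library for data analysis."],  # negatives can be random-sampled from the corpus
        "prompt": "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'.",
    },
]

with open("toy_finetune_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```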

### Train

You can fine-tune the reranker with the following code:

**For llm-based reranker**

```bash
torchrun --nproc_per_node {number of gpus} \
-m FlagEmbedding.llm_reranker.finetune_for_instruction.run \
--output_dir {path to save model} \
--model_name_or_path google/gemma-2b \
--train_data ./toy_finetune_data.jsonl \
--learning_rate 2e-4 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--dataloader_drop_last True \
--query_max_len 512 \
--passage_max_len 512 \
--train_group_size 16 \
--logging_steps 1 \
--save_steps 2000 \
--save_total_limit 50 \
--ddp_find_unused_parameters False \
--gradient_checkpointing \
--deepspeed stage1.json \
--warmup_ratio 0.1 \
--bf16 \
--use_lora True \
--lora_rank 32 \
--lora_alpha 64 \
--use_flash_attn True \
--target_modules q_proj k_proj v_proj o_proj
```

**For llm-based layerwise reranker**

```bash
torchrun --nproc_per_node {number of gpus} \
-m FlagEmbedding.llm_reranker.finetune_for_layerwise.run \
--output_dir {path to save model} \
--model_name_or_path openbmb/MiniCPM-2B-dpo-bf16 \
--train_data ./toy_finetune_data.jsonl \
--learning_rate 2e-4 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--dataloader_drop_last True \
--query_max_len 512 \
--passage_max_len 512 \
--train_group_size 16 \
--logging_steps 1 \
--save_steps 2000 \
--save_total_limit 50 \
--ddp_find_unused_parameters False \
--gradient_checkpointing \
--deepspeed stage1.json \
--warmup_ratio 0.1 \
--bf16 \
--use_lora True \
--lora_rank 32 \
--lora_alpha 64 \
--use_flash_attn True \
--target_modules q_proj k_proj v_proj o_proj \
--start_layer 8 \
--head_multi True \
--head_type simple \
--lora_extra_parameters linear_head
```

Our rerankers are initialized from google/gemma-2b (for the llm-based reranker) and openbmb/MiniCPM-2B-dpo-bf16 (for the llm-based layerwise reranker), and we train them on a mixture of multilingual datasets.

## Evaluation

- llama-index.

- BEIR. Rerank the top 100 results from bge-en-v1.5 large; also rerank the top 100 results from e5-mistral-7b-instruct.

- CMTEB-retrieval. It reranks the top 100 results from bge-zh-v1.5 large.

- miracl (multi-language). It reranks the top 100 results from bge-m3.

## Citation

If you find this repository useful, please consider giving a star and a citation.

```bibtex
@misc{li2023making,
      title={Making Large Language Models A Better Foundation For Dense Retrieval},
      author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
      year={2023},
      eprint={2312.15503},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{chen2024bge,
      title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
      author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
      year={2024},
      eprint={2402.03216},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## 📂 GGUF File List

| 📁 Filename | 📦 Quantization | Size |
| --- | --- | --- |
| bge-reranker-v2-m3-FP16.gguf | FP16 | 1.08 GB |
| bge-reranker-v2-m3-Q2_K.gguf | Q2_K | 349.49 MB |
| bge-reranker-v2-m3-Q3_K_M.gguf | Q3_K_M | 384.09 MB |
| bge-reranker-v2-m3-Q4_0.gguf | Q4_0 (recommended) | 402.6 MB |
| bge-reranker-v2-m3-Q4_K_M.gguf | Q4_K_M | 418.07 MB |
| bge-reranker-v2-m3-Q5_0.gguf | Q5_0 | 438.73 MB |
| bge-reranker-v2-m3-Q5_K_M.gguf | Q5_K_M | 446.69 MB |
| bge-reranker-v2-m3-Q6_K.gguf | Q6_K | 477.11 MB |
| bge-reranker-v2-m3-Q8_0.gguf | Q8_0 | 606.23 MB |
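
These GGUF files are intended for llama.cpp and compatible runtimes. As a minimal sketch (assuming a recent llama.cpp build with reranking support, started locally with something like `llama-server -m bge-reranker-v2-m3-Q4_0.gguf --reranking --port 8080`; flag and endpoint names may vary between releases), you could score query/passage pairs over the server's rerank endpoint:

```python
import requests  # third-party HTTP client: pip install requests

# Assumes a local llama-server running with reranking enabled (see note above);
# the endpoint path and response fields follow llama.cpp's server API and may differ by version.
payload = {
    "model": "bge-reranker-v2-m3",
    "query": "what is panda?",
    "documents": [
        "hi",
        "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
    ],
}

response = requests.post("http://localhost:8080/v1/rerank", json=payload, timeout=30)
response.raise_for_status()

for result in response.json()["results"]:
    print(result["index"], result["relevance_score"])
```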