Model Description
Quantization made by Richard Erkhov.
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA - GGUF
- Model creator: https://huggingface.co/swap-uniba/
- Original model: https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA/
Original model description:
language:
- en
- it
license: llama3
library_name: transformers
tags:
- meta
- pytorch
- llama
- llama-3
- llamantino
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- gsarti/clean_mc4_it
- Chat-Error/wizard_alpaca_dolly_orca
- mlabonne/orpo-dpo-mix-40k
metrics:
- accuracy
model_creator: Marco Polignano - SWAP Research Group
pipeline_tag: text-generation
model-index:
- name: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 74.57
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 92.75
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 66.85
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 75.93
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 82.0
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 58.61
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
"Built with Meta Llama 3".
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be a Multilingual Model (EN 🇺🇸 + ITA 🇮🇹) suitable for further fine-tuning on specific tasks in Italian.
The ANITA project (Advanced Natural-based interaction for the ITAlian language)
aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.
Live DEMO: https://chat.llamantino.it/
It is reachable only from an Italian internet connection.
Model Details
Last Update: 10/05/2024
https://github.com/marcopoli/LLaMAntino-3-ANITA
| Model | HF | GGUF | EXL2 |
|---|---|---|---|
| swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | Link | Link | Link |
Specifications
- Model developers: Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy; SWAP Research Group
- Variations: The model was supervised fine-tuned (SFT) with QLoRA 4-bit on instruction-based datasets, then aligned with human preferences for helpfulness and safety via DPO over the mlabonne/orpo-dpo-mix-40k dataset (see the illustrative sketch after this list).
- Input: Models input text only.
- Language: Multilingual + Italian 🇮🇹
- Output: Models generate text and code only.
- Model Architecture: Llama 3 architecture.
- Context length: 8K (8192 tokens).
- Library Used: Unsloth
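As an illustration of the SFT + DPO recipe above, the sketch below shows what the DPO alignment step could look like with trl's DPOTrainer over mlabonne/orpo-dpo-mix-40k. This is an assumption about the setup, not the released training code; the actual release additionally used QLoRA 4-bit adapters via Unsloth, which this sketch omits for brevity, and hyperparameters here are placeholders.

```python
# Illustrative sketch only -- NOT the authors' actual training script.
# Assumes: pip install -U trl datasets transformers
# Exact argument names vary across trl versions (e.g. tokenizer= became
# processing_class= in newer releases).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference dataset named in this card for the DPO alignment step
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

args = DPOConfig(
    output_dir="anita-dpo-sketch",   # hypothetical output path
    beta=0.1,                        # placeholder KL-penalty strength
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,            # a frozen reference copy is created internally
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```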
Playground
There are several ways to use the model directly; choose one of the following to get started.
Prompt Template
```
<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
```
Transformers
For direct use with transformers, you can get started with the following steps.

- First, install the required libraries via `pip`:

```bash
pip install -U transformers trl peft accelerate bitsandbytes
```

- Then you can start using the model directly:
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

sys = "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
      "(Advanced Natural-based interaction for the ITAlian language)." \
      " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

messages = [
    {"role": "system", "content": sys},
    {"role": "user", "content": "Chi è Carlo Magno?"},
]

# Method 1: apply the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: use a text-generation pipeline
import transformers

pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task="text-generation",
    max_new_tokens=512,      # max number of tokens to generate in the output
    temperature=0.6,         # lower values give less creative answers
    do_sample=True,
    top_p=0.9,
)
sequences = pipe(messages)
for seq in sequences:
    print(f"{seq['generated_text']}")
```
- Additionally, you can load the model with 4-bit quantization to reduce the required resources. You can start with the code below.
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

sys = "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
      "(Advanced Natural-based interaction for the ITAlian language)." \
      " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

messages = [
    {"role": "system", "content": sys},
    {"role": "user", "content": "Chi è Carlo Magno?"},
]

# Method 1: apply the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: use a text-generation pipeline
import transformers

pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task="text-generation",
    max_new_tokens=512,      # max number of tokens to generate in the output
    temperature=0.6,         # lower values give less creative answers
    do_sample=True,
    top_p=0.9,
)
sequences = pipe(messages)
for seq in sequences:
    print(f"{seq['generated_text']}")
```
Evaluation
Open LLM Leaderboard:
Evaluated with the lm-evaluation-harness for the Open Italian LLMs Leaderboard:

```bash
lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks hellaswag_it,arc_it --device cuda:0 --batch_size auto:2
lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size auto:2
```
| Metric | Value |
|---|---|
| Avg. | 0.6160 |
| Arc_IT | 0.5714 |
| Hellaswag_IT | 0.7093 |
| MMLU_IT | 0.5672 |
Unsloth

Unsloth, a great tool that helps us easily develop products, at a lower cost than expected.
Citation instructions
@misc{polignano2024advanced,
title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
year={2024},
eprint={2405.07101},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{basile2023llamantino,
title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
year={2023},
eprint={2312.09993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
Acknowledgments
We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by NextGenerationEU. Models are built on the Leonardo supercomputer with the support of CINECA - Italian Super Computing Resource Allocation, class C project IscrC_Pro_MRS (HP10CQO70G).
Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 75.12 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot) | 92.75 |
| MMLU (5-Shot) | 66.85 |
| TruthfulQA (0-shot) | 75.93 |
| Winogrande (5-shot) | 82.00 |
| GSM8k (5-shot) | 58.61 |
GGUF File List

| Filename | Quant | Size |
|---|---|---|
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_M.gguf | Q3 | 3.52 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_S.gguf | Q3 | 3.43 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_XS.gguf | Q3 | 3.28 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ4_NL.gguf | Q4 | 4.38 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ4_XS.gguf | Q4 | 4.18 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q2_K.gguf | Q2 | 2.96 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K.gguf | Q3 | 3.74 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_L.gguf | Q3 | 4.03 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_M.gguf | Q3 | 3.74 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_S.gguf | Q3 | 3.41 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_0.gguf (Recommended) | Q4 | 4.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_1.gguf | Q4 | 4.78 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K.gguf | Q4 | 4.58 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K_M.gguf | Q4 | 4.58 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K_S.gguf | Q4 | 4.37 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_0.gguf | Q5 | 5.21 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_1.gguf | Q5 | 5.65 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K.gguf | Q5 | 5.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K_M.gguf | Q5 | 5.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K_S.gguf | Q5 | 5.21 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q6_K.gguf | Q6 | 6.14 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q8_0.gguf | Q8 | 7.95 GB |
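Since these are GGUF files, they can be run with any llama.cpp-compatible runtime. Below is a minimal sketch using the llama-cpp-python bindings (an assumed choice of runtime, installable with `pip install llama-cpp-python`), loading the recommended Q4_0 file; the file path and sampling parameters are illustrative.

```python
# Minimal sketch, assuming llama-cpp-python is installed and the Q4_0
# file from the table above has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_0.gguf",
    n_ctx=8192,       # the model's full 8K context window
    n_gpu_layers=-1,  # offload all layers to GPU if available; set 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Sei un assistente AI per la lingua Italiana."},
        {"role": "user", "content": "Chi è Carlo Magno?"},
    ],
    max_tokens=512,
    temperature=0.6,
    top_p=0.9,
)
print(out["choices"][0]["message"]["content"])
```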