Model Description
Quantization made by Richard Erkhov.
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA - GGUF
- Model creator: https://huggingface.co/swap-uniba/
- Original model: https://huggingface.co/swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA/
Original model description:
language:
- en
- it
license: llama3
library_name: transformers
tags:
- meta
- pytorch
- llama
- llama-3
- llamantino
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- gsarti/clean_mc4_it
- Chat-Error/wizard_alpaca_dolly_orca
- mlabonne/orpo-dpo-mix-40k
metrics:
- accuracy
model_creator: Marco Polignano - SWAP Research Group
pipeline_tag: text-generation
model-index:
- name: LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 74.57
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 92.75
name: normalized accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 66.85
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 75.93
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 82.0
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 58.61
name: accuracy
source:
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
name: Open LLM Leaderboard
"Built with Meta Llama 3".
LLaMAntino-3-ANITA-8B-Inst-DPO-ITA is a model of the LLaMAntino Large Language Models family. The model is an instruction-tuned version of Meta-Llama-3-8b-instruct (a fine-tuned LLaMA 3 model). This model version aims to be a Multilingual Model (EN 🇺🇸 + ITA 🇮🇹) suitable for further fine-tuning on specific tasks in Italian.
The ANITA project (Advanced Natural-based interaction for the ITAlian language)
aims to provide Italian NLP researchers with an improved model for Italian-language 🇮🇹 use cases.
Live DEMO: https://chat.llamantino.it/
It is reachable only from an Italian internet connection.
Model Details
Last Update: 10/05/2024
https://github.com/marcopoli/LLaMAntino-3-ANITA
| Model | HF | GGUF | EXL2 |
|---|---|---|---|
| swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA | Link | Link | Link |
Specifications
- Model developers: Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy; SWAP Research Group
- Variations: The model was supervised fine-tuned (SFT) with QLoRA 4-bit on instruction-based datasets, then aligned with human preferences for helpfulness and safety via DPO over the mlabonne/orpo-dpo-mix-40k dataset (see the illustrative sketch after this list).
- Input: Models input text only.
- Language: Multilingual + Italian 🇮🇹
- Output: Models generate text and code only.
- Model Architecture: Llama 3 architecture.
- Context length: 8K (8192 tokens).
- Library Used: Unsloth
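As an illustration of the SFT + DPO recipe above, the sketch below shows what the DPO alignment step could look like with trl's DPOTrainer over mlabonne/orpo-dpo-mix-40k. This is an assumption about the setup, not the released training code; the actual release additionally used QLoRA 4-bit adapters via Unsloth, which this sketch omits for brevity, and hyperparameters here are placeholders.

```python
# Illustrative sketch only -- NOT the authors' actual training script.
# Assumes: pip install -U trl datasets transformers
# Exact argument names vary across trl versions (e.g. tokenizer= became
# processing_class= in newer releases).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Preference dataset named in this card for the DPO alignment step
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

args = DPOConfig(
    output_dir="anita-dpo-sketch",   # hypothetical output path
    beta=0.1,                        # placeholder KL-penalty strength
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,            # a frozen reference copy is created internally
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```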
Playground
There are several ways to use the model directly; choose one of the following to get started.
Prompt Template
```
<|start_header_id|>system<|end_header_id|>

{ SYS Prompt }<|eot_id|><|start_header_id|>user<|end_header_id|>

{ USER Prompt }<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{ ASSIST Prompt }<|eot_id|>
```
Transformers
For direct use with transformers, you can get started with the following steps.

- First, install the required libraries via `pip`:

```bash
pip install -U transformers trl peft accelerate bitsandbytes
```

- Then you can start using the model directly:
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

sys = "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
      "(Advanced Natural-based interaction for the ITAlian language)." \
      " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

messages = [
    {"role": "system", "content": sys},
    {"role": "user", "content": "Chi è Carlo Magno?"},
]

# Method 1: apply the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: use a text-generation pipeline
import transformers

pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task="text-generation",
    max_new_tokens=512,      # max number of tokens to generate in the output
    temperature=0.6,         # lower values give less creative answers
    do_sample=True,
    top_p=0.9,
)
sequences = pipe(messages)
for seq in sequences:
    print(f"{seq['generated_text']}")
```
- Additionally, you can load the model with 4-bit quantization to reduce the required resources. You can start with the code below.
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

sys = "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA " \
      "(Advanced Natural-based interaction for the ITAlian language)." \
      " Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."

messages = [
    {"role": "system", "content": sys},
    {"role": "user", "content": "Chi è Carlo Magno?"},
]

# Method 1: apply the chat template and call generate() directly
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
results = tokenizer.batch_decode(outputs)[0]
print(results)

# Method 2: use a text-generation pipeline
import transformers

pipe = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,  # return only the newly generated text, not the prompt
    task="text-generation",
    max_new_tokens=512,      # max number of tokens to generate in the output
    temperature=0.6,         # lower values give less creative answers
    do_sample=True,
    top_p=0.9,
)
sequences = pipe(messages)
for seq in sequences:
    print(f"{seq['generated_text']}")
```
Evaluation
Open LLM Leaderboard:
Evaluated with the lm-evaluation-harness for the Open Italian LLMs Leaderboard:

```bash
lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks hellaswag_it,arc_it --device cuda:0 --batch_size auto:2
lm_eval --model hf --model_args pretrained=HUGGINGFACE_MODEL_ID --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size auto:2
```
| Metric | Value |
|---|---|
| Avg. | 0.6160 |
| Arc_IT | 0.5714 |
| Hellaswag_IT | 0.7093 |
| MMLU_IT | 0.5672 |
Unsloth

Unsloth, a great tool that helps us easily develop products, at a lower cost than expected.
Citation instructions
@misc{polignano2024advanced,
title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
year={2024},
eprint={2405.07101},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{basile2023llamantino,
title={LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language},
author={Pierpaolo Basile and Elio Musacchio and Marco Polignano and Lucia Siciliani and Giuseppe Fiameni and Giovanni Semeraro},
year={2023},
eprint={2312.09993},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
Acknowledgments
We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 6 - Symbiotic AI (CUP H97G22000210007) under the NRRP MUR program funded by NextGenerationEU. Models are built on the Leonardo supercomputer with the support of CINECA - Italian Super Computing Resource Allocation, class C project IscrC_Pro_MRS (HP10CQO70G).
Open LLM Leaderboard Evaluation Results
Detailed results can be found here.

| Metric | Value |
|---|---|
| Avg. | 75.12 |
| AI2 Reasoning Challenge (25-Shot) | 74.57 |
| HellaSwag (10-Shot) | 92.75 |
| MMLU (5-Shot) | 66.85 |
| TruthfulQA (0-shot) | 75.93 |
| Winogrande (5-shot) | 82.00 |
| GSM8k (5-shot) | 58.61 |
GGUF File List

| Filename | Quant | Size |
|---|---|---|
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_M.gguf | Q3 | 3.52 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_S.gguf | Q3 | 3.43 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ3_XS.gguf | Q3 | 3.28 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ4_NL.gguf | Q4 | 4.38 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.IQ4_XS.gguf | Q4 | 4.18 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q2_K.gguf | Q2 | 2.96 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K.gguf | Q3 | 3.74 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_L.gguf | Q3 | 4.03 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_M.gguf | Q3 | 3.74 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q3_K_S.gguf | Q3 | 3.41 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_0.gguf (Recommended) | Q4 | 4.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_1.gguf | Q4 | 4.78 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K.gguf | Q4 | 4.58 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K_M.gguf | Q4 | 4.58 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_K_S.gguf | Q4 | 4.37 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_0.gguf | Q5 | 5.21 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_1.gguf | Q5 | 5.65 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K.gguf | Q5 | 5.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K_M.gguf | Q5 | 5.34 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q5_K_S.gguf | Q5 | 5.21 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q6_K.gguf | Q6 | 6.14 GB |
| LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q8_0.gguf | Q8 | 7.95 GB |
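Since these are GGUF files, they can be run with any llama.cpp-compatible runtime. Below is a minimal sketch using the llama-cpp-python bindings (an assumed choice of runtime, installable with `pip install llama-cpp-python`), loading the recommended Q4_0 file; the file path and sampling parameters are illustrative.

```python
# Minimal sketch, assuming llama-cpp-python is installed and the Q4_0
# file from the table above has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="LLaMAntino-3-ANITA-8B-Inst-DPO-ITA.Q4_0.gguf",
    n_ctx=8192,       # the model's full 8K context window
    n_gpu_layers=-1,  # offload all layers to GPU if available; set 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Sei un assistente AI per la lingua Italiana."},
        {"role": "user", "content": "Chi è Carlo Magno?"},
    ],
    max_tokens=512,
    temperature=0.6,
    top_p=0.9,
)
print(out["choices"][0]["message"]["content"])
```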