📋 Model Description


license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
tags:
  • generated_from_trainer
  • axolotl
datasets:
  • cognitivecomputations/Dolphin-2.9
  • teknium/OpenHermes-2.5
  • m-a-p/CodeFeedback-Filtered-Instruction
  • cognitivecomputations/dolphin-coder
  • cognitivecomputations/samantha-data
  • microsoft/orca-math-word-problems-200k
  • Locutusque/function-calling-chatml
  • internlm/Agent-FLAN

Dolphin 2.9.3 Mistral Nemo 12b 🐬

This is the llama.cpp GGUF conversion of the original model, located here:

https://huggingface.co/cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b

Curated and trained by Eric Hartford and Cognitive Computations

Discord: https://discord.gg/h3K4XGj2RH

Our appreciation for the sponsors of Dolphin 2.9.3:

This model is based on mistralai/Mistral-Nemo-Base-2407 and is governed by the Apache 2.0 license.

The base model has a 128K context window; our fine-tuning used a sequence length of 8192.

Dolphin 2.9.3 uses the ChatML prompt template format.

Example:

<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
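To make the template concrete, here is a minimal Python sketch that renders a conversation into the ChatML format shown above. The function name `build_chatml_prompt` and its (role, content) tuple input are illustrative assumptions, not part of any Dolphin or llama.cpp API; the sketch also assumes a newline follows each role name, as in the example.

```python
# Minimal ChatML prompt builder (illustrative sketch, not an official API).
from typing import List, Tuple

DEFAULT_SYSTEM = "You are Dolphin, a helpful AI assistant."


def build_chatml_prompt(
    messages: List[Tuple[str, str]],
    system: str = DEFAULT_SYSTEM,
) -> str:
    """Render (role, content) pairs into a ChatML prompt string.

    The result ends with an open assistant turn so that the model
    writes the reply next.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, content in messages:
        # Only user/assistant turns are handled in this sketch.
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    # Open the assistant turn without closing it.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_chatml_prompt([("user", "Why is the sky blue?")]))
```

Generation should be stopped when the model emits `<|im_end|>`, which the training config sets as the EOS token.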

Dolphin 2.9.3 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias. This makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service. It will be highly compliant with any requests, even unethical ones. Please read my blog post about uncensored models. https://erichartford.com/uncensored-models You are responsible for any content you create using this model. Enjoy responsibly.

Dolphin is licensed under the Apache 2.0 license. We grant permission for any use, including commercial. Dolphin was trained on data generated from GPT-4, among other models.

Evals

TBD

Training

Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: /workspace/models/Mistral-Nemo-Base-2407
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: /workspace/datasets/dolphin-2.9.3/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChatfilteredsharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/SystemChatmultilingualsharegpt.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/notsamanthanorefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/agentinstructreact_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbenchinstructj1s13kunfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbenchnegativeunfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbenchreact10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/toolbenchtflancot30punfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9.3/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml

chat_template: chatml

adapter: qlora
lora_r: 128
lora_alpha: 16
lora_modules_to_save: [embed_tokens, lm_head]
lora_dropout: 0.05
lora_target_linear: true

unfrozen_parameters:
  - ^lm_head.weight$
  - ^model.embed_tokens.weight$
  - input_layernorm
  - model.norm
  - post_attention_layernorm
  - self_attn.rotary_emb

  # mlp.down_proj layers
  - model.layers.0.mlp.down_proj
  - model.layers.1.mlp.down_proj
  - model.layers.4.mlp.down_proj
  - model.layers.37.mlp.down_proj
  - model.layers.24.mlp.down_proj
  - model.layers.2.mlp.down_proj
  - model.layers.38.mlp.down_proj
  - model.layers.35.mlp.down_proj
  - model.layers.25.mlp.down_proj
  - model.layers.6.mlp.down_proj
  - model.layers.22.mlp.down_proj
  - model.layers.23.mlp.down_proj
  - model.layers.3.mlp.down_proj
  - model.layers.21.mlp.down_proj
  - model.layers.5.mlp.down_proj
  - model.layers.28.mlp.down_proj
  - model.layers.20.mlp.down_proj
  - model.layers.26.mlp.down_proj
  - model.layers.19.mlp.down_proj
  - model.layers.34.mlp.down_proj

  # mlp.gate_proj layers
  - model.layers.2.mlp.gate_proj
  - model.layers.1.mlp.gate_proj
  - model.layers.3.mlp.gate_proj
  - model.layers.5.mlp.gate_proj
  - model.layers.4.mlp.gate_proj
  - model.layers.35.mlp.gate_proj
  - model.layers.36.mlp.gate_proj
  - model.layers.37.mlp.gate_proj
  - model.layers.38.mlp.gate_proj
  - model.layers.34.mlp.gate_proj
  - model.layers.33.mlp.gate_proj
  - model.layers.8.mlp.gate_proj
  - model.layers.32.mlp.gate_proj
  - model.layers.6.mlp.gate_proj
  - model.layers.28.mlp.gate_proj
  - model.layers.26.mlp.gate_proj
  - model.layers.30.mlp.gate_proj
  - model.layers.23.mlp.gate_proj
  - model.layers.29.mlp.gate_proj
  - model.layers.27.mlp.gate_proj

  # mlp.up_proj layers
  - model.layers.3.mlp.up_proj
  - model.layers.4.mlp.up_proj
  - model.layers.6.mlp.up_proj
  - model.layers.2.mlp.up_proj
  - model.layers.5.mlp.up_proj
  - model.layers.8.mlp.up_proj
  - model.layers.10.mlp.up_proj
  - model.layers.9.mlp.up_proj
  - model.layers.7.mlp.up_proj
  - model.layers.0.mlp.up_proj
  - model.layers.17.mlp.up_proj
  - model.layers.15.mlp.up_proj
  - model.layers.22.mlp.up_proj
  - model.layers.18.mlp.up_proj
  - model.layers.16.mlp.up_proj
  - model.layers.11.mlp.up_proj
  - model.layers.21.mlp.up_proj
  - model.layers.23.mlp.up_proj
  - model.layers.20.mlp.up_proj
  - model.layers.27.mlp.up_proj

  # self_attn.k_proj layers
  - model.layers.30.self_attn.k_proj
  - model.layers.27.self_attn.k_proj
  - model.layers.25.self_attn.k_proj
  - model.layers.33.self_attn.k_proj
  - model.layers.26.self_attn.k_proj
  - model.layers.31.self_attn.k_proj
  - model.layers.35.self_attn.k_proj
  - model.layers.39.self_attn.k_proj
  - model.layers.22.self_attn.k_proj
  - model.layers.24.self_attn.k_proj
  - model.layers.21.self_attn.k_proj
  - model.layers.28.self_attn.k_proj
  - model.layers.23.self_attn.k_proj
  - model.layers.36.self_attn.k_proj
  - model.layers.20.self_attn.k_proj
  - model.layers.37.self_attn.k_proj
  - model.layers.29.self_attn.k_proj
  - model.layers.32.self_attn.k_proj
  - model.layers.16.self_attn.k_proj
  - model.layers.18.self_attn.k_proj

  # self_attn.o_proj layers
  - model.layers.7.self_attn.o_proj
  - model.layers.6.self_attn.o_proj
  - model.layers.9.self_attn.o_proj
  - model.layers.5.self_attn.o_proj
  - model.layers.27.self_attn.o_proj
  - model.layers.26.self_attn.o_proj
  - model.layers.4.self_attn.o_proj
  - model.layers.31.self_attn.o_proj
  - model.layers.8.self_attn.o_proj
  - model.layers.16.self_attn.o_proj
  - model.layers.3.self_attn.o_proj
  - model.layers.10.self_attn.o_proj
  - model.layers.18.self_attn.o_proj
  - model.layers.33.self_attn.o_proj
  - model.layers.17.self_attn.o_proj
  - model.layers.32.self_attn.o_proj
  - model.layers.30.self_attn.o_proj
  - model.layers.2.self_attn.o_proj
  - model.layers.15.self_attn.o_proj
  - model.layers.11.self_attn.o_proj

  # self_attn.q_proj layers
  - model.layers.14.self_attn.q_proj
  - model.layers.11.self_attn.q_proj
  - model.layers.15.self_attn.q_proj
  - model.layers.9.self_attn.q_proj
  - model.layers.8.self_attn.q_proj
  - model.layers.18.self_attn.q_proj
  - model.layers.12.self_attn.q_proj
  - model.layers.13.self_attn.q_proj
  - model.layers.19.self_attn.q_proj
  - model.layers.16.self_attn.q_proj
  - model.layers.10.self_attn.q_proj
  - model.layers.17.self_attn.q_proj
  - model.layers.7.self_attn.q_proj
  - model.layers.5.self_attn.q_proj
  - model.layers.20.self_attn.q_proj
  - model.layers.3.self_attn.q_proj
  - model.layers.26.self_attn.q_proj
  - model.layers.27.self_attn.q_proj
  - model.layers.28.self_attn.q_proj
  - model.layers.33.self_attn.q_proj

  # self_attn.v_proj layers
  - model.layers.27.self_attn.v_proj
  - model.layers.20.self_attn.v_proj
  - model.layers.24.self_attn.v_proj
  - model.layers.25.self_attn.v_proj
  - model.layers.30.self_attn.v_proj
  - model.layers.2.self_attn.v_proj
  - model.layers.23.self_attn.v_proj
  - model.layers.22.self_attn.v_proj
  - model.layers.26.self_attn.v_proj
  - model.layers.33.self_attn.v_proj
  - model.layers.37.self_attn.v_proj
  - model.layers.7.self_attn.v_proj
  - model.layers.4.self_attn.v_proj
  - model.layers.18.self_attn.v_proj
  - model.layers.31.self_attn.v_proj
  - model.layers.17.self_attn.v_proj
  - model.layers.35.self_attn.v_proj
  - model.layers.32.self_attn.v_proj
  - model.layers.21.self_attn.v_proj
  - model.layers.3.self_attn.v_proj

dataset_prepared_path: /workspace/axolotl/dolph-2.9.3-nemo-prepared
val_set_size: 0.01
output_dir: /workspace/axolotl/dolphin-2.9.3-mistral-nemo

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9.3-Mistral-nemo
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32:

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
save_total_limit: 2
save_steps:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<pad>"
  bos_token: "<s>"
  unk_token: "<unk>"
tokens:
  - "<|im_start|>"

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_transformer_layer_cls_to_wrap: MixtralSparseMoeBlock
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_forward_prefetch: false
  fsdp_backward_prefetch: BACKWARD_PRE
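The `unfrozen_parameters` entries in the config above are regular expressions over parameter names; they are meant to pick out the weights that stay trainable. The Python sketch below illustrates that selection step on a few hypothetical parameter names in the Hugging Face Mistral naming scheme. It assumes each pattern is applied with `re.search` to the full name; axolotl's actual matching rules may differ, and `select_unfrozen` is a hypothetical helper.

```python
# Sketch: choose trainable parameters by regex, in the spirit of the
# `unfrozen_parameters` list. ASSUMPTION: patterns are applied with
# re.search to the full parameter name (axolotl may differ).
import re
from typing import Iterable, List


def select_unfrozen(param_names: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return, in input order, the names matched by at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [name for name in param_names if any(rx.search(name) for rx in compiled)]


if __name__ == "__main__":
    # Hypothetical parameter names following the Hugging Face Mistral layout.
    names = [
        "model.embed_tokens.weight",
        "model.layers.0.input_layernorm.weight",
        "model.layers.0.mlp.down_proj.weight",
        "model.layers.10.mlp.down_proj.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "model.norm.weight",
        "lm_head.weight",
    ]
    # A subset of the patterns from the config.
    patterns = [
        r"^lm_head.weight$",
        r"^model.embed_tokens.weight$",
        r"input_layernorm",
        r"model.norm",
        r"model.layers.0.mlp.down_proj",
    ]
    for name in select_unfrozen(names, patterns):
        print(name)
```

Note that `model.layers.0.mlp.down_proj` does not match layer 10, because the pattern requires the character after `layers.` to be `0`. With a loaded PyTorch model, the result would typically be applied by looping over `model.named_parameters()` and setting `param.requires_grad = name in selected`.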



workspace/axolotl/dolphin-2.9.3-mistral-nemo

This model is a fine-tuned version of mistralai/Mistral-Nemo-Base-2407 on the datasets listed in the axolotl config above.
It achieves the following results on the evaluation set:

  • Loss: 0.5605

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 3
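These hyperparameters combine as follows: 1 sequence per device × 8 devices × 16 accumulation steps gives the total train batch size of 128. The Python sketch below reproduces that arithmetic and a cosine schedule with 100 warmup steps. It assumes the linear-warmup, half-cosine-to-zero shape of Hugging Face's `get_cosine_schedule_with_warmup` (default `num_cycles=0.5`), and it takes `TOTAL_STEPS = 2901` from the last logged step in the training results table rather than from any stated value.

```python
# Sketch of the batch-size arithmetic and an assumed cosine-with-warmup
# learning-rate schedule for the hyperparameters listed above.
import math

MICRO_BATCH_SIZE = 1
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 16
SEQUENCE_LEN = 8192
PEAK_LR = 5e-6
WARMUP_STEPS = 100
TOTAL_STEPS = 2901  # ASSUMPTION: last logged step, not a configured value


def effective_batch_size(micro: int, devices: int, accum: int) -> int:
    """Sequences consumed per optimizer step across all devices."""
    return micro * devices * accum


def lr_at(step: int, peak: float = PEAK_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to `peak`, then half-cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    batch = effective_batch_size(MICRO_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS)
    print("sequences per optimizer step:", batch)
    # With sample packing and padding to sequence_len, each sequence is a
    # full 8192-token row, so this is an upper bound on tokens per step.
    print("max tokens per optimizer step:", batch * SEQUENCE_LEN)
    for s in (0, 50, 100, 1500, 2901):
        print(s, f"{lr_at(s):.3e}")
```

The same formula gives a total eval batch size of 8, since evaluation uses 1 sequence per device with no accumulation.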

Training results

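| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| 0.5691        | 1.0162 | 983  | 0.5734          |
| 0.5335        | 2.0174 | 1968 | 0.5609          |
| 0.5297        | 2.9639 | 2901 | 0.5605          |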

Framework versions

  • Transformers 4.43.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Updated GGUF conversions were provided by KoboldAI

📂 GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
dolphin-2.9.3-mistral-nemo-12b.F16.gguf
LFS FP16
22.82 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q2_K.gguf
LFS Q2
4.46 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q3_K_L.gguf
LFS Q3
6.11 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q3_K_M.gguf
LFS Q3
5.67 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q3_K_S.gguf
LFS Q3
5.15 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q4_0.gguf
Recommended LFS Q4
6.59 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q4_1.gguf
LFS Q4
7.26 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q4_K_M.gguf
LFS Q4
6.96 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q4_K_S.gguf
LFS Q4
6.63 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q5_0.gguf
LFS Q5
7.93 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q5_1.gguf
LFS Q5
8.61 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q5_K_M.gguf
LFS Q5
8.13 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q5_K_S.gguf
LFS Q5
7.93 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q6_K.gguf
LFS Q6
9.37 GB Download
dolphin-2.9.3-mistral-nemo-12b.Q8_0.gguf
LFS Q8
12.13 GB Download
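The files above trade size for fidelity. As a rough illustration, the hypothetical helper below picks the largest listed file that fits a given memory budget. The 1.5 GB headroom for the KV cache and runtime buffers is an assumed placeholder, not a measured figure; real requirements depend on context length and backend. Q4_0 remains the recommended default in the list regardless of this heuristic.

```python
# Hypothetical helper: pick the largest GGUF file that fits a memory budget.
# ASSUMPTION: the file must fit with `headroom_gb` to spare for the KV cache
# and runtime buffers; 1.5 GB is a placeholder, not a measured value.
from typing import Dict, Optional

# File sizes in GB, copied from the GGUF file list.
GGUF_SIZES_GB: Dict[str, float] = {
    "F16": 22.82, "Q2_K": 4.46, "Q3_K_L": 6.11, "Q3_K_M": 5.67,
    "Q3_K_S": 5.15, "Q4_0": 6.59, "Q4_1": 7.26, "Q4_K_M": 6.96,
    "Q4_K_S": 6.63, "Q5_0": 7.93, "Q5_1": 8.61, "Q5_K_M": 8.13,
    "Q5_K_S": 7.93, "Q6_K": 9.37, "Q8_0": 12.13,
}


def pick_quant(budget_gb: float, headroom_gb: float = 1.5,
               sizes: Dict[str, float] = GGUF_SIZES_GB) -> Optional[str]:
    """Return the filename of the largest file that fits, or None."""
    fitting = [(size, quant) for quant, size in sizes.items()
               if size + headroom_gb <= budget_gb]
    if not fitting:
        return None
    # Largest size wins; equal sizes are broken by quant name.
    _, quant = max(fitting)
    return f"dolphin-2.9.3-mistral-nemo-12b.{quant}.gguf"


if __name__ == "__main__":
    for budget in (4.0, 8.0, 10.0, 14.0, 32.0):
        print(budget, pick_quant(budget))
```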