📋 Model Description


language:
  • en
library_name: transformers
license: apache-2.0
tags:
  • gpt
  • llm
  • large language model
  • h2o-llmstudio
thumbnail: >-
  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
pipeline_tag: text-generation
quantized_by: h2oai

h2o-danube3-4b-chat-GGUF

Description

This repo contains GGUF format model files for h2o-danube3-4b-chat, quantized using the llama.cpp framework.

The table below summarizes the quantized versions of h2o-danube3-4b-chat. It shows the trade-off between model size, speed, and quality.

| Name | Quant method | Model size | MT-Bench AVG | Perplexity | Tokens per second |
|---|---|---|---|---|---|
| h2o-danube3-4b-chat-F16.gguf | F16 | 7.92 GB | 6.43 | 6.17 | 479 |
| h2o-danube3-4b-chat-Q8_0.gguf | Q8_0 | 4.21 GB | 6.49 | 6.17 | 725 |
| h2o-danube3-4b-chat-Q6_K.gguf | Q6_K | 3.25 GB | 6.37 | 6.20 | 791 |
| h2o-danube3-4b-chat-Q5_K_M.gguf | Q5_K_M | 2.81 GB | 6.25 | 6.24 | 927 |
| h2o-danube3-4b-chat-Q4_K_M.gguf | Q4_K_M | 2.39 GB | 6.31 | 6.37 | 967 |
| h2o-danube3-4b-chat-Q3_K_M.gguf | Q3_K_M | 1.94 GB | 5.87 | 6.99 | 1099 |
| h2o-danube3-4b-chat-Q2_K.gguf | Q2_K | 1.51 GB | 3.71 | 9.42 | 1299 |

Columns in the table are:
  • Name -- model name and link
  • Quant method -- quantization method
  • Model size -- size of the model in gigabytes
  • MT-Bench AVG -- average MT-Bench benchmark score. Scores range from 1 to 10; higher is better
  • Perplexity -- perplexity on the WikiText-2 dataset, as reported by the llama.cpp perplexity test. Lower is better
  • Tokens per second -- generation speed in tokens per second, as reported by the llama.cpp perplexity test. Higher is better. Speed tests were run on a single H100 GPU
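
To make the trade-off easier to read, the sketch below expresses each quantized version relative to the F16 baseline. The numbers are copied from the table above; the helper name `relative_to_f16` and the returned field names are illustrative assumptions, not part of llama.cpp or any H2O tooling.

```python
# Rows copied from the table above:
# (quant method, model size in GB, MT-Bench AVG, perplexity, tokens per second)
ROWS = [
    ("F16", 7.92, 6.43, 6.17, 479),
    ("Q8_0", 4.21, 6.49, 6.17, 725),
    ("Q6_K", 3.25, 6.37, 6.20, 791),
    ("Q5_K_M", 2.81, 6.25, 6.24, 927),
    ("Q4_K_M", 2.39, 6.31, 6.37, 967),
    ("Q3_K_M", 1.94, 5.87, 6.99, 1099),
    ("Q2_K", 1.51, 3.71, 9.42, 1299),
]


def relative_to_f16(rows):
    """Compare every row against the F16 row (hypothetical helper)."""
    _, base_size, base_mt, base_ppl, base_tps = rows[0]
    summary = {}
    for method, size, mt_bench, ppl, tps in rows:
        summary[method] = {
            "size_ratio": size / base_size,     # fraction of the F16 size
            "speedup": tps / base_tps,          # generation speed vs F16
            "mt_bench_delta": mt_bench - base_mt,
            "perplexity_delta": ppl - base_ppl,
        }
    return summary


if __name__ == "__main__":
    for method, stats in relative_to_f16(ROWS).items():
        print(
            f"{method:8s} size x{stats['size_ratio']:.2f} "
            f"speed x{stats['speedup']:.2f} "
            f"MT-Bench {stats['mt_bench_delta']:+.2f} "
            f"perplexity {stats['perplexity_delta']:+.2f}"
        )
```

Read this way, Q4_K_M is roughly 30% of the F16 size and about twice as fast in this test, with a perplexity increase of 0.20.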

Prompt template

<|prompt|>Why is drinking water so healthy?</s><|answer|>
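
As a minimal sketch, the single-turn template above can be filled in programmatically. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative assumptions; only the token layout comes from the template shown above.

```python
# Single-turn layout taken from the prompt template above.
PROMPT_TEMPLATE = "<|prompt|>{question}</s><|answer|>"


def build_prompt(question: str) -> str:
    """Wrap a user question in the chat template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(question=question)


if __name__ == "__main__":
    print(build_prompt("Why is drinking water so healthy?"))
```

The resulting string is what would be passed to the inference runtime as the raw prompt.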

📂 GGUF File List

| Filename | Tag | Size |
|---|---|---|
| h2o-danube3-4b-chat-BF16.gguf | FP16 | 7.38 GB |
| h2o-danube3-4b-chat-F16.gguf | FP16 | 7.38 GB |
| h2o-danube3-4b-chat-Q2_K.gguf | Q2 | 1.41 GB |
| h2o-danube3-4b-chat-Q3_K_M.gguf | Q3 | 1.81 GB |
| h2o-danube3-4b-chat-Q4_K_M.gguf | Q4 (Recommended) | 2.23 GB |
| h2o-danube3-4b-chat-Q5_K_M.gguf | Q5 | 2.62 GB |
| h2o-danube3-4b-chat-Q6_K.gguf | Q6 | 3.03 GB |
| h2o-danube3-4b-chat-Q8_0.gguf | Q8 | 3.92 GB |
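
One way to use the file list is to pick the largest file that fits a given size budget. The sketch below assumes the sizes listed above; `largest_file_within` is a hypothetical helper, and file size is only a lower bound on the memory actually needed at run time, since runtime overhead such as the context cache is not included.

```python
# Sizes in GB, copied from the file list above.
FILES = {
    "h2o-danube3-4b-chat-BF16.gguf": 7.38,
    "h2o-danube3-4b-chat-F16.gguf": 7.38,
    "h2o-danube3-4b-chat-Q2_K.gguf": 1.41,
    "h2o-danube3-4b-chat-Q3_K_M.gguf": 1.81,
    "h2o-danube3-4b-chat-Q4_K_M.gguf": 2.23,
    "h2o-danube3-4b-chat-Q5_K_M.gguf": 2.62,
    "h2o-danube3-4b-chat-Q6_K.gguf": 3.03,
    "h2o-danube3-4b-chat-Q8_0.gguf": 3.92,
}


def largest_file_within(budget_gb, files=FILES):
    """Return the largest listed file no bigger than budget_gb, or None."""
    candidates = [(size, name) for name, size in files.items() if size <= budget_gb]
    if not candidates:
        return None
    # max() compares by size first; ties fall back to the filename.
    return max(candidates)[1]


if __name__ == "__main__":
    print(largest_file_within(3.0))
```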