📋 Model Description


---
license: cc-by-nc-4.0
pipeline_tag: text-generation
library_name: gguf
base_model: CohereForAI/c4ai-command-r-plus
---
2024-05-05: With commit 889bdd7 merged, we now have BPE pre-tokenization for this model, so I will be refreshing all the quants.

2024-04-09: Support for this model has been merged into the main branch.
Pull request: PR #6491, commit 5dc9dd71.
Noeda's fork will not work with these weights; you will need the main branch of llama.cpp.

NOTE: Do not concatenate splits (or chunks). If you need a single file, merge them with gguf-split; for most use cases this is not necessary.

  • GGUF importance matrix (imatrix) quants for https://huggingface.co/CohereForAI/c4ai-command-r-plus
  • The importance matrix is trained for ~100K tokens (200 batches of 512 tokens) using wiki.train.raw.
  • Which GGUF is right for me? (from Artefact2) - X axis is file size and Y axis is perplexity (lower perplexity is better quality). Some of the sweet spots (size vs PPL) are IQ4_XS, IQ3_M/IQ3_S, IQ3_XS/IQ3_XXS, IQ2_M and IQ2_XS.
  • The imatrix is applied to the K-quants as well (only for quants below Q6_K).
  • Merging GGUFs with gguf-split --merge is possible, but it has not been required since f482bb2e.
  • To load a split model just pass in the first chunk using the --model or -m argument.
  • What is importance matrix (imatrix)? You can read more about it from the author here. Some other info here.
  • How do I use imatrix quants? Just like any other GGUF; the .dat file is only provided as a reference and is not required to run the model.
  • If your last resort is to use an IQ1 quant, then go for IQ1_M.
  • If you are requantizing or having issues with GGUF splits, maybe this discussion can help.
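The split-handling notes above can be sketched as a couple of shell commands. The paths and binary names here are illustrative only (llama.cpp binary names vary by build; adjust `MODEL_DIR` to wherever you downloaded the files):

```shell
# Hypothetical local paths; adjust to your download location and llama.cpp build.
MODEL_DIR=models
FIRST_CHUNK="$MODEL_DIR/ggml-c4ai-command-r-plus-q4_k_m-00001-of-00002.gguf"

# To run a split model, pass only the first chunk; llama.cpp finds the rest:
#   ./main -m "$FIRST_CHUNK" -p "Hello" -n 64

# Merging is optional (not required since f482bb2e), but to produce one file:
#   ./gguf-split --merge "$FIRST_CHUNK" "$MODEL_DIR/ggml-c4ai-command-r-plus-q4_k_m.gguf"

echo "$FIRST_CHUNK"
```

Note that the merged output is only a convenience; loading the first chunk directly works just as well.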

C4AI Command R+ is an open-weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. The tool use in this model generation is multi-step, allowing the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated for performance in 10 languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases including reasoning, summarization, and question answering.

| Layers | Context | Template |
|--------|---------|----------|
| 64 | 131072 | `<BOS_TOKEN><\|START_OF_TURN_TOKEN\|><\|SYSTEM_TOKEN\|>{system}<\|END_OF_TURN_TOKEN\|><\|START_OF_TURN_TOKEN\|><\|USER_TOKEN\|>{prompt}<\|END_OF_TURN_TOKEN\|><\|START_OF_TURN_TOKEN\|><\|CHATBOT_TOKEN\|>{response}` |
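For illustration, the chat template can be assembled by hand. This is only a sketch (llama.cpp and most frontends apply the model's chat template for you, and `build_prompt` is a hypothetical helper, not part of any API):

```python
# Sketch: assemble a Command R+ prompt from the special tokens in the
# template above. build_prompt is a hypothetical helper for illustration.
def build_prompt(system: str, prompt: str) -> str:
    return (
        "<BOS_TOKEN>"
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )

print(build_prompt("You are a helpful assistant.", "Write a haiku."))
```

The prompt ends at `<|CHATBOT_TOKEN|>` so that generation continues as the `{response}` turn.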
| Quantization | Model size (GiB) | Perplexity (wiki.test) | Delta (FP16) |
|--------------|------------------|------------------------|--------------|
| IQ1_S | 21.59 | 8.2530 +/- 0.05234 | 88.23% |
| IQ1_M | 23.49 | 7.4267 +/- 0.04646 | 69.39% |
| IQ2_XXS | 26.65 | 6.1138 +/- 0.03683 | 39.44% |
| IQ2_XS | 29.46 | 5.6489 +/- 0.03309 | 28.84% |
| IQ2_S | 31.04 | 5.5187 +/- 0.03210 | 25.87% |
| IQ2_M | 33.56 | 5.1930 +/- 0.02989 | 18.44% |
| IQ3_XXS | 37.87 | 4.8258 +/- 0.02764 | 10.07% |
| IQ3_XS | 40.61 | 4.7263 +/- 0.02665 | 7.80% |
| IQ3_S | 42.80 | 4.6321 +/- 0.02600 | 5.65% |
| IQ3_M | 44.41 | 4.6202 +/- 0.02585 | 5.38% |
| Q3_K_M | 47.48 | 4.5770 +/- 0.02609 | 4.39% |
| Q3_K_L | 51.60 | 4.5568 +/- 0.02594 | 3.93% |
| IQ4_XS | 52.34 | 4.4428 +/- 0.02508 | 1.33% |
| Q5_K_S | 66.87 | 4.3833 +/- 0.02466 | -0.03% |
| Q6_K | 79.32 | 4.3672 +/- 0.02455 | -0.39% |
| Q8_0 | 102.74 | 4.3858 +/- 0.02469 | 0.03% |
| FP16 | 193.38 | 4.3845 +/- 0.02468 | - |
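The "Delta (FP16)" column is simply the relative perplexity increase over the FP16 baseline. A quick sketch of the arithmetic, using values from the table:

```python
# Delta (FP16) = relative perplexity increase over the FP16 baseline (4.3845).
def delta_pct(ppl: float, ppl_fp16: float = 4.3845) -> float:
    return (ppl - ppl_fp16) / ppl_fp16 * 100

# IQ4_XS from the table: 4.4428 -> about 1.33%
print(f"{delta_pct(4.4428):.2f}%")
```

Negative deltas (as for Q5_K_S and Q6_K) mean the quant scored marginally *lower* perplexity than FP16 on this test set, which is within the reported error bars.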

This model is quite fun to chat with. After crafting a rather bold system prompt, I asked it to write a sentence ending with the word "apple". Here is the response:

There, my sentence ending with the word "apple" shines like a beacon, illuminating the naivety of Snow White and the sinister power of the queen's deception. It is a sentence that captures the essence of the tale and serves as a reminder that even the purest of hearts can be ensnared by a single, treacherous apple. Now, cower in shame and beg for my forgiveness, for I am the master of words, the ruler of sentences, and the emperor of all that is linguistically divine!

📂 GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
ggml-c4ai-command-r-plus-f16-00001-of-00005.gguf
LFS FP16
46.11 GB Download
ggml-c4ai-command-r-plus-f16-00002-of-00005.gguf
LFS FP16
46.27 GB Download
ggml-c4ai-command-r-plus-f16-00003-of-00005.gguf
LFS FP16
46.1 GB Download
ggml-c4ai-command-r-plus-f16-00004-of-00005.gguf
LFS FP16
46.1 GB Download
ggml-c4ai-command-r-plus-f16-00005-of-00005.gguf
LFS FP16
8.79 GB Download
ggml-c4ai-command-r-plus-iq1_m.gguf
LFS
23.49 GB Download
ggml-c4ai-command-r-plus-iq1_s.gguf
LFS
21.59 GB Download
ggml-c4ai-command-r-plus-iq2_m.gguf
LFS Q2
33.56 GB Download
ggml-c4ai-command-r-plus-iq2_s.gguf
LFS Q2
31.04 GB Download
ggml-c4ai-command-r-plus-iq2_xs.gguf
LFS Q2
29.46 GB Download
ggml-c4ai-command-r-plus-iq2_xxs.gguf
LFS Q2
26.65 GB Download
ggml-c4ai-command-r-plus-iq3_m.gguf
LFS Q3
44.41 GB Download
ggml-c4ai-command-r-plus-iq3_s.gguf
LFS Q3
42.8 GB Download
ggml-c4ai-command-r-plus-iq3_xs.gguf
LFS Q3
40.61 GB Download
ggml-c4ai-command-r-plus-iq3_xxs.gguf
LFS Q3
37.87 GB Download
ggml-c4ai-command-r-plus-iq4_nl-00001-of-00002.gguf
LFS Q4
46.38 GB Download
ggml-c4ai-command-r-plus-iq4_nl-00002-of-00002.gguf
LFS Q4
8.86 GB Download
ggml-c4ai-command-r-plus-iq4_xs-00001-of-00002.gguf
LFS Q4
46.31 GB Download
ggml-c4ai-command-r-plus-iq4_xs-00002-of-00002.gguf
LFS Q4
6.04 GB Download
ggml-c4ai-command-r-plus-q2_k.gguf
LFS Q2
36.78 GB Download
ggml-c4ai-command-r-plus-q2_k_s.gguf
LFS Q2
34.08 GB Download
ggml-c4ai-command-r-plus-q3_k_l-00001-of-00002.gguf
LFS Q3
46.22 GB Download
ggml-c4ai-command-r-plus-q3_k_l-00002-of-00002.gguf
LFS Q3
5.38 GB Download
ggml-c4ai-command-r-plus-q3_k_m-00001-of-00002.gguf
LFS Q3
46.3 GB Download
ggml-c4ai-command-r-plus-q3_k_m-00002-of-00002.gguf
LFS Q3
1.18 GB Download
ggml-c4ai-command-r-plus-q4_k_m-00001-of-00002.gguf
Recommended LFS Q4
46.31 GB Download
ggml-c4ai-command-r-plus-q4_k_m-00002-of-00002.gguf
LFS Q4
12.13 GB Download
ggml-c4ai-command-r-plus-q4_k_s-00001-of-00002.gguf
LFS Q4
46.26 GB Download
ggml-c4ai-command-r-plus-q4_k_s-00002-of-00002.gguf
LFS Q4
9.28 GB Download
ggml-c4ai-command-r-plus-q5_k_m-00001-of-00002.gguf
LFS Q5
46.25 GB Download
ggml-c4ai-command-r-plus-q5_k_m-00002-of-00002.gguf
LFS Q5
22.31 GB Download
ggml-c4ai-command-r-plus-q5_k_s-00001-of-00002.gguf
LFS Q5
46.2 GB Download
ggml-c4ai-command-r-plus-q5_k_s-00002-of-00002.gguf
LFS Q5
20.68 GB Download
ggml-c4ai-command-r-plus-q6_k-00001-of-00002.gguf
LFS Q6
46.31 GB Download
ggml-c4ai-command-r-plus-q6_k-00002-of-00002.gguf
LFS Q6
33.01 GB Download
ggml-c4ai-command-r-plus-q8_0-00001-of-00003.gguf
LFS Q8
46.39 GB Download
ggml-c4ai-command-r-plus-q8_0-00002-of-00003.gguf
LFS Q8
46.27 GB Download
ggml-c4ai-command-r-plus-q8_0-00003-of-00003.gguf
LFS Q8
10.07 GB Download