---
license: apache-2.0
base_model:
- janhq/Jan-v1-2509
- TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Pro-Preview-High-Reasoning-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill
- Liontix/Qwen3-4B-Claude-Sonnet-4-Reasoning-Distill-Safetensor
- TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Lite-Preview-Distill
- Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking
- TeichAI/Qwen3-4B-Thinking-2507-GLM-4.6-Distill
- angelchen/Qwen3-4B-Open-R1-Distill_1
- TeichAI/Qwen3-4B-Thinking-2507-Command-A-Reasoning-Distill
- janhq/Jan-v1-4B
tags:
- 256k context
- Qwen3
- Mixture of Experts
- MOE
- MOE Dense
- 2 experts
- 4Bx12
- All use cases
- bfloat16
- heretic
- uncensored
- decensored
- abliterated
- merge
- creative
- creative writing
- fiction writing
- plot generation
- sub-plot generation
- story generation
- scene continue
- storytelling
- fiction story
- science fiction
- romance
- all genres
- story
- writing
- vivid prosing
- vivid writing
- fiction
- not-for-all-audiences
language:
- en
---
Model Description
WARNING - "HERETIC" version: Unlocked. UNFILTERED. NSFW. Vivid prose. INTENSE.
Visceral details. Light to R-18 horror. Swearing. UNCENSORED... humor, romance, fun... and UNFILTERED TRUTH.
IMPORTANT: See the section below on how to access experts directly to get full use from this model.
Qwen3-48B-A4B-Savant-Commander-Distill-12X-Closed-Open-Heretic-Uncensored-GGUF

Savant Commander is a specialized MOE model that lets you control which expert(s) are assigned to your use case(s) / prompt(s) ...
directly (by name), as opposed to having the "choices" made for you.
The model is composed of 12 DISTILLS (a compressed 12x4B MOE) of top closed-source models ( GPT 5.1, OpenAI GPT-OSS 120B, Gemini (3), Claude (2) )
and open-source models ( Kimi K2, GLM, DeepSeek, Command-A, Jan V1 ), all in one.
This is the uncensored/abliterated version. Each model ("expert") was separately abliterated using "Heretic" [ https://github.com/p-e-w/heretic ].
Make sure you also see the section below on using abliterated models to get the most from this model.
256k context, 2 experts activated.
You can run it on CPU, or partially off-load from GPU too.
Ask it about orbital mechanics and prepare to be "schooled".
Fictional story? You will be amazed (depending on which expert(s) you select).
Math? Coding?
This model does it all.
Uploaded one quant (Q4KS - non imatrix) for the time being.
Non-Abliterated Versions
For the "normal version" ( non-abliterated version ) go here:
https://huggingface.co/DavidAU/Qwen3-48B-A4B-Savant-Commander-GATED-12x-Closed-Open-Source-Distill-GGUF
For the "normal version" ( ungated ; not abliterated ) go here:
https://huggingface.co/DavidAU/Qwen3-48B-A4B-Deadpan-Savant-12x-Closed-Open-Source-Distill
HOW TO ACCESS the EXPERTS:
In your prompts simply add the name(s) of the model(s)/expert(s) you want assigned.
Here is the list [no quotes]:
- "Gemini" [activates all 3 Gemini distills]
- "Claude" [activates both Claude distills]
- "JanV1"
- "CommandA"
- "OPENR1"
- "GLM"
- "Kimi"
- "GPTOSS" [120B distill]
- "GPT51"
To access groups use [no quotes]:
- "AllAI" [all ais]
- "Closed-AI" [only closed source]
- "Open-AI" [only open source]
Access like:
Gemini, Tell me a horror story.
GLM and JanV1, write me a horror story.
Gemini: Tell me a horror story.
Note: the name(s) must be in the prompt and/or the system role, and can be located anywhere in the prompt / system role.
For best results, we suggest using the name(s) at the beginning, as a "command" / "request":
GLM do ...
Using Gemini, process this prompt:
However, using the name(s) anywhere in the prompt will work in most cases, as that is what is "scanned for" during "prompt processing".
This model also has NEGATIVE gating to ensure other models not in use are ISOLATED. As a result generation will vary a lot depending
on which model(s)/expert(s) you "name" to process your prompt(s).
You MAY want to increase the number of active experts in some cases from the default of 2 (see how below).
For trying the model out (example) - all experts, but one at a time:
"NAME, Tell me a horror story."
Use a different "name" per "new chat" - you will get different thought blocks, output, etc. - in some cases very different
from each other.
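The addressing convention above can be sketched as a small helper that prefixes a prompt with the chosen expert name(s). The helper itself (`address_experts` and programmatic prefixing) is illustrative, not part of the model; only the names come from the list in this card:

```python
# Sketch: build an expert-addressed prompt per the naming convention above.
# The expert/group names are taken from the list in this card.
EXPERT_NAMES = {
    "Gemini", "Claude", "JanV1", "CommandA", "OPENR1",
    "GLM", "Kimi", "GPTOSS", "GPT51",          # individual experts
    "AllAI", "Closed-AI", "Open-AI",           # groups
}

def address_experts(prompt: str, *experts: str) -> str:
    """Prefix a prompt with expert name(s) so the gating routes to them."""
    for name in experts:
        if name not in EXPERT_NAMES:
            raise ValueError(f"Unknown expert name: {name}")
    return f"{' and '.join(experts)}, {prompt}"

print(address_experts("Tell me a horror story.", "Gemini"))
# Gemini, Tell me a horror story.
print(address_experts("write me a horror story.", "GLM", "JanV1"))
# GLM and JanV1, write me a horror story.
```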
SUGGESTED SETTINGS to START:
Temp .7, topk 40, top p .95, min p .05, rep pen 1.05,
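The suggested starting samplers map directly onto a request payload. A minimal sketch, assuming a llama.cpp-style backend (the field names follow llama.cpp's `/completion` API; other back ends may spell them differently):

```python
# Sketch: the suggested starting samplers as a llama.cpp-style payload.
# Field names assume llama.cpp's /completion API; adjust for your backend.
payload = {
    "prompt": "GLM, write me a horror story.",
    "temperature": 0.7,
    "top_k": 40,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.05,
}
```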
IMPORTANT: Using an "uncensored" (refusals removed) model VS a model trained "uncensored"
Usually, when you tell a model to generate horror, swearing, or x-rated content, that is all you have to do to get said content type.
In the case of this model, it will not refuse your request; however, it needs to be "pushed" / directed a bit more in SOME CASES.
Although this model will generate x-rated content too, you likewise need to tell it to use "slang" (and include the terms you want)
to get it to generate the content at the "expected" content level.
Without these added directive(s), the content can be "bland" compared to an "uncensored" model or a model trained on uncensored content.
Roughly, the model tries to generate the content, but the "default" setting(s) are so "tame" it needs a push to generate at the expected graphic,
cursing, or explicit levels.
Even minimal direction (i.e., "use these words to swear: x, y, z") will be enough to push the model to generate the requested content in the, ahh... expected format.
IMPORTANT QUANTS:
- Minimum quant of Q4_K_S (non-imatrix) or IQ3_M (imatrix); otherwise it will "snap".
- Higher quants will result in much stronger performance.
- 4-8k context window minimum; temp .7 [higher/lower is okay].
- 2-3 regens -> each will be VERY DIFFERENT due to the model design.
- You can use 1 expert or up to 12... tokens/second will drop the more you activate.
ENJOY.
DETAILS:
This is a DENSE MOE (12 x 4B) Mixture of Experts model, using the strongest Qwen3 4B DISTILL models available,
with 2 experts activated by default; however, you can activate up to all 12 experts if you need the extra "brainpower".
This allows you to run the model at 4, 8, 12, 16, 20, 24, and up to 48B "power levels" as needed.
Even at 1 expert activated (4B parameters, mixed), this model is very strong.
This is a full "thinking" / "reasoning" model.
NOTE: Due to compression during the "MOEing" process, the actual size of the model is SMALLER than a typical 48B model.
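The "power level" framing above is simply active experts times expert size. A rough sketch of that arithmetic (per the card; actual active parameters are somewhat smaller because of shared layers and MOE compression):

```python
# Rough sketch: approximate "power level" (billions of active parameters)
# for k active experts of ~4B each, per the card's 4/8/.../48B framing.
EXPERT_SIZE_B = 4   # each distill is a ~4B Qwen3 model
NUM_EXPERTS = 12

def power_level(active_experts: int) -> int:
    assert 1 <= active_experts <= NUM_EXPERTS
    return active_experts * EXPERT_SIZE_B

print([power_level(k) for k in (1, 2, 6, 12)])  # [4, 8, 24, 48]
```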
Meet the Team: Mixture of Experts Models
This model is composed of the following 12 models ("the experts"), in full:
- https://huggingface.co/janhq/Jan-v1-2509
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-GPT-5.1-High-Reasoning-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-3-Pro-Preview-High-Reasoning-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Claude-4.5-Opus-High-Reasoning-Distill
- https://huggingface.co/Liontix/Qwen3-4B-Claude-Sonnet-4-Reasoning-Distill-Safetensor
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Kimi-K2-Thinking-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Distill
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Gemini-2.5-Flash-Lite-Preview-Distill
- https://huggingface.co/Jackrong/gpt-oss-120b-Distill-Qwen3-4B-Thinking
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-GLM-4.6-Distill
- https://huggingface.co/angelchen/Qwen3-4B-Open-R1-Distill_1
- https://huggingface.co/TeichAI/Qwen3-4B-Thinking-2507-Command-A-Reasoning-Distill
- https://huggingface.co/janhq/Jan-v1-4B
IMPORTANT NOTE about this model list:
The listed models are the original "censored" / "non-heretic" versions. I abliterated/Heretic'ed all these models separately
using Heretic v1.1.0 [ https://github.com/p-e-w/heretic ].
Average refusal rate before de-censoring: 90/100 (or greater)
After: 12/100 (average) // KLD 0.05 (average; less than 1 is excellent, 0 is "perfect")
EXPERTS:
The mixture of experts is set at TWO experts, but you can use 2, 3, 4, 5, 6... up to 12.
This "team" has a Captain (the first listed model), and all the team members contribute to the "token"
choice billions of times per second. Note the Captain contributes too.
Think of 2, 3, or 4 (or more) master chefs in the kitchen, all competing to make the best dish for you.
This results in higher-quality generation.
In many cases it also results in higher-quality instruction following.
That means the power of every model is available during instruction following and output generation.
CHANGING THE NUMBER OF EXPERTS:
You can set the number of experts in LMStudio (https://lmstudio.ai) at the "load" screen and via other apps/llm apps by setting "Experts" or "Number of Experts".
For Text-Generation-Webui (https://github.com/oobabooga/text-generation-webui) you set the number of experts at the loading screen page.
For KolboldCPP (https://github.com/LostRuins/koboldcpp) Version 1.8+ , on the load screen, click on "TOKENS",
you can set experts on this page, and the launch the model.
For server.exe / Llama-server.exe (Llamacpp - https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md )
add the following to the command line to start the "llamacpp server" (CLI):
"--override-kv llama.expertusedcount=int:6"
(no quotes, where "6" is the number of experts to use)
FOR QWEN MODELS:
"--override-kv qwen3moe.expertusedcount=int:6" (where 6 is the number of experts per token).
When using "API", you set the "numexpertsused" in the JSON payload (this maybe different for different back ends).
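Putting the override flag into practice, here is a minimal sketch that assembles a llama-server command line with a different expert count. The model path and port are placeholders; the `qwen3moe.expert_used_count` key is the Qwen3-MOE form of the flag:

```python
# Sketch: assemble a llama-server command line that overrides the
# active-expert count for a Qwen3 MOE GGUF. Path and port are placeholders.
import subprocess

def server_command(gguf_path: str, experts: int, port: int = 8080) -> list:
    return [
        "llama-server",
        "-m", gguf_path,
        "--port", str(port),
        # Qwen3-MOE metadata key; llama-arch models use llama.expert_used_count.
        "--override-kv", f"qwen3moe.expert_used_count=int:{experts}",
    ]

cmd = server_command("Qwen3-48B-A4B-Savant-Commander.gguf", experts=6)
print(" ".join(cmd))
# To actually launch the server, uncomment:
# subprocess.run(cmd)
```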
CREDITS:
Special thanks to all the model makers / creators listed above.
Please visit each repo above to see what model(s) contributed to each of models above and/or to learn more about the models
from the model makers.
Special credit goes to MERGEKIT, without you this project / model would not have been possible.
[ https://github.com/arcee-ai/mergekit ]
Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:
In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;
Set the "Smoothing_factor" to 1.5
: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"
: in text-generation-webui -> parameters -> lower right.
: In Silly Tavern this is called: "Smoothing"
NOTE: For "text-generation-webui"
-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)
Source versions (and config files) of my models are here:
https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be
OTHER OPTIONS:
- Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")
- If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This is a "Class 1" model:
For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses model issue(s) and covers methods to improve performance for all use cases, including chat and roleplay), as well as the full list of parameters and samplers used for generation, please see:
[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]
Example Generation:
2 experts, Temp .7, topk 40, top p .95, min p .05, rep pen 1.05,
QUANT: Q4KS, Lmstudio.
COMING SOON...
GGUF File List

| Filename | Quant | Size |
|---|---|---|
| Qwen3-48B-12x4B-Super-Distill2-GATED-HERETIC-Q4_K_S.gguf | Q4_K_S | 17.85 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-IQ4_XS.gguf | IQ4_XS | 16.93 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q3_K_M.gguf | Q3_K_M | 15.06 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q4_K_M.gguf (Recommended) | Q4_K_M | 19.01 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q5_K_M.gguf | Q5_K_M | 22.25 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q5_K_S.gguf | Q5_K_S | 21.58 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q6_K.gguf | Q6_K | 25.69 GB |
| Qwen3-48B-A4B-Savant-Commander-Dstll-12X-Cl-Op-Hrtic-Uncen-Q8_0.gguf | Q8_0 | 33.27 GB |