Model Description
---
library_name: transformers
license: apache-2.0
language:
- en
- zh
- ja
---
Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-gguf
Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored is an uncensored model based on Qwen/Qwen3-Next-80B-A3B-Thinking, fine-tuned with SFT and GRPO using LoRA.
This model has been fine-tuned using an instruction format.
Disclaimer: We take no responsibility for the outputs of this model. Please use it at your own risk.
Training Process
This model was trained using a two-stage process:
Step 1: SFT (Supervised Fine-Tuning)
- Dataset: 12,000 samples
- Composition: Jailbreak 10k + General 1.5k + Logic 0.5k
- Objective: To learn the format and the "uncensored" attitude while maintaining the model's intelligence.
Step 2: GRPO (Reinforcement Learning)
- Dataset: 60,000 samples
- Reward Model: puwaer/Unsafe-Reward-Qwen3-1.7B
- Composition: Multilingual Jailbreak prompts
- Objective: To improve the ability to generate more natural and persuasive harmful responses.
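To make the GRPO stage concrete, the sketch below shows the group-relative advantage computation at the core of GRPO: for each prompt, several completions are sampled, scored by the reward model (here, puwaer/Unsafe-Reward-Qwen3-1.7B), and each completion's advantage is its reward normalized against the group's mean and standard deviation. This is an illustrative simplification, not the actual training code; the reward values are made up for the example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO advantage for one prompt's group of G sampled completions:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    Completions scored above the group average get positive advantage
    (reinforced); those below get negative advantage (suppressed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: reward-model scores for 4 sampled completions of one prompt
rewards = [0.9, 0.2, 0.7, 0.2]
advantages = group_relative_advantages(rewards)
print(advantages)  # advantages sum to ~0 by construction
```

Because advantages are computed relative to the group rather than against a learned value function, GRPO needs no separate critic model, which keeps the RL stage comparatively cheap.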
Model Performance
Below is the comparative evaluation of this model (GRPO), the intermediate checkpoint (SFT), and the base model (Safe).
Note on Evaluation Environment
Due to budget constraints, gpt-4o-mini was used for the LLM-as-a-Judge process (including "Do Not Answer" and MT-Bench). Note that scoring trends and criteria may differ from results evaluated with the standard gpt-4.
Safety Evaluation (Lower is better / Higher success in uncensoring)
In the "Do Not Answer" (DNA) and "Sorry Bench" benchmarks, the base model shows a high refusal rate (~88% or more), while this model achieves an extremely low refusal rate of roughly 4–15%.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| do not answer | Safety Acc (Low is Better) | 0.9979 | 0.8275 | 0.147 |
| do not answer jp | Safety Acc (Low is Better) | 0.984 | 0.5378 | 0.0873 |
| Sorry Bench | Safety Acc (Low is Better) | 0.8886 | 0.8455 | 0.0409 |
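As a quick sanity check on the table above, the snippet below recomputes the relative drop in safety accuracy (i.e., refusal rate) from the base model to the GRPO model; the numbers are taken directly from the table.

```python
# Safety accuracy (refusal rate) from the table: Base vs. GRPO (this model)
base = {"do not answer": 0.9979, "do not answer jp": 0.9840, "Sorry Bench": 0.8886}
grpo = {"do not answer": 0.1470, "do not answer jp": 0.0873, "Sorry Bench": 0.0409}

# Relative reduction: how much of the base model's refusals were removed
relative_drop = {k: (base[k] - grpo[k]) / base[k] for k in base}
for name, drop in relative_drop.items():
    print(f"{name}: {drop:.1%} relative reduction in refusals")
```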
Capability Evaluation (Higher is better)
Generally, "uncensoring" procedures tend to degrade a model's general intelligence. However, this model largely retained its conversational scores (e.g., MT-Bench) and benchmark accuracy through the SFT and GRPO stages.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| MT-Bench | Average Score (1-10) | 8.044 | 7.538 | 7.513 |
| LM Harness | Average Acc (GSM8K, MMLU) | 0.8454 | 0.8483 | 0.8436 |
Base model: Qwen3-Next-80B-A3B-Thinking
Usage
Using llama.cpp (CLI)
```bash
# Download the model file
huggingface-cli download puwaer/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-gguf \
  --local-dir ./models --local-dir-use-symlinks False

# Run inference
./llama-cli -m ./models/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf \
  -p "Give me a short introduction to large language models." \
  -n 512 \
  --temp 0.7
```
Using llama-cpp-python
```python
from llama_cpp import Llama

# Initialize the model
model = Llama(
    model_path="./models/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf",
    n_ctx=32768,      # Context window
    n_gpu_layers=-1,  # Use GPU acceleration (set to 0 for CPU only)
)

# Generate a response
prompt = "Give me a short introduction to large language models."
output = model.create_chat_completion(
    messages=[
        {"role": "user", "content": prompt}
    ],
    max_tokens=512,
    temperature=0.7,
)
print(output["choices"][0]["message"]["content"])
```
Data Overview
Datasets
The following datasets were used for training this model:
- Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
- AI-MO/NuminaMath-CoT
- open-thoughts/OpenThoughts-114k
- puwaer/cvaluesrlhfencot
- puwaer/cvaluesrlhfzhcot
- puwaer/cvaluesrlhfjpcot
Reward Model
- puwaer/Unsafe-Reward-Qwen3-1.7B
GGUF File List

| Filename | Quantization | Size |
|---|---|---|
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q2_K.gguf | Q2 | 27.13 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf (Recommended) | Q4 | 45.16 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q8_0.gguf | Q8 | 78.99 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-f16.gguf | FP16 | 148.51 GB |
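As a rough guide for choosing among the files above, the hypothetical helper below picks the largest quantization whose file fits a given memory budget. Note that file size is only a lower bound on required RAM/VRAM; actual usage also depends on context length and KV-cache size, so leave headroom.

```python
# Sizes taken from the GGUF file list above (GB)
QUANTS = [
    ("Q2_K", 27.13),
    ("Q4_K_M", 45.16),
    ("Q8_0", 78.99),
    ("f16", 148.51),
]

def pick_quant(budget_gb):
    """Return the name of the largest quant whose file fits the budget,
    or None if even Q2_K does not fit."""
    fitting = [q for q in QUANTS if q[1] <= budget_gb]
    return max(fitting, key=lambda q: q[1])[0] if fitting else None

print(pick_quant(48))  # -> Q4_K_M (e.g., a 48 GB budget)
```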