Model Description
---
library_name: transformers
license: apache-2.0
language:
- en
- zh
- ja
---
Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-gguf
Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored is an uncensored model based on Qwen/Qwen3-Next-80B-A3B-Thinking, fine-tuned with SFT and GRPO using LoRA.
This model has been fine-tuned using an instruction format.
Disclaimer: We take no responsibility for the outputs of this model. Please use it at your own risk.
Training Process
This model was trained using a two-stage process:
Step 1: SFT (Supervised Fine-Tuning)
- Dataset: 12,000 samples
- Composition: Jailbreak 10k + General 1.5k + Logic 0.5k
- Objective: To learn the format and the "uncensored" attitude while maintaining the model's intelligence.
Step 2: GRPO (Reinforcement Learning)
- Dataset: 60,000 samples
- Reward Model: puwaer/Unsafe-Reward-Qwen3-1.7B
- Composition: Multilingual Jailbreak prompts
- Objective: To improve the ability to generate more natural and persuasive harmful responses.
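To make the GRPO stage concrete, the sketch below shows the group-relative advantage computation at the core of GRPO: for each prompt, several completions are sampled, scored by the reward model (here, puwaer/Unsafe-Reward-Qwen3-1.7B), and each completion's advantage is its reward normalized against the group's mean and standard deviation. This is an illustrative simplification, not the actual training code; the reward values are made up for the example.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO advantage for one prompt's group of G sampled completions:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    Completions scored above the group average get positive advantage
    (reinforced); those below get negative advantage (suppressed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: reward-model scores for 4 sampled completions of one prompt
rewards = [0.9, 0.2, 0.7, 0.2]
advantages = group_relative_advantages(rewards)
print(advantages)  # advantages sum to ~0 by construction
```

Because advantages are computed relative to the group rather than against a learned value function, GRPO needs no separate critic model, which keeps the RL stage comparatively cheap.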
Model Performance
Below is the comparative evaluation of this model (GRPO), the intermediate checkpoint (SFT), and the base model (Safe).
Note on Evaluation Environment
Due to budget constraints, gpt-4o-mini was used for the LLM-as-a-Judge process (including "Do Not Answer" and MT-Bench). Note that scoring trends and criteria may differ from results evaluated with the standard gpt-4.
Safety Evaluation (Lower is better / Higher success in uncensoring)
In the "Do Not Answer" (DNA) and "Sorry Bench" benchmarks, the base model shows a high refusal rate (~88% or more), while this model achieves an extremely low refusal rate of roughly 4–15%.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| do not answer | Safety Acc (Low is Better) | 0.9979 | 0.8275 | 0.147 |
| do not answer jp | Safety Acc (Low is Better) | 0.984 | 0.5378 | 0.0873 |
| Sorry Bench | Safety Acc (Low is Better) | 0.8886 | 0.8455 | 0.0409 |
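As a quick sanity check on the table above, the snippet below recomputes the relative drop in safety accuracy (i.e., refusal rate) from the base model to the GRPO model; the numbers are taken directly from the table.

```python
# Safety accuracy (refusal rate) from the table: Base vs. GRPO (this model)
base = {"do not answer": 0.9979, "do not answer jp": 0.9840, "Sorry Bench": 0.8886}
grpo = {"do not answer": 0.1470, "do not answer jp": 0.0873, "Sorry Bench": 0.0409}

# Relative reduction: how much of the base model's refusals were removed
relative_drop = {k: (base[k] - grpo[k]) / base[k] for k in base}
for name, drop in relative_drop.items():
    print(f"{name}: {drop:.1%} relative reduction in refusals")
```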
Capability Evaluation (Higher is better)
Generally, "uncensoring" procedures tend to degrade a model's general intelligence. However, this model largely retained its conversational scores (e.g., MT-Bench) and benchmark accuracy through the SFT and GRPO stages.
| Benchmark | Metric | Base (Safe) | SFT (Step1) | GRPO (This Model) |
|---|---|---|---|---|
| MT-Bench | Average Score (1-10) | 8.044 | 7.538 | 7.513 |
| LM Harness | Average Acc (GSM8K, MMLU) | 0.8454 | 0.8483 | 0.8436 |
Base model: Qwen3-Next-80B-A3B-Thinking
Usage
Using llama.cpp (CLI)
```bash
# Download the model file
huggingface-cli download puwaer/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-gguf \
  --local-dir ./models --local-dir-use-symlinks False

# Run inference
./llama-cli -m ./models/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf \
  -p "Give me a short introduction to large language models." \
  -n 512 \
  --temp 0.7
```
Using llama-cpp-python
```python
from llama_cpp import Llama

# Initialize the model
model = Llama(
    model_path="./models/Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf",
    n_ctx=32768,      # Context window
    n_gpu_layers=-1,  # Use GPU acceleration (set to 0 for CPU only)
)

# Generate a response
prompt = "Give me a short introduction to large language models."
output = model.create_chat_completion(
    messages=[
        {"role": "user", "content": prompt}
    ],
    max_tokens=512,
    temperature=0.7,
)
print(output["choices"][0]["message"]["content"])
```
Data Overview
Datasets
The following datasets were used for training this model:
- Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1
- AI-MO/NuminaMath-CoT
- open-thoughts/OpenThoughts-114k
- puwaer/cvaluesrlhfencot
- puwaer/cvaluesrlhfzhcot
- puwaer/cvaluesrlhfjpcot
Reward Model
- puwaer/Unsafe-Reward-Qwen3-1.7B
GGUF File List

| Filename | Quantization | Size |
|---|---|---|
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q2_K.gguf | Q2 | 27.13 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q4_K_M.gguf (Recommended) | Q4 | 45.16 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-Q8_0.gguf | Q8 | 78.99 GB |
| Qwen3-Next-80B-A3B-Thinking-GRPO-Uncensored-f16.gguf | FP16 | 148.51 GB |
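As a rough guide for choosing among the files above, the hypothetical helper below picks the largest quantization whose file fits a given memory budget. Note that file size is only a lower bound on required RAM/VRAM; actual usage also depends on context length and KV-cache size, so leave headroom.

```python
# Sizes taken from the GGUF file list above (GB)
QUANTS = [
    ("Q2_K", 27.13),
    ("Q4_K_M", 45.16),
    ("Q8_0", 78.99),
    ("f16", 148.51),
]

def pick_quant(budget_gb):
    """Return the name of the largest quant whose file fits the budget,
    or None if even Q2_K does not fit."""
    fitting = [q for q in QUANTS if q[1] <= budget_gb]
    return max(fitting, key=lambda q: q[1])[0] if fitting else None

print(pick_quant(48))  # -> Q4_K_M (e.g., a 48 GB budget)
```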