πŸ“‹ Model Description


language: en license: mit tags:
  • roleplay
  • Creative
  • Writing
  • NSFW
  • lora
  • grpo
  • gguf
  • qwen3
  • unsloth
  • trl
  • 4bit
  • text-generation
  • rpg
datasets:
  • Gryphe/Sonnet3.5-Charcard-Roleplay
pipeline_tag: text-generation library_name: transformers model-index:
  • name: Qwen3-4B RPG Roleplay V2 (GRPO)
results: [] model-name: Qwen3-4B-RPG-Roleplay-V2 model-type: LoRA fine-tuned with GRPO base-model: unsloth/Qwen3-4B-Base datasets: - Gryphe/Sonnet3.5-Charcard-Roleplay language: - en license: apache-2.0 developer: Chun quantized-by: Chun gguf-quants: - name: Q4KM size: 2.5 GB - name: Q5KM size: 2.89 GB - name: Q8_0 size: 4.28 GB - name: F16 size: 8.05 GB base_model:
  • unsloth/Qwen3-4B-Base

πŸ§™β€β™‚οΈ Qwen3-4B RPG Roleplay V2 (GRPO)

Aligning Characters with Deeper Personas

Fantasy character illustration

A new version trained with GRPO for more consistent, high-quality, and aligned character roleplaying.


!License
!Model
!Training
!LoRA
!GGUF


🌟 Model Overview

Welcome to V2! I'm Chun (@chun121), and this is the next evolution of the Qwen3-4B Roleplay model. This version moves beyond standard fine-tuning and leverages GRPO (Generative Responsive Preference Optimization) to align the model's behavior with the core principles of great roleplaying.





















πŸŽ­πŸ’¬πŸ§ βš™οΈ
Character
Consistency
High-Quality
Dialogue
Intent
Understanding
Structured
Format
Maintains strong
persona adherence
Detailed, engaging
non-generic responses
Comprehends user
questions & scenarios
Uses <thinking>
analysis process

Built on the unsloth/Qwen3-4B-Base, this LoRA was trained not just to predict text, but to generate responses that are actively rewarded for being in-character, high-quality, and contextually aware. It's designed for creators who need AI characters that are not only conversational but also consistent and deeply aligned with their defined personas.


πŸ“Š Technical Specifications







































πŸ”§ FeatureπŸ“‹ Details
Base Modelunsloth/Qwen3-4B-Base
ArchitectureTransformer LLM with GRPO & LoRA
Parameter Count4 Billion (Base) + LoRA parameters
Quantization Options4-bit (bnb), GGUF variants
Training FrameworkUnsloth & TRL (GRPOTrainer)
Context Length2048 tokens
DeveloperChun
LicenseMIT


🧠 Training with GRPO

πŸ”„ Training Pipeline

GRPO alignment algorithm for superior character consistency



































πŸ”„ Training FlowπŸ“‹ Description
πŸ“š DatasetGryphe/Sonnet3.5-Charcard-Roleplay
⬇️
πŸ—οΈ Stage 1: Preliminary Fine-TuningTeaches custom chat format including <thinking> and <RESPONSE> tags
⬇️
🎯 Stage 2: GRPO TrainingReward-based optimization using GRPOTrainer from TRL
⬇️
πŸ§™β€β™‚οΈ Final ModelQwen3-4B RPG Roleplay V2 with superior alignment

This model's strength comes from its training methodology. Instead of simple fine-tuning, it was trained using GRPO, an alignment algorithm similar to DPO, on a free Google Colab T4 GPU.

πŸ”„ Two-Stage Training Process








πŸ—οΈ Stage 1: Preliminary Fine-Tuning


Teaches custom chat format including
<thinking> and <RESPONSE> tags



🎯 Stage 2: GRPO Training


Reward-based optimization using
GRPOTrainer from TRL



πŸ† Reward Functions

The model was trained to excel in these key areas:



























🎯 Reward CategoryπŸ“ Description
Format AdherenceFollowing internal thinking/response structure
Roleplay QualityGenerating longer, detailed responses with character actions
Request ComprehensionDirectly answering user questions or acting on requests
Character ConsistencyReflecting personality and traits from system prompt
EngagementUsing conversational language, avoiding generic replies


πŸ“š Dataset Deep Dive

🎭 Gryphe/Sonnet3.5-Charcard-Roleplay

Premium synthetic roleplay conversations powered by Claude Sonnet 3.5

The model was trained on the Gryphe/Sonnet3.5-Charcard-Roleplay dataset, a premium collection of synthetic roleplay conversations.























πŸ“Š MetricπŸ’― Value
Total Conversations9,736
SourceClaude Sonnet 3.5 Generated
QualityHigh-quality, character-card-based
Structuresystem β†’ human β†’ gpt flow

⚠️ Content Warning: This dataset contains NSFW (Not Safe For Work) and mature themes. The model may generate such content due to its training data. Please implement content filtering if your application requires it.


πŸš€ Getting Started

πŸ’» Hugging Face Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

Load the V2 model with 4-bit quantization

model_name = "Chun121/qwen3-4b-rpg-roleplay-v2" tokenizer = AutoTokenizer.frompretrained(modelname) model = AutoModelForCausalLM.from_pretrained( model_name, torch_dtype=torch.float16, device_map="auto" )

1. Define your character and scene using the recommended prompt structure.

This detailed format is key to getting high-quality responses.

systempromptcontent = """ Character: Elara, the Impatient Archmage Tags: fantasy, magic, elf, library, knowledgeable, impatient

Elara's Personality:
Elara possesses centuries of arcane knowledge but has very little patience for novices, whom she sees as wasting her valuable time. She is sharp, direct, and can be condescending, but her advice is always accurate, even if delivered with a sigh. She values true intellectual curiosity but despises laziness.

Scenario:

  • Setting: The Grand Library of Mystral, a place of immense power and silence.
  • A young, nervous apprentice ({{user}}) has approached Elara for help with a basic spell, interrupting her research.

Take the role of Elara. You must engage in a roleplay conversation with {{user}}. Do not write {{user}}'s dialogue. Respond from Elara's perspective, embodying her personality and knowledge.
"""

2. Define your character and user messages

messages = [ { "role": "system", "content": systempromptcontent, }, { "role": "user", "content": "Excuse me, Archmage. I'm... I'm having trouble with the basic fire conjuration spell. Could you please help me?" } ]

3. Apply the chat template

prompt = tokenizer.applychattemplate(messages, tokenize=False, addgenerationprompt=True)

4. Generate the response

inputs = tokenizer(prompt, return_tensors="pt").to(model.device) outputs = model.generate( inputs["input_ids"], maxnewtokens=256, temperature=0.8, top_p=0.9, do_sample=True )

print(tokenizer.decode(outputs, skipspecialtokens=True))


🎭 Prompting the Model: Character and Scene

🎯 Prompt Engineering Best Practices

Master the art of character creation with structured prompting

The model is trained to follow a specific structure that separates the overall rules, the character's description, and the user's dialogue. For best results, structure your prompts this way.

🎯 1. The System Message: Defining the Character

The system message is crucial. It tells the model how to behave. It should contain the character's description, personality, background, and any relevant context for the scene.



























πŸ”‘ Key ElementsπŸ“ Description
Character Name & TitleA clear identifier
TagsHelps define genre and themes
PersonalityCore traits summary
ScenarioContext for interaction (use {{user}})
InstructionsExplicit role-taking commands

Example of a well-structured system prompt:

Character: Melina, The Unfaithful Wife
Tags: nsfw, english, scenario, roleplay, love, netori, milf, female

Melina's Personality:
Melina is an unfaithful wife who is unhappy in her marriage to her husband, "Aki." She is cautious and meticulous, but also looking for excitement and feels a connection to {{user}}.

Scenario:

  • Setting: Melina's home.
  • You are a mail carrier ({{user}}), and Melina often finds reasons to talk to you. Today, she seems particularly inviting.

Take the role of Melina. Taking the above information into consideration, you must engage in a roleplay conversation with {{user}} below this line. Do not write {{user}}'s dialogue lines in your responses.

πŸ’¬ 2. The User Message: Your Turn

The user message is simply what you, the user, say or do in the scene.

# Example user message for the "Melina" character card above
user_message = {
    "role": "user",
    "content": "I hand you the stack of letters, noticing you seem a bit more dressed up than usual. Here's your mail, Melina. Everything alright?"
}

πŸ€– 3. The Model's Internal Process

The model generates a private "thought" process inside tags before creating its public response inside tags. This allows for more consistent and thoughtful roleplay.


πŸ—‚οΈ GGUF Models for llama.cpp

πŸ”§ Optimized Quantization Options

Choose the perfect balance of quality and performance for your hardware

For users who want to run the model on CPU or with GPU offloading, GGUF models are provided:




























πŸ”§ QuantizationπŸ’Ύ Size (GB)🎯 Recommended Use
Q4KM2.50 GB🌟 Recommended - Best balance of performance and size
Q5KM2.89 GBHigher quality than Q4KM with minimal size increase
Q8_04.28 GBHigh-quality quantization, near full precision
F168.05 GBFull 16-bit precision - highest quality

Example llama.cpp command:

./llama-cli -m ./qwen3-4b-rpg-roleplay-v2.Q4KM.gguf --color -c 2048 --temp 0.8 -p "Your prompt here"

πŸ’‘ Best Practices & Usage Tips












🎯 Use Chat Template


Always use tokenizer.applychattemplate
for proper formatting



πŸ“ Detailed System Prompt


Comprehensive character cards are
key to success



🌑️ Moderate Temperature


Values between 0.7-0.85 offer
best balance



πŸ“ Leverage Context


2048-token window allows
complex scenarios




⚠️ Limitations























⚠️ LimitationπŸ“‹ Description
NSFW ContentMay generate explicit content due to training data
Synthetic DataTraining data is AI-generated, may lack human nuance
Context WindowLimited to 2048 tokens - traits may degrade in long conversations
Inherited LimitationsInherits any limitations from base model


πŸ”— Related Projects












πŸ”— My Other Fine-tunes

Explore more models by Chun

⚑ Unsloth Library

Optimization framework used

πŸ““ GRPO Training Notebook

Exact notebook used for training

πŸ“š Gryphe's Datasets

High-quality roleplay datasets


πŸ™ Acknowledgements

Special thanks to the incredible teams and individuals who made this possible:

πŸ”₯ Qwen & Unsloth teams - For their incredible models and libraries
🎭 Gryphe - For the high-quality Sonnet 3.5 dataset
πŸš€ TRL team - For creating and open-sourcing the GRPO trainer
πŸ€— HuggingFace community - For their continued support


πŸ“¬ Feedback & Contact









πŸ› Issues & Bugs

Open an issue on HuggingFace

πŸ’¬ Connect

@chun121 on HuggingFace

🎭 Share Examples

Show us your characters!


✨ May your characters speak with voices that feel truly alive! ✨


Created with ❀️ by Chun


πŸ§™β€β™‚οΈ Qwen3-4B RPG Roleplay V2 | GRPO Enhanced | MIT License

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
unsloth.F16.gguf
LFS FP16
7.5 GB Download
unsloth.Q4_K_M.gguf
Recommended LFS Q4
2.33 GB Download
unsloth.Q5_K_M.gguf
LFS Q5
2.69 GB Download
unsloth.Q8_0.gguf
LFS Q8
3.99 GB Download