---
license: gemma
language:
- en
- zh
tags:
- Taiwan
- SLM
- GGUF
- agent
datasets:
- lianghsun/tw-reasoning-instruct
- lianghsun/tw-contract-review-chat
- minyichen/tw-instruct-R1-200k
- minyichen/twmmR1
- minyichen/LongPapermultitaskzhtwR1
- nvidia/Nemotron-Instruction-Following-Chat-v1
metrics:
- accuracy
model_name: gemma-3-4B-T1-it
---
# Gemma 3 4B T1-it GGUF Collection
GGUF quantized models converted from twinkle-ai/gemma-3-4B-T1-it for use with llama.cpp.
## About
Gemma 3 4B T1-it is a small language model fine-tuned on Taiwan-focused datasets, supporting both English and Traditional Chinese. This repository provides multiple quantization formats optimized for different use cases.
## Available Models

| Model | Size | Use Case |
|---|---|---|
| twinkle-ai-gemma-3-4B-T1-it-BF16.gguf | Largest | Best quality, highest precision |
| twinkle-ai-gemma-3-4B-T1-it-F16.gguf | Large | High quality, good precision |
| twinkle-ai-gemma-3-4B-T1-it-Q8_0.gguf | Medium | Balanced quality and speed |
| twinkle-ai-gemma-3-4b-t1-it-q4_k_m.gguf | Smallest | Fastest inference, lowest memory |
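If you prefer to fetch a file up front rather than letting llama.cpp download it on first run, the Hugging Face CLI can do so; a minimal sketch using the repo and file names from the Quick Start below:

```shell
# Pre-download a single GGUF into ./models so first inference
# doesn't block on the network.
pip install -U "huggingface_hub[cli]"
huggingface-cli download thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  gemma-3-4b-t1-it-q8_0.gguf \
  --local-dir ./models
```

You can then point llama-cli at the local file with `-m ./models/gemma-3-4b-t1-it-q8_0.gguf` instead of the `--hf-repo`/`--hf-file` flags.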
## Quick Start

### Option 1: Using Hugging Face Hub (Recommended)

Install llama.cpp via Homebrew:

```shell
brew install llama.cpp
```

Run inference directly from Hugging Face:

```shell
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here"
```

Start as a server:

```shell
llama-server --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -c 2048
```
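Once the server is up, you can query it over HTTP; a minimal sketch assuming the default port 8080 and llama-server's OpenAI-compatible chat endpoint:

```shell
# Send a chat request to the local llama-server instance.
# llama-server serves the single loaded model, so no model name is needed.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Briefly introduce Taiwan in Traditional Chinese."}
        ],
        "temperature": 0.7
      }'
```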
### Option 2: Build from Source
#### Step 1: Clone the llama.cpp repository

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```
#### Step 2: Build llama.cpp

Basic build (CPU only):

```shell
LLAMA_CURL=1 make
```
Hardware-specific build options:

- NVIDIA GPU (Linux):

  ```shell
  LLAMA_CUDA=1 LLAMA_CURL=1 make
  ```

- Apple Silicon (Mac):

  ```shell
  LLAMA_METAL=1 LLAMA_CURL=1 make
  ```

- AMD GPU (ROCm):

  ```shell
  LLAMA_HIPBLAS=1 LLAMA_CURL=1 make
  ```
#### Step 3: Run inference

```shell
./llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here"
```
#### Step 4: Start the server (optional)

```shell
./llama-server --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -c 2048
```
## Advanced Usage

### Choosing the Right Model

Select a model based on your needs:

- Best quality: use the BF16 or F16 versions (requires more memory)
- Balanced: use the Q8_0 version (recommended for most users)
- Resource constrained: use the Q4_K_M version (suitable for devices with limited memory)
### Common Parameters

- `-p "prompt"`: Input text for the model to respond to
- `-c 2048`: Context length (maximum number of tokens that can be processed)
- `--hf-repo`: Hugging Face repository name
- `--hf-file`: Model file name to use
### Adjusting Generation Parameters

```shell
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -p "Your prompt here" \
  --temp 0.7 \
  --top-p 0.9 \
  --repeat-penalty 1.1
```
Parameter explanations:

- `--temp`: Temperature (0.0-2.0); higher values produce more random output
- `--top-p`: Nucleus sampling parameter (0.0-1.0)
- `--repeat-penalty`: Repetition penalty to avoid repetitive content
## Model Information
- Base Model: twinkle-ai/gemma-3-4B-T1-it
- Languages: English, Traditional Chinese
- License: Gemma
- Format: GGUF (converted via GGUF-my-repo)
## Training Data
- Taiwan reasoning and instruction datasets
- Contract review and legal documents
- Multimodal and long-form content
- Instruction-following examples
## Benchmarks
- TMMLU+: 47.44% accuracy
- MMLU: 59.13% accuracy
- TW Legal Benchmark: 44.18% accuracy
## Troubleshooting

### Common Issues
**Q: Getting out-of-memory errors?**

A: Try a smaller quantized version such as Q4_K_M, or reduce the context length with `-c`.
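For example, the Quick Start command with a halved context window (the value is illustrative; tune it to your hardware):

```shell
# Reduced-memory run: a smaller context window (-c) shrinks the KV cache.
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -c 1024 \
  -p "Your prompt here"
```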
**Q: How can I speed up inference?**

A:

- Use GPU acceleration (add the hardware-specific flags during compilation)
- Choose a smaller quantized model (such as Q4_K_M)
- Reduce the context length
**Q: What prompt format does the model support?**

A: This is an instruction-tuned model. Use a clear instruction format, for example:

> Please analyze the main clauses of the following contract: [contract content]
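For multi-turn use, recent llama.cpp builds can apply the Gemma chat template embedded in the GGUF automatically; a sketch assuming the `-cnv` (conversation mode) flag of current llama-cli builds:

```shell
# Interactive chat: -cnv applies the GGUF's embedded chat template,
# wrapping your input in Gemma's turn markers automatically.
llama-cli --hf-repo thliang01/gemma-3-4B-T1-it-Q8_0-GGUF \
  --hf-file gemma-3-4b-t1-it-q8_0.gguf \
  -cnv
```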
## Contributing
If you have any questions or suggestions, please feel free to open a discussion in the Hugging Face repository.
Note: On first run, llama.cpp will automatically download the model file from Hugging Face. Please ensure you have a stable internet connection.
## GGUF File List

| Filename | Quantization | Size |
|---|---|---|
| gemma-3-4b-t1-it-q8_0.gguf | Q8_0 | 4.51 GB |
| twinkle-ai-gemma-3-4B-T1-it-BF16.gguf | BF16 | 8.48 GB |
| twinkle-ai-gemma-3-4B-T1-it-F16.gguf | F16 | 8.48 GB |
| twinkle-ai-gemma-3-4B-T1-it-Q8_0.gguf | Q8_0 | 4.51 GB |
| twinkle-ai-gemma-3-4b-t1-it-q4_k_m.gguf (recommended) | Q4_K_M | 2.67 GB |