Model Description


---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-4B
- Qwen/Qwen3-1.7B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- lightweight
- agentic
- conversational
---

Qwen3 Quantized Models – Lexicons Edition

This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:

  • Qwen_Qwen3-0.6B-Q4_K_M
  • Qwen_Qwen3-1.7B-Q4_K_M
  • Qwen_Qwen3-4B-Q4_K_M
  • Qwen3-8B-Q4_K_M

Model Overview

Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens across 119 languages and dialects. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. This model is described in An Empirical Study of Qwen3 Quantization.

The quantized versions provided here use 4-bit Q4_K_M precision, delivering strong performance at a fraction of the memory and compute cost. These models are well suited to real-time inference, chatbots, and on-device applications.


Key Features

  • Efficient Quantization: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
  • Multilingual Mastery: Trained on a massive, diverse corpus covering 119 languages and dialects.
  • Instruction-Tuned: Fine-tuned to follow user instructions effectively.
  • Scalable Sizes: Choose from 0.6B to 8B parameter models based on your use case.
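To make the memory savings above concrete, here is a back-of-envelope estimate of the weight footprint of the 8B model at FP16 versus 4-bit. The parameter count (8.2B) and the effective bits per weight for Q4_K_M (~4.6, since K-quants mix precisions across blocks) are rough assumptions for illustration, not official figures:

```python
# Approximate weight-only memory footprint for an 8.2B-parameter model.
N_PARAMS = 8.2e9        # assumed parameter count for Qwen3-8B
BITS_FP16 = 16          # half-precision baseline
BITS_Q4_K_M = 4.6       # assumed average effective bits/weight for Q4_K_M

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(N_PARAMS, BITS_FP16)
q4_gb = weight_gb(N_PARAMS, BITS_Q4_K_M)
print(f"FP16:   ~{fp16_gb:.1f} GB")   # ~16.4 GB
print(f"Q4_K_M: ~{q4_gb:.1f} GB")     # ~4.7 GB, close to the 4.68 GB GGUF below
```

The roughly 3.5× reduction is what makes the 8B model practical on consumer hardware; actual file sizes differ slightly because some tensors (e.g. embeddings) are stored at higher precision.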

Available Quantized Versions

| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|---|---|---|---|---|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance and efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |

Performance Insights

Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. Per the findings in An Empirical Study of Qwen3 Quantization (arXiv:2505.02214), Qwen3 models remain robust even under low-bit quantization when applied appropriately.

Code

The project is released on GitHub and Hugging Face.

GGUF File List

| 📁 Filename | 📦 Size | Notes |
|---|---|---|
| Qwen3-8B-Q4_K_M.gguf | 4.68 GB | Recommended |
| Qwen_Qwen3-0.6B-Q4_K_M.gguf | 461.79 MB | |
| Qwen_Qwen3-1.7B-Q4_K_M.gguf | 1.19 GB | |
| Qwen_Qwen3-4B-Q4_K_M.gguf | 2.33 GB | |
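One common way to run these GGUF files locally is llama.cpp's `llama-cli`. A minimal invocation sketch, assuming llama.cpp is installed and the 8B file from the list above has been downloaded to the current directory (the context size and token count are illustrative choices, not requirements):

```shell
# -m: model file   -c: context window   -n: max new tokens   -p: prompt
llama-cli -m Qwen3-8B-Q4_K_M.gguf -c 4096 -n 256 \
  -p "Summarize the benefits of 4-bit quantization."
```

Raise `-c` toward the model's 128K limit only if you have the RAM for the larger KV cache; the smaller quants accept the same flags with their respective filenames.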