Model Description


---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-4B
- Qwen/Qwen3-1.7B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- lightweight
- agentic
- conversational
---

Qwen3 Quantized Models – Lexicons Edition

This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:

  • Qwen_Qwen3-0.6B-Q4_K_M
  • Qwen_Qwen3-1.7B-Q4_K_M
  • Qwen_Qwen3-4B-Q4_K_M
  • Qwen3-8B-Q4_K_M

Model Overview

Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens across 119 languages and dialects. Qwen3 models are instruction-tuned and support long context windows and multilingual capabilities. This model is described in An Empirical Study of Qwen3 Quantization.

The quantized versions provided here use 4-bit Q4_K_M precision, delivering strong performance at a fraction of the memory and compute cost. These models are well suited to real-time inference, chatbots, and on-device applications.


Key Features

  • Efficient Quantization: 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
  • Multilingual Mastery: Trained on a massive, diverse corpus covering 119 languages and dialects.
  • Instruction-Tuned: Fine-tuned to follow user instructions effectively.
  • Scalable Sizes: Choose from 0.6B to 8B parameter models based on your use case.
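To make the memory savings above concrete, here is a back-of-envelope estimate of the weight footprint of the 8B model at FP16 versus 4-bit. The parameter count (8.2B) and the effective bits per weight for Q4_K_M (~4.6, since K-quants mix precisions across blocks) are rough assumptions for illustration, not official figures:

```python
# Approximate weight-only memory footprint for an 8.2B-parameter model.
N_PARAMS = 8.2e9        # assumed parameter count for Qwen3-8B
BITS_FP16 = 16          # half-precision baseline
BITS_Q4_K_M = 4.6       # assumed average effective bits/weight for Q4_K_M

def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the weights in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_gb(N_PARAMS, BITS_FP16)
q4_gb = weight_gb(N_PARAMS, BITS_Q4_K_M)
print(f"FP16:   ~{fp16_gb:.1f} GB")   # ~16.4 GB
print(f"Q4_K_M: ~{q4_gb:.1f} GB")     # ~4.7 GB, close to the 4.68 GB GGUF below
```

The roughly 3.5× reduction is what makes the 8B model practical on consumer hardware; actual file sizes differ slightly because some tensors (e.g. embeddings) are stored at higher precision.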

Available Quantized Versions

| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|---|---|---|---|---|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance and efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |

Performance Insights

Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. Per the findings in An Empirical Study of Qwen3 Quantization (arXiv:2505.02214), Qwen3 models remain robust even under low-bit quantization when applied appropriately.

Code

The project is released on GitHub and Hugging Face.

GGUF File List

| 📁 Filename | 📦 Size | Notes |
|---|---|---|
| Qwen3-8B-Q4_K_M.gguf | 4.68 GB | Recommended |
| Qwen_Qwen3-0.6B-Q4_K_M.gguf | 461.79 MB | |
| Qwen_Qwen3-1.7B-Q4_K_M.gguf | 1.19 GB | |
| Qwen_Qwen3-4B-Q4_K_M.gguf | 2.33 GB | |
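One common way to run these GGUF files locally is llama.cpp's `llama-cli`. A minimal invocation sketch, assuming llama.cpp is installed and the 8B file from the list above has been downloaded to the current directory (the context size and token count are illustrative choices, not requirements):

```shell
# -m: model file   -c: context window   -n: max new tokens   -p: prompt
llama-cli -m Qwen3-8B-Q4_K_M.gguf -c 4096 -n 256 \
  -p "Summarize the benefits of 4-bit quantization."
```

Raise `-c` toward the model's 128K limit only if you have the RAM for the larger KV cache; the smaller quants accept the same flags with their respective filenames.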