πŸ“‹ Model Description

Good and Small models for Mobile Devices

Try them out in Privacy AI on the App Store.

Privacy AI is a lightweight, serverless application. All tools - including web search, stock quotes, and Health analysis - run on-device, keeping data and actions fully private. It supports both local AI models and connections to your own OpenAI-compatible servers.
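Connecting to your own OpenAI-compatible server means the app speaks the standard `/v1/chat/completions` JSON protocol. As a minimal sketch (the model name and server URL are placeholders, not values from this repo), the request body looks like:

```python
import json

# Sketch of an OpenAI-compatible /v1/chat/completions request body.
# The model name passed in is a placeholder; POST the result to
# http://<your-server>/v1/chat/completions with Content-Type: application/json.
def chat_request_body(model: str, user_message: str) -> str:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }
    return json.dumps(body)
```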

For more information, visit the Privacy AI official site.

Qwen3 4B Instruct 2507

Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.

Model Intention: Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Qwen3-4B-Instruct-2507-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507

Model License: License Info

Model Description: Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.

Developer: https://huggingface.co/Qwen

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
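The "qwen" template is the ChatML-style turn format used across the Qwen3 family. A minimal sketch of what it produces, assuming the standard `<|im_start|>`/`<|im_end|>` delimiters (the function name is illustrative, not part of the app):

```python
def render_chatml(messages, add_generation_prompt=True):
    # Render a message list in the ChatML-style format used by Qwen models:
    # each turn is <|im_start|>{role}\n{content}<|im_end|>\n, and the prompt
    # optionally ends with an open assistant turn for generation.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```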


Qwen3 4B Thinking 2507

Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.

Model Intention: Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Qwen3-4B-Thinking-2507-Q4_0.gguf

Model Info URL: https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507

Model License: License Info

Model Description: Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.

Developer: https://huggingface.co/Qwen

File Size: 2100 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


GLM Edge 4B Chat

GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 has outperformed Llama-3. Beyond multi-round conversation, GLM-4-Chat offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation also adds multilingual support for 26 languages, including Japanese, Korean, and German.

Model Intention: It is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true

Model Info URL: https://huggingface.co/THUDM

Model License: License Info

Model Description: GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 has outperformed Llama-3. Beyond multi-round conversation, GLM-4-Chat offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation also adds multilingual support for 26 languages, including Japanese, Korean, and German.

Developer: https://huggingface.co/THUDM

File Size: 2627 MB

Context Length: 1024 tokens

Prompt Format:

```
{% for item in messages %}{% if item['role'] == 'system' %}<|system|>
{{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
{{ item['content'] }}{% elif item['role'] == 'assistant' %}<|assistant|>
{{ item['content'] }}{% endif %}{% endfor %}{% if add_generation_prompt %}<|assistant|>
{% endif %}
```

Template Name: glm

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
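The Jinja chat template above can be mirrored in plain Python to see exactly what reaches the model (a sketch; `render_glm_prompt` is an illustrative name, not part of the app):

```python
def render_glm_prompt(messages, add_generation_prompt=True):
    # Plain-Python mirror of the GLM Jinja chat template above: each known
    # role becomes <|role|>\n{content}, and a generation prompt ends with
    # an open <|assistant|> tag.
    out = []
    for item in messages:
        if item["role"] in ("system", "user", "assistant"):
            out.append(f"<|{item['role']}|>\n{item['content']}")
    if add_generation_prompt:
        out.append("<|assistant|>\n")
    return "".join(out)
```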


Gemma 3n E2B it

Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.

Model Intention: Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/gemma-3n-E2B-it-Q4_0.gguf?download=true

Model Info URL: https://huggingface.co/google/gemma-3n-E2B-it

Model License: License Info

Model Description: Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input, handling text, image, video, and audio input, and generating text outputs, with open weights for pre-trained and instruction-tuned variants. These models were trained with data in over 140 spoken languages.

Developer: https://huggingface.co/google

Update Date: 2025-06-27

File Size: 2720 MB

Context Length: 8000 tokens

Prompt Format:

Template Name: chatml

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


SmolLM3 3B

SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. It is a decoder-only transformer using GQA and NoPE (with a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.

Model Intention: SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/SmolLM3-Q4_K_M.gguf?download=true

Model Info URL: https://huggingface.co/HuggingFaceTB/SmolLM3-3B

Model License: License Info

Model Description: SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. It is a decoder-only transformer using GQA and NoPE (with a 3:1 ratio), pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.

Developer: https://huggingface.co/HuggingFaceTB

File Size: 1920 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Phi4 mini 4B

Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require (1) memory- or compute-constrained environments, (2) latency-bound scenarios, or (3) strong reasoning (especially math and logic). It is designed to accelerate research on language and multimodal models, and to serve as a building block for generative-AI-powered features.

Model Intention: Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf?download=true

Model Info URL: https://huggingface.co/microsoft/Phi-4-mini-instruct

Model License: License Info

Model Description: Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require (1) memory- or compute-constrained environments, (2) latency-bound scenarios, or (3) strong reasoning (especially math and logic). It is designed to accelerate research on language and multimodal models, and to serve as a building block for generative-AI-powered features.

Developer: https://huggingface.co/microsoft

File Size: 2020 MB

Context Length: 2048 tokens

Prompt Format:

```
{% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
```

Template Name: llama3.2

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
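The Phi-4 Jinja template above can likewise be mirrored in plain Python (a sketch; the function name is illustrative, and the `eos_token` default is an assumption, not taken from this repo):

```python
def render_phi4_prompt(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    # Plain-Python mirror of the Phi-4 Jinja template above: a system message
    # carrying a "tools" entry gets it wrapped in <|tool|>...<|/tool|>; every
    # turn ends with <|end|>, and the prompt closes with either an open
    # <|assistant|> tag or the EOS token.
    out = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            out.append(f"<|{m['role']}|>{m['content']}<|tool|>{m['tools']}<|/tool|><|end|>")
        else:
            out.append(f"<|{m['role']}|>{m['content']}<|end|>")
    out.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(out)
```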


Qwen3 1.7B

Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.

Model Intention: The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Qwen3-1.7B-Q4_K_M.gguf?download=true

Model Info URL: https://huggingface.co/Qwen/Qwen3-1.7B

Model License: License Info

Model Description: Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.

Developer: https://huggingface.co/Qwen

File Size: 1110 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


ERNIE-4.5 0.3B

ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: (1) multimodal heterogeneous MoE pre-training; (2) scaling-efficient infrastructure; and (3) modality-specific post-training.

Model Intention: ERNIE-4.5-0.3B-Base is a text dense Base model for testing the model's architecture.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf?download=true

Model Info URL: https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT

Model License: License Info

Model Description: ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: (1) multimodal heterogeneous MoE pre-training; (2) scaling-efficient infrastructure; and (3) modality-specific post-training.

Developer: https://huggingface.co/baidu

File Size: 233 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


LFM2 1.2B

LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard for quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Model Intention: LFM2 1.2B is particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/LFM2-1.2B-Q4_0.gguf?download=true

Model Info URL: https://huggingface.co/LiquidAI/LFM2-1.2B

Model License: License Info

Model Description: LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard for quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

Developer: https://huggingface.co/LiquidAI

File Size: 696 MB

Context Length: 1024 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Jan v1 4B

Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.

Model Intention: Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Jan-v1-4B-Q4_0.gguf?download=true

Model Info URL: https://huggingface.co/janhq/Jan-v1-4B

Model License: License Info

Model Description: Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.

Developer: https://huggingface.co/janhq

File Size: 2400 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Menlo Lucy 1.7B

Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.

Model Intention: Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/Menlo_Lucy-Q4_K_M.gguf

Model Info URL: https://huggingface.co/Menlo/Lucy

Model License: License Info

Model Description: Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.

Developer: https://huggingface.co/Menlo

File Size: 1056 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Nemotron 1.5B

OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivative of Qwen2.5-1.5B-Instruct. It is a reasoning model that is post-trained for reasoning about math, code and science solution generation. This model is ready for commercial/non-commercial research use.

Model Intention: It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/OpenReasoning-Nemotron-1.5B-Q4_K_M.gguf

Model Info URL: https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B

Model License: License Info

Model Description: OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivative of Qwen2.5-1.5B-Instruct. It is a reasoning model that is post-trained for reasoning about math, code and science solution generation. This model is ready for commercial/non-commercial research use.

Developer: https://huggingface.co/nvidia

File Size: 940 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Qwen3 1.7B Uncensored

Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.

Model Intention: An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.

Model URL: https://huggingface.co/flyingfishinwater/goodandsmall_models/resolve/main/Qwen3-1.7B-Uncensored.gguf

Model Info URL: https://huggingface.co/DavidAU/Qwen3-1.7B-HORROR-Imatrix-Max-GGUF

Model License: License Info

Model Description: Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.

Developer: https://huggingface.co/DavidAU

File Size: 1110 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: qwen

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes


Gemma 3 270M

Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.

Model Intention: Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length

Model URL: https://huggingface.co/flyingfishinwater/goodandsmallmodels/resolve/main/gemma-3-270m-q4_0.gguf

Model Info URL: https://huggingface.co/google/gemma-3-270m

Model License: License Info

Model Description: Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.

Developer: https://huggingface.co/google

File Size: 160 MB

Context Length: 2048 tokens

Prompt Format:

Template Name: gemma

Add BOS Token: Yes

Add EOS Token: No

Parse Special Tokens: Yes
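The "gemma" template uses Gemma's turn-based format. A hedged sketch of what it produces, assuming the standard `<start_of_turn>`/`<end_of_turn>` delimiters and the `model` role name used by Gemma chat templates (the function name is illustrative):

```python
def render_gemma(messages, add_generation_prompt=True):
    # Gemma-style turn format: the assistant role is called "model", and each
    # turn is wrapped in <start_of_turn>...<end_of_turn>. A generation prompt
    # ends with an open model turn.
    parts = []
    for m in messages:
        role = "model" if m["role"] == "assistant" else m["role"]
        parts.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)
```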


πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
ERNIE-4.5-0.3B-PT-Q4_0.gguf
Recommended LFS Q4
222.33 MB Download
Jan-v1-4B-Q4_0.gguf
LFS Q4
2.21 GB Download
LFM2-1.2B-Q4_0.gguf
LFS Q4
663.52 MB Download
LFM2-VL-450M-Q4_0.gguf
LFS Q4
209.15 MB Download
Menlo_Lucy-Q4_K_M.gguf
LFS Q4
1.03 GB Download
OpenReasoning-Nemotron-1.5B-Q4_K_M.gguf
LFS Q4
940.37 MB Download
Phi-4-mini-instruct-Q4_K_M.gguf
LFS Q4
2.32 GB Download
Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf
LFS Q4
1.8 GB Download
Qwen3-0.6B-UD-Q2_K_XL.gguf
LFS Q2
287.75 MB Download
Qwen3-1.7B-Q4_K_M.gguf
LFS Q4
1.03 GB Download
Qwen3-1.7B-Uncensored.gguf
LFS
1.54 GB Download
Qwen3-4B-IQ4_NL.gguf
LFS Q4
1.85 GB Download
Qwen3-4B-Instruct-2507-Q4_0.gguf
LFS Q4
2.21 GB Download
Qwen3-4B-Instruct-2507-Q4_K_S.gguf
LFS Q4
2.22 GB Download
Qwen3-4B-Thinking-2507-Q3_K_L.gguf
LFS Q3
2.09 GB Download
Qwen3-4B-Thinking-2507-Q4_0.gguf
LFS Q4
2.21 GB Download
SmolLM3-Q4_K_M.gguf
LFS Q4
1.78 GB Download
SmolVLM2-2.2B-Instruct-Q4_K_M.gguf
LFS Q4
1.04 GB Download
baidu_ERNIE-4.5-0.3B-PT-Q4_0.gguf
LFS Q4
222.33 MB Download
gemma-3-270m-q4_0.gguf
LFS Q4
230.38 MB Download
gemma-3n-E2B-it-Q4_0.gguf
LFS Q4
2.54 GB Download
glm-edge-4b-chat.Q4_K_M.gguf
LFS Q4
2.45 GB Download
jan-nano-128k-Q4_0.gguf
LFS Q4
2.21 GB Download
mmproj-Qwen2.5-VL-3B-Instruct-Q8_0.gguf
LFS Q8
805.62 MB Download
mmproj-SmolVLM2-2.2B-Instruct-Q8_0.gguf
LFS Q8
565.07 MB Download
nomic-embed-text-v1_5-Q8_0.gguf
LFS Q8
139.38 MB Download
reader-lm-1.5b-Q4_K_M.gguf
LFS Q4
940.37 MB Download
test.gguf
LFS
3.22 MB Download
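Every file in the list can be fetched directly; the download links follow one predictable pattern. A small sketch (the helper name is illustrative):

```python
def gguf_url(repo_id: str, filename: str) -> str:
    # Direct-download URL pattern used by the model links above; the
    # "?download=true" query asks Hugging Face to serve the file as an
    # attachment rather than rendering it in the browser.
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}?download=true"
```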