---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
- pt
tags:
- liquid
- lfm2
- edge
- moe
- llama.cpp
- gguf
base_model:
- LiquidAI/LFM2-24B-A2B
---
# LFM2-24B-A2B-GGUF
LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.
- Best-in-class efficiency: A 24B MoE model with only 2B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
- Fast edge inference: 112 tok/s decode on an AMD CPU and 293 tok/s on an H100. Fits in 32 GB of RAM, with day-one support for llama.cpp, vLLM, and SGLang.
- Predictable scaling: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
Find more information about LFM2-24B-A2B in our blog post.
## How to run LFM2
Example usage with llama.cpp:

```bash
llama-cli -hf LiquidAI/LFM2-24B-A2B-GGUF
```
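To pick a specific quantization from this repo rather than the default file, recent llama.cpp builds accept a `repo:quant` suffix with `-hf`. The snippet below is a minimal sketch assuming such a build and the Q4_0 file listed in the table further down; the prompt text is illustrative. `llama-server` exposes the same model over an OpenAI-compatible HTTP API.

```bash
# Run the recommended Q4_0 quant interactively
# (assumes a recent llama.cpp build where -hf accepts a repo:quant suffix).
llama-cli -hf LiquidAI/LFM2-24B-A2B-GGUF:Q4_0 \
  -p "Explain mixture-of-experts in one paragraph."

# Serve the same model over an OpenAI-compatible HTTP API on port 8080.
llama-server -hf LiquidAI/LFM2-24B-A2B-GGUF:Q4_0 --port 8080
```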
## GGUF File List
| 📁 Filename | Quantization | 📦 Size | Notes |
|---|---|---|---|
| LFM2-24B-A2B-BF16.gguf | BF16 | 44.42 GB | |
| LFM2-24B-A2B-F16.gguf | F16 | 44.42 GB | |
| LFM2-24B-A2B-Q4_0.gguf | Q4_0 | 12.54 GB | Recommended |
| LFM2-24B-A2B-Q4_K_M.gguf | Q4_K_M | 13.43 GB | |
| LFM2-24B-A2B-Q5_K_M.gguf | Q5_K_M | 15.76 GB | |
| LFM2-24B-A2B-Q6_K.gguf | Q6_K | 18.23 GB | |
| LFM2-24B-A2B-Q8_0.gguf | Q8_0 | 23.61 GB | |
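To download a single quant without running llama.cpp, the Hugging Face CLI can fetch one file by name. A minimal sketch, assuming the `huggingface_hub` package is installed:

```bash
# Fetch only the recommended Q4_0 file (~12.5 GB) into the current directory.
pip install -U huggingface_hub
huggingface-cli download LiquidAI/LFM2-24B-A2B-GGUF \
  LFM2-24B-A2B-Q4_0.gguf --local-dir .
```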