---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
- pt
pipeline_tag: text-generation
tags:
- liquid
- lfm2
- edge
- moe
- llama.cpp
- gguf
base_model:
- LiquidAI/LFM2-24B-A2B
---



<img
src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/2b08LKpev0DNEk6DlnWkY.png"
alt="Liquid AI"
style="width: 100%; max-width: 100%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
/>


# LFM2-24B-A2B-GGUF

LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient.

- **Best-in-class efficiency**: A 24B MoE model with only 2B active parameters per token, fitting in 32 GB of RAM for deployment on consumer laptops and desktops.
- **Fast edge inference**: 112 tok/s decode on an AMD CPU and 293 tok/s on an H100, with day-one support in llama.cpp, vLLM, and SGLang.
- **Predictable scaling**: Quality improves log-linearly from 350M to 24B total parameters, confirming the LFM2 hybrid architecture scales reliably across nearly two orders of magnitude.
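The efficiency claim above is mostly arithmetic. A minimal sketch of why a 24B-total / 2B-active MoE decodes like a small dense model (the 2-FLOPs-per-active-parameter rule of thumb is a standard approximation, not a figure from this card):

```python
# Back-of-envelope decode cost for a mixture-of-experts model.
total_params = 24e9   # all expert + shared parameters
active_params = 2e9   # parameters actually used per generated token

# Only the experts a token is routed to contribute to decode compute;
# ~2 FLOPs per active parameter per token is the usual dense-matmul estimate.
fraction_active = active_params / total_params
flops_per_token = 2 * active_params

print(f"{fraction_active:.1%} of parameters active per token")
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per decoded token")
```

Memory, by contrast, still scales with the total parameter count, which is why the card quotes a 32 GB RAM budget rather than a 2B-sized one.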


Find more information about LFM2-24B-A2B in our blog post.

## How to run LFM2

Example usage with llama.cpp:

```shell
llama-cli -hf LiquidAI/LFM2-24B-A2B-GGUF
```

## GGUF File List

| Filename | Quant | Size |
|----------|-------|------|
| LFM2-24B-A2B-BF16.gguf | BF16 | 44.42 GB |
| LFM2-24B-A2B-F16.gguf | F16 | 44.42 GB |
| LFM2-24B-A2B-Q4_0.gguf | Q4_0 (recommended) | 12.54 GB |
| LFM2-24B-A2B-Q4_K_M.gguf | Q4_K_M | 13.43 GB |
| LFM2-24B-A2B-Q5_K_M.gguf | Q5_K_M | 15.76 GB |
| LFM2-24B-A2B-Q6_K.gguf | Q6_K | 18.23 GB |
| LFM2-24B-A2B-Q8_0.gguf | Q8_0 | 23.61 GB |