---
license: mit
language:
- multilingual
- en
- ru
tags:
- whisper
- gguf
- quantized
- speech-recognition
- rust
- candle
base_model: openai/whisper-large-v3
---
WHISPER-LARGE-V3 - GGUF Quantized Models
Quantized versions of openai/whisper-large-v3 in GGUF format.
Directory Structure
```
large-v3/
├── whisper-large-v3-q*.gguf     # Candle-compatible GGUF models (root)
├── config.json                  # Model configuration for Candle
├── tokenizer.json               # Tokenizer for Candle
└── whisper.cpp/                 # whisper.cpp-compatible models
    └── whisper-large-v3-q*.gguf
```
Format Compatibility
- Root directory (whisper-large-v3-*.gguf): use with Candle (Rust ML framework)
  - Tensor names carry the model. prefix (e.g., model.encoder.conv1.weight)
  - Requires config.json and tokenizer.json
- whisper.cpp/ directory: use with whisper.cpp (C++ implementation)
  - Tensor names have no model. prefix (e.g., encoder.conv1.weight)
  - Compatible with whisper.cpp CLI tools
- Both directories contain .gguf files, not .bin files
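The naming difference between the two directories can be sketched as a simple string mapping. This is only an illustration of the convention described above, not part of the actual conversion tooling:

```rust
// Sketch: convert a whisper.cpp-style tensor name to the Candle-style
// name by prepending the "model." prefix (names are illustrative).
fn to_candle_name(name: &str) -> String {
    format!("model.{name}")
}

fn main() {
    let cpp_names = ["encoder.conv1.weight", "decoder.token_embedding.weight"];
    for name in cpp_names {
        // Prints each whisper.cpp name next to its Candle equivalent.
        println!("{name} -> {}", to_candle_name(name));
    }
}
```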
Available Formats
| Format | Quality | Use Case |
|---|---|---|
| q2_k | Smallest | Extreme compression |
| q3_k | Small | Mobile devices |
| q4_0 | Good | Legacy compatibility |
| q4_k | Good | Recommended for production |
| q4_1 | Good+ | Legacy with bias |
| q5_0 | Very Good | Legacy compatibility |
| q5_k | Very Good | High quality |
| q5_1 | Very Good+ | Legacy with bias |
| q6_k | Excellent | Near-lossless |
| q8_0 | Excellent | Minimal loss, benchmarking |
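As a rough guide, on-disk size scales with bits per weight. A back-of-envelope sketch follows; the ~1.55B parameter count for large-v3 and the bits-per-weight figures are approximations for illustration, not measurements of these files:

```rust
// Rough file-size estimate: parameters * bits-per-weight / 8 bytes.
fn approx_size_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    let params = 1.55e9; // whisper-large-v3 parameter count, approximate
    // Bits-per-weight values are ballpark figures for GGML quant types.
    for (fmt, bpw) in [("q4_k", 4.5), ("q6_k", 6.6), ("q8_0", 8.5)] {
        println!("{fmt}: ~{:.2} GB", approx_size_gb(params, bpw));
    }
}
```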
Usage
With Candle (Rust)
Running this model requires modifying the Whisper example code in Candle. To try Whisper in Candle faster and more easily, start with the tiny model: https://huggingface.co/oxide-lab/whisper-tiny-GGUF
Command line example:
```shell
# Run Candle Whisper with a local quantized model
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model large-v3 \
  --model-id oxide-lab/whisper-large-v3-GGUF
```
With whisper.cpp (C++)
```shell
# Use models from the whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/large-v3/whisper.cpp/whisper-large-v3-q4_k.gguf \
  --file audio.wav
```
Recommended Format
For most use cases, we recommend the q4_k format, as it provides the best balance of:
- Size reduction (~65% smaller)
- Quality (minimal degradation)
- Speed (faster inference than higher-precision quantizations)
Quantization Details
- Source Model: openai/whisper-large-v3
- Quantization Methods:
  - Candle (root directory): adds the model. prefix to tensor names for Candle compatibility
  - whisper.cpp GGML (whisper.cpp/ subdirectory): produced with the whisper-quantize tool; keeps original tensor names without the prefix
- Format: GGUF (GGML Universal Format) for both directories
- Total Formats: 10 quantization levels (q2_k through q8_0)
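A quick way to sanity-check a downloaded file is to verify the GGUF magic: per the GGUF specification, every file starts with the four ASCII bytes "GGUF". A minimal sketch (the model path is illustrative):

```rust
use std::fs::File;
use std::io::Read;

// Returns true if the file begins with the GGUF magic bytes "GGUF".
fn is_gguf(path: &str) -> std::io::Result<bool> {
    let mut magic = [0u8; 4];
    File::open(path)?.read_exact(&mut magic)?;
    Ok(&magic == b"GGUF")
}

fn main() {
    // Path is illustrative; point this at a downloaded model file.
    match is_gguf("large-v3/whisper-large-v3-q4_k.gguf") {
        Ok(true) => println!("valid GGUF magic"),
        Ok(false) => println!("not a GGUF file"),
        Err(e) => println!("could not read file: {e}"),
    }
}
```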
License
Same as the original Whisper model (MIT License).
Citation
@misc{radford2022whisper,
doi = {10.48550/ARXIV.2212.04356},
url = {https://arxiv.org/abs/2212.04356},
author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
title = {Robust Speech Recognition via Large-Scale Weak Supervision},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}