📋 Model Description


license: mit
language:
  • multilingual
  • en
  • ru
tags:
  • whisper
  • gguf
  • quantized
  • speech-recognition
  • rust
  • candle
base_model:
  • openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition

WHISPER-LARGE-V3 - GGUF Quantized Models

Quantized versions of openai/whisper-large-v3 in GGUF format.

Directory Structure

large-v3/
├── whisper-large-v3-q*.gguf   # Candle-compatible GGUF models (root)
├── config.json                # Model configuration for Candle
├── tokenizer.json             # Tokenizer for Candle
└── whisper.cpp/               # whisper.cpp-compatible models
    └── whisper-large-v3-q*.gguf

Format Compatibility

  • Root directory (whisper-large-v3-*.gguf): use with Candle (Rust ML framework)
    - Tensor names include the model. prefix (e.g., model.encoder.conv1.weight)
    - Requires config.json and tokenizer.json
  • whisper.cpp/ directory: use with whisper.cpp (C++ implementation)
    - Tensor names have no model. prefix (e.g., encoder.conv1.weight)
    - Compatible with whisper.cpp CLI tools
  • Both directories contain .gguf files, not .bin files
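The naming rule above is enough to tell the two flavors apart programmatically. The sketch below is illustrative only (the helper name is not part of this repo or its conversion scripts); it just encodes the rule that Candle builds prefix every tensor with model. while whisper.cpp builds do not.

```python
def detect_gguf_flavor(tensor_names):
    """Guess which toolchain a Whisper GGUF file targets from its tensor names.

    Candle builds prefix every tensor with "model."; whisper.cpp builds do not.
    """
    if all(name.startswith("model.") for name in tensor_names):
        return "candle"
    if not any(name.startswith("model.") for name in tensor_names):
        return "whisper.cpp"
    return "mixed/unknown"

# Hypothetical tensor names, for illustration:
print(detect_gguf_flavor(["model.encoder.conv1.weight"]))  # candle
print(detect_gguf_flavor(["encoder.conv1.weight"]))        # whisper.cpp
```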

Available Formats

Format   Quality      Use Case
q2_k     Smallest     Extreme compression
q3_k     Small        Mobile devices
q4_0     Good         Legacy compatibility
q4_k     Good         Recommended for production
q4_1     Good+        Legacy with bias
q5_0     Very Good    Legacy compatibility
q5_k     Very Good    High quality
q5_1     Very Good+   Legacy with bias
q6_k     Excellent    Near-lossless
q8_0     Excellent    Minimal loss, benchmarking

Usage

With Candle (Rust)

Candle's whisper example needs small modifications to load this model. To try Whisper in Candle with less setup, start with the tiny model instead → https://huggingface.co/oxide-lab/whisper-tiny-GGUF

Command line example:

# Run Candle Whisper with local quantized model
cargo run --example whisper --release --features symphonia -- \
  --quantized \
  --model large-v3 \
  --model-id oxide-lab/whisper-large-v3-GGUF

With whisper.cpp (C++)

# Use models from whisper.cpp/ subdirectory
./whisper.cpp/build/bin/whisper-cli \
  --model models/openai/large-v3/whisper.cpp/whisper-large-v3-q4_k.gguf \
  --file audio.wav
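Both flavors share the same GGUF container, so a truncated or misnamed download can be sanity-checked by reading the fixed file header: the magic bytes GGUF, a uint32 version, a uint64 tensor count, and a uint64 metadata key-value count, all little-endian per the GGUF specification. A minimal sketch (the function name is an assumption, not a tool shipped with this repo):

```python
import struct

def read_gguf_header(path):
    """Return (version, n_tensors, n_kv) from a GGUF file's fixed 24-byte header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return version, n_tensors, n_kv
```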

Recommended Format

For most use cases, we recommend q4_k format as it provides the best balance of:

  • Size reduction (~65% smaller)
  • Quality (minimal degradation)
  • Speed (faster inference than higher quantizations)

Quantization Details

  • Candle GGUF (root directory): Python-based conversion, directly PyTorch → GGUF; adds the model. prefix to tensor names for Candle compatibility
  • whisper.cpp GGML (whisper.cpp/ subdirectory): produced with the whisper-quantize tool; uses the original tensor names without a prefix
  • Format: GGUF (GGML Universal Format) for both directories
  • Total formats: 10 quantization levels (q2_k through q8_0)
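The prefix handling described above amounts to a simple rename pass over the tensor map. A sketch of the idea (the function names are illustrative, not taken from the conversion scripts):

```python
def to_candle(tensors):
    # Candle flavor: every tensor name gets a "model." prefix
    return {f"model.{name}": t for name, t in tensors.items()}

def to_whisper_cpp(tensors):
    # whisper.cpp flavor: original names, so strip the prefix if present
    return {name.removeprefix("model."): t for name, t in tensors.items()}
```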

License

Same as the original Whisper model (MIT License).

Citation

@misc{radford2022whisper,
  doi = {10.48550/ARXIV.2212.04356},
  url = {https://arxiv.org/abs/2212.04356},
  author = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  title = {Robust Speech Recognition via Large-Scale Weak Supervision},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

📂 GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
whisper-large-v3-q4_0.gguf
Recommended LFS Q4
871.22 MB Download
whisper-large-v3-q4_1.gguf
LFS Q4
966.85 MB Download
whisper-large-v3-q8_0.gguf
LFS Q8
1.6 GB Download