language:
- multilingual
- deepseek
- vision-language
- ocr
- document-parse
- deepseek-ai/DeepSeek-OCR
DeepSeek OCR
[!NOTE]
Currently, only NexaSDK supports this model's GGUF files.
Quickstart
- Install NexaSDK
- Run the model locally with one line of code:
nexa infer NexaAI/DeepSeek-OCR-GGUF
- Then drag your image into the terminal or type the image path, followed by a prompt:
Case 1: extract text
<your-image-path> Free OCR.
Case 2: extract bounding boxes
<your-image-path> <|grounding|>Convert the document to markdown.
Note: If the model fails to run, install the latest Vulkan driver for Windows.
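The two prompt forms above share one pattern: the image path, a space, then the instruction. A minimal sketch of assembling that line in Python follows. The helper name `build_prompt` and its `with_bounding_boxes` flag are hypothetical and not part of NexaSDK; only the two instruction strings come from the quickstart. How the interactive prompt handles paths containing spaces is not covered here.

```python
# Instruction strings copied from the two quickstart cases.
FREE_OCR_PROMPT = "Free OCR."
GROUNDING_PROMPT = "<|grounding|>Convert the document to markdown."


def build_prompt(image_path: str, with_bounding_boxes: bool = False) -> str:
    """Return the line to enter at the interactive prompt (hypothetical helper).

    Format taken from the quickstart: image path, one space, instruction.
    """
    instruction = GROUNDING_PROMPT if with_bounding_boxes else FREE_OCR_PROMPT
    return f"{image_path} {instruction}"


if __name__ == "__main__":
    print(build_prompt("receipt.png"))
    print(build_prompt("receipt.png", with_bounding_boxes=True))
```

The resulting string can be pasted into the `nexa infer` session started in the previous step.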
Model Description
DeepSeek OCR is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes. It combines vision-language modeling with efficient visual encoders to recognize multilingual, multi-layout text while remaining lightweight enough for edge or on-device deployment.
Features
- Multilingual OCR: recognizes printed and handwritten text across major global languages.
- Document Layout Understanding: preserves structure such as tables, paragraphs, and titles.
- Scene Text Recognition: robust against lighting, distortion, and low-quality captures.
- Lightweight & Fast: optimized for CPU and GPU acceleration.
- End-to-End Pipeline: supports image-to-text and structured JSON output.
Use Cases
- Digitizing scanned documents or PDFs
- Extracting text from mobile camera inputs or screenshots
- Invoice and receipt parsing
- OCR-based search and indexing systems
- Visual question answering or document agents
Inputs and Outputs
Input:
- Image file (JPEG, PNG, or tensor array)
- Optional parameters for language hints or layout detection
Output:
- Extracted text (plain text or structured format with bounding boxes)
- Confidence scores per word or region
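As an illustration of how a caller might consume structured output of this kind, the sketch below models one recognized region and filters regions by confidence. The `OcrRegion` type, its field names, the pixel box layout, and the 0.8 threshold are all assumptions made for the example; the model's actual output schema is not specified in this card.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrRegion:
    """One recognized region (hypothetical schema, not the model's real format)."""
    text: str
    box: Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)
    confidence: float               # assumed range: 0.0 to 1.0


def keep_confident(regions: List[OcrRegion], threshold: float = 0.8) -> List[OcrRegion]:
    """Return only the regions whose confidence is at or above the threshold."""
    return [r for r in regions if r.confidence >= threshold]


def join_text(regions: List[OcrRegion]) -> str:
    """Concatenate region texts in the order given, one per line."""
    return "\n".join(r.text for r in regions)


if __name__ == "__main__":
    # Made-up sample values, for demonstration only.
    sample = [
        OcrRegion("Invoice #", (10, 10, 120, 30), 0.97),
        OcrRegion("T0tal", (10, 200, 60, 220), 0.41),
    ]
    print(join_text(keep_confident(sample)))
```

Filtering before joining keeps low-confidence fragments out of downstream search or parsing steps; the right threshold depends on the application.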
Integration
DeepSeek OCR can be integrated through:
- Python API (pip install deepseek-ocr)
- REST or gRPC endpoints for server deployment
License
This model is released under the Apache 2.0 License, allowing commercial use, modification, and redistribution with attribution.
GGUF File List
| Filename | Size | Download |
|---|---|---|
| DeepSeek-OCR.BF16.gguf (LFS, FP16) | 5.47 GB | Download |
| DeepSeek-OCR.F16.gguf (LFS, FP16) | 5.47 GB | Download |
| DeepSeek-OCR.Q4_0.gguf (Recommended, LFS, Q4) | 1.54 GB | Download |
| DeepSeek-OCR.Q4_K.gguf (LFS, Q4) | 1.92 GB | Download |
| DeepSeek-OCR.Q5_0.gguf (LFS, Q5) | 1.88 GB | Download |
| DeepSeek-OCR.Q5_K.gguf (LFS, Q5) | 2.16 GB | Download |
| DeepSeek-OCR.Q6_K.gguf (LFS, Q6) | 2.43 GB | Download |
| DeepSeek-OCR.Q8_0.gguf (LFS, Q8) | 2.9 GB | Download |
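When choosing among the files listed above, one simple approach is to pick the largest file that fits a size budget. The sketch below does that with the sizes from the list. The helper `largest_file_within` is hypothetical. File size is only a lower bound on memory use, since the runtime needs additional memory beyond the weights.

```python
from typing import Optional

# File sizes in GB, copied from the GGUF file list.
GGUF_SIZES_GB = {
    "DeepSeek-OCR.BF16.gguf": 5.47,
    "DeepSeek-OCR.F16.gguf": 5.47,
    "DeepSeek-OCR.Q4_0.gguf": 1.54,
    "DeepSeek-OCR.Q4_K.gguf": 1.92,
    "DeepSeek-OCR.Q5_0.gguf": 1.88,
    "DeepSeek-OCR.Q5_K.gguf": 2.16,
    "DeepSeek-OCR.Q6_K.gguf": 2.43,
    "DeepSeek-OCR.Q8_0.gguf": 2.9,
}


def largest_file_within(budget_gb: float) -> Optional[str]:
    """Return the largest listed file whose size fits the budget, or None."""
    fitting = {name: size for name, size in GGUF_SIZES_GB.items() if size <= budget_gb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (1.0, 2.0, 3.0):
        print(budget, largest_file_within(budget))
```

This ranks by size alone; the list marks DeepSeek-OCR.Q4_0.gguf as the recommended file regardless of budget.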