πŸ“‹ Model Description


language:
  • multilingual
tags:
  • deepseek
  • vision-language
  • ocr
  • document-parse
base_model:
  • deepseek-ai/DeepSeek-OCR

DeepSeek OCR

[!NOTE]

Note currently only NexaSDK supports this model's GGUF.

Quickstart

  1. Install NexaSDK
  2. Run the model locally with one line of code:
nexa infer NexaAI/DeepSeek-OCR-GGUF
  1. Then drag your image to terminal or type into the image path

case 1 : extract text

<your-image-path> Free OCR.

case 2 : extract bounding box

<your-image-path> <|grounding|>Convert the document to markdown.

Note: If the model fails to run, install the latest Vulkan driver for Windows

Model Description

DeepSeek OCR is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes. It combines vision-language modeling with efficient visual encoders to achieve superior recognition of multi-language and multi-layout text while remaining lightweight enough for edge or on-device deployment.

Features

  • Multilingual OCR β€” recognizes printed and handwritten text across major global languages.
  • Document Layout Understanding β€” preserves structure such as tables, paragraphs, and titles.
  • Scene Text Recognition β€” robust against lighting, distortion, and low-quality captures.
  • Lightweight & Fast β€” optimized for CPU and GPU acceleration.
  • End-to-End Pipeline β€” supports image-to-text and structured JSON output.

Use Cases

  • Digitizing scanned documents or PDFs
  • Extracting text from mobile camera inputs or screenshots
  • Invoice and receipt parsing
  • OCR-based search and indexing systems
  • Visual question answering or document agents

Inputs and Outputs

Input:
  • Image file (JPEG, PNG, or tensor array)
  • Optional parameters for language hints or layout detection

Output:

  • Extracted text (plain text or structured format with bounding boxes)
  • Confidence scores per word or region

Integration

DeepSeek OCR can be integrated through:
  • Python API (pip install deepseek-ocr)
  • REST or gRPC endpoints for server deployment

License

This model is released under the Apache 2.0 License, allowing commercial use, modification, and redistribution with attribution.

πŸ“‚ GGUF File List

πŸ“ Filename πŸ“¦ Size ⚑ Download
DeepSeek-OCR.BF16.gguf
LFS FP16
5.47 GB Download
DeepSeek-OCR.F16.gguf
LFS FP16
5.47 GB Download
DeepSeek-OCR.Q4_0.gguf
Recommended LFS Q4
1.54 GB Download
DeepSeek-OCR.Q4_K.gguf
LFS Q4
1.92 GB Download
DeepSeek-OCR.Q5_0.gguf
LFS Q5
1.88 GB Download
DeepSeek-OCR.Q5_K.gguf
LFS Q5
2.16 GB Download
DeepSeek-OCR.Q6_K.gguf
LFS Q6
2.43 GB Download
DeepSeek-OCR.Q8_0.gguf
LFS Q8
2.9 GB Download