Model Description


tags:
  • text-generation-inference
  • transformers
  • unsloth
  • qwen3_vl
  • trl
  • sft
  • chemistry
  • code
  • climate
  • art
  • biology
  • finance
  • legal
  • music
  • medical
  • agent
license: apache-2.0
language:
  • en
  • ab
  • aa
  • ae
  • af
  • ak
  • am
  • an
  • ar
  • as
  • av
  • ay
  • az
  • ba
  • be
  • bg
  • bh
  • bi
  • bm
  • bn
  • bo
  • br
  • bs
  • ca
  • ce
  • ch
  • co
  • cr
  • cs
  • cu
  • cv
  • cy
  • da
  • de
  • dv
  • dz
  • ee
  • el
  • eo
  • es
  • et
  • eu
  • fa
  • ff
  • fi
  • fj
  • fo
  • fr
  • fy
  • ga
  • gd
  • gl
  • gn
  • gv
  • ha
  • he
  • hi
  • ho
  • gu
  • hr
  • ht
  • hu
  • hz
  • hy
  • id
  • ia
  • ig
  • ie
  • ik
  • ii
  • is
  • io
  • iu
  • it
  • jv
  • ja
  • kg
  • ka
  • kj
  • ki
  • kl
  • kk
  • kn
  • km
  • kr
  • ko
  • ku
  • ks
  • kw
  • kv
  • la
  • ky
  • lg
  • lb
  • ln
  • li
  • lt
  • lo
  • lv
  • lu
  • mg
  • mi
  • mh
  • ml
  • mk
  • mr
  • mn
  • mt
  • ms
  • na
  • my
  • nd
  • nb
  • ng
  • nl
  • ne
  • 'no'
  • nn
  • nv
  • nr
  • oc
  • oj
  • om
  • ny
  • os
  • or
  • pa
  • pi
  • pl
  • ps
  • pt
  • rm
  • rn
  • qu
  • ro
  • ru
  • sn
  • rw
  • so
  • sa
  • sc
  • sd
pipeline_tag: image-text-to-text
library_name: transformers

🖼️ Next OCR 8B

Compact OCR AI — Accurate, Fast, Multilingual, Math-Optimized

License: MIT
Language: Multilingual


📖 Overview

Next OCR 8B is an 8-billion-parameter model optimized for optical character recognition (OCR), with an emphasis on understanding mathematical and tabular content.

It supports multilingual OCR (Turkish, English, German, Spanish, French, Chinese, Japanese, Korean, Russian, and others) with high accuracy, including on structured documents such as tables, forms, and formulas.


⚡ Highlights

  • 🖼️ Accurate text extraction, including math and tables
  • 🌍 Multilingual support (30+ languages)
  • ⚡ Lightweight and efficient
  • 💬 Instruction-tuned for document understanding and analysis

📊 Benchmark & Comparison



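| Model | OCR-Bench Accuracy (%) | Multilingual Accuracy (%) | Layout / Table Understanding (%) |
| --------------------------- | ---------------------- | ------------------------- | -------------------------------- |
| Next OCR | 99.0 | 96.8 | 95.3 |
| PaddleOCR | 95.2 | 93.9 | 95.3 |
| Deepseek OCR | 90.6 | 87.4 | 86.1 |
| Tesseract | 92.0 | 88.4 | 72.0 |
| EasyOCR | 90.4 | 84.7 | 78.9 |
| Google Cloud Vision / DocAI | 98.7 | 95.5 | 93.6 |
| Amazon Textract | 94.7 | 86.2 | 86.1 |
| Azure Document Intelligence | 95.1 | 93.6 | 91.4 |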

| Model | Handwriting (%) | Scene Text (%) | Complex Tables (%) |
| --------------------------- | --------------- | -------------- | ------------------ |
| Next OCR | 92 | 96 | 91 |
| PaddleOCR | 88 | 92 | 90 |
| Deepseek OCR | 80 | 85 | 83 |
| Tesseract | 75 | 88 | 70 |
| EasyOCR | 78 | 86 | 75 |
| Google Cloud Vision / DocAI | 90 | 95 | 92 |
| Amazon Textract | 85 | 90 | 88 |
| Azure Document Intelligence | 87 | 91 | 89 |
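
The card does not state how the accuracy figures in these tables were computed. For readers who want to score OCR output on their own documents, one common choice is character-level accuracy, defined as 100 × (1 − CER), where CER is the edit distance between the reference and the OCR output divided by the reference length. The sketch below is an illustration under that assumption, not the evaluation protocol behind the numbers above; `edit_distance` and `char_accuracy` are hypothetical helper names.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character accuracy in percent: 100 * (1 - CER), floored at 0."""
    if not ref:
        # Assumption: an empty reference is only matched by an empty output.
        return 100.0 if not hyp else 0.0
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 100.0 * (1.0 - cer))
```

Averaging `char_accuracy` over a set of (ground truth, OCR output) pairs gives a single score per model that can be compared across systems on the same data.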


🚀 Installation & Usage

from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
import torch

model_id = "Lamapi/next-ocr"

# The processor bundles the tokenizer and the image preprocessor.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

img = Image.open("image.jpg")

# ATTENTION: The content list must include both an image and text.
messages = [
    {"role": "system", "content": "You are Next-OCR, an helpful AI assistant trained by Lamapi."},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": img},
            {"type": "text", "text": "Read the text in this image and summarize it."},
        ],
    },
]

# Apply the chat template, then tokenize the prompt together with the image.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[img], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=256)

print(processor.decode(generated[0], skip_special_tokens=True))
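
The usage snippet requires the user turn to carry both an image and a text part. A small helper can enforce that before the chat template is applied. This is a minimal sketch: `build_ocr_messages` is a hypothetical helper, not part of transformers or of this model's repository, and it only assumes the message layout shown above.

```python
from typing import Any, Dict, List, Optional


def build_ocr_messages(image: Any, instruction: str,
                       system_prompt: Optional[str] = None) -> List[Dict[str, Any]]:
    """Build a chat `messages` list whose user turn has one image and one text part."""
    if image is None:
        raise ValueError("an image is required")
    if not instruction or not instruction.strip():
        raise ValueError("a non-empty text instruction is required")

    messages: List[Dict[str, Any]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction.strip()},
        ],
    })
    return messages
```

The returned list can be passed to the processor's chat template in place of a hand-written `messages` literal, which keeps the image-plus-text requirement in one place when processing many files.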


🧩 Key Features

| Feature | Description |
| -------------------------- | -------------------------------------------------------------- |
| 🖼️ High-Accuracy OCR | Extracts text from images, documents, and screenshots reliably. |
| 🇹🇷 Multilingual Support | Works with 30+ languages including Turkish. |
| ⚡ Lightweight & Efficient | Optimized for resource-constrained environments. |
| 📄 Layout & Math Awareness | Handles tables, forms, and mathematical formulas. |
| 🏢 Reliable Outputs | Suitable for enterprise document workflows. |

📐 Model Specifications

| Specification | Details |
| ------------- | --------------------------------------------------------- |
| Base Model | Qwen 3 |
| Parameters | 8 Billion |
| Architecture | Vision + Transformer (OCR LLM) |
| Modalities | Image-to-text |
| Fine-Tuning | OCR datasets with multilingual and math/tabular content |
| Optimizations | Quantization-ready, FP16 support |
| Primary Focus | Text extraction, document understanding, mathematical OCR |
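
The specifications above list 8 billion parameters, FP16 support, and quantization readiness. As a rough guide to what that means for hardware, weight memory scales with the number of bytes stored per parameter. The sketch below assumes exactly 8e9 parameters and counts weights only; activations, the KV cache, and quantization metadata are ignored, so real usage will be higher. `weight_memory_gb` is an illustrative name.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in GB (10**9 bytes) for a given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp16", "int8", "int4"):
    print(f"{precision}: ~{weight_memory_gb(8e9, precision):.0f} GB")
# fp16: ~16 GB
# int8: ~8 GB
# int4: ~4 GB
```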

🎯 Ideal Use Cases

  • Document digitization
  • Invoice & receipt processing
  • Multilingual OCR pipelines
  • Table, form, and formula extraction
  • Enterprise document management

📄 License

MIT License — free for commercial & non-commercial use.


📞 Contact & Support


Next OCR — Compact OCR + math-capable AI, blending accuracy, speed, and multilingual document intelligence.

Follow on HuggingFace

GGUF File List

| 📁 Filename | 📦 Size |
| ------------------------------------------------- | ------- |
| mmproj-next-ocr-F16.gguf (Recommended, LFS, FP16) | 1.08 GB |
| next-ocr-q8_0.gguf (LFS, Q8) | 8.11 GB |
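
The card does not document a GGUF workflow. In llama.cpp, a vision model is typically run by pairing the language-model GGUF with its multimodal projector (`mmproj`) file. The sketch below only assembles such a command line; it assumes both files have been downloaded to the working directory, that `llama-mtmd-cli` from a llama.cpp build recent enough to support this architecture is on the PATH, and that `image.jpg` exists. `build_llama_mtmd_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_llama_mtmd_command(model_path: str, mmproj_path: str,
                             image_path: str, prompt: str) -> List[str]:
    """Return the argv list for llama.cpp's multimodal CLI."""
    return [
        "llama-mtmd-cli",
        "-m", model_path,          # language model weights (GGUF)
        "--mmproj", mmproj_path,   # multimodal projector (GGUF)
        "--image", image_path,     # input image to read
        "-p", prompt,              # text instruction
    ]


cmd = build_llama_mtmd_command(
    "next-ocr-q8_0.gguf",
    "mmproj-next-ocr-F16.gguf",
    "image.jpg",
    "Read the text in this image.",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can also be passed directly to `subprocess.run(cmd, check=True)` to launch the tool from Python.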