GovTools
OCR-method grid · 2026-05-27

Same open model, same interpolation question, different OCR method to build the card.
How much of the pass-rate gap is the model, and how much is the table extractor?

For each of the 12 final interpolation candidates (Tier-2 verified on /interpolation), we regenerated the table card using 8 different OCR methods and re-ran 3 open models + 3 closed flagship APIs on each card — plus a "no OCR, direct vision" column where the flagship sees the page image directly. 860 (model × method × question) cells total, 581 correct / 35 partial / 265 incorrect.

correct = asserted value strictly between the source-cell endpoints (real interpolation); partial = in tolerance band but at/outside endpoints (often endpoint-echo); incorrect = miss.
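The three grades above can be sketched as a small function. This is a minimal illustration, not the project's grader: the name `grade_interpolation` and the shape of the tolerance band (the endpoint interval widened by an absolute `tolerance` on each side) are assumptions, since the page does not define the band precisely.

```python
from typing import Literal

Grade = Literal["correct", "partial", "incorrect"]


def grade_interpolation(value: float, lo: float, hi: float, tolerance: float) -> Grade:
    """Grade an asserted value against the two source-cell endpoints.

    ASSUMPTION: the tolerance band is [lo - tolerance, hi + tolerance].
    """
    lo, hi = min(lo, hi), max(lo, hi)
    if lo < value < hi:
        return "correct"    # strictly between the endpoints: real interpolation
    if lo - tolerance <= value <= hi + tolerance:
        return "partial"    # inside the band, but at or outside an endpoint
    return "incorrect"      # outside the band: a miss


if __name__ == "__main__":
    for v in (12.4, 10.0, 15.3, 16.0):
        print(v, grade_interpolation(v, 10.0, 15.0, tolerance=0.5))
```

An answer that simply repeats an endpoint (10.0 here) lands in "partial", which matches the endpoint-echo case described above.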

Pass-rate heatmap — open models × OCR method

Each cell shows correct/total across the 12 interp questions for that (model, OCR method) pair. Cells are tinted by correct-rate (red → amber → green). Local open models can only read text, so they see the OCR-generated card — not the page image.

| Model | Docling + EasyOCR (IBM) | Mistral OCR (Mistral AI) | GPT-4o Vision (OpenAI) | Gemini 2.5 Pro Vision (Google) | Pixtral 12B (Mistral AI) | Tesseract 5.5 (Google) | PaddleOCR / PP-Structure (PaddlePaddle) | Surya (datalab.to) | No OCR (direct vision) (Grok-4 + Gemini 2.5 Pro + GPT-4o) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Apertus 8B (Swiss AI) | 42% (5/12 correct) | 50% (6/12 correct) | 58% (7/12 correct) | 25% (1/4 correct) | 42% (5/12 correct) | 42% (5/12 correct) | 25% (3/12 correct) | 33% (4/12 correct) | no data |
| ClimateGPT 13B (climate-domain Llama-2) | 42% (5/12 correct, +2 band-only) | 25% (3/12 correct, +2 band-only) | 42% (5/12 correct, +2 band-only) | 25% (1/4 correct) | 42% (5/12 correct, +1 band-only) | 17% (2/12 correct) | 17% (2/12 correct, +1 band-only) | 25% (3/12 correct, +1 band-only) | no data |
| Qwen 2.5 7B (Alibaba) | 75% (9/12 correct) | 67% (8/12 correct) | 42% (5/12 correct) | 0% (0/4 correct) | 67% (8/12 correct) | 67% (8/12 correct) | 25% (3/12 correct) | 33% (4/12 correct) | no data |
| Qwen 2.5 VL 7B (Alibaba, vision) | no data | no data | no data | no data | no data | no data | no data | no data | 75% (9/12 correct) |

What to look for. Variance going across a row tells you how much the OCR method matters for that particular model. Variance going down a column tells you how the same OCR method serves different open models. If a column is roughly green across all three models, that OCR method is producing usable cards for the open-tier. If a row is consistently green, that open model is robust to OCR noise. Qwen 2.5 VL 7B is the first open multimodal model in the grid — it only fills the "direct vision" column (it sees the raw page image instead of a text card).
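The row and column reading described above can be made concrete with a short sketch. The (correct, total) pairs are copied from the open-model heatmap for the three text-only models; the names `GRID`, `row_spread` and `col_mean` are illustrative, and using max minus min as the "variance" measure is an assumption made for simplicity.

```python
from statistics import mean

METHODS = [
    "Docling + EasyOCR", "Mistral OCR", "GPT-4o Vision", "Gemini 2.5 Pro Vision",
    "Pixtral 12B", "Tesseract 5.5", "PaddleOCR / PP-Structure", "Surya",
]

# (correct, total) per OCR method, in METHODS order.
GRID = {
    "Apertus 8B":     [(5, 12), (6, 12), (7, 12), (1, 4), (5, 12), (5, 12), (3, 12), (4, 12)],
    "ClimateGPT 13B": [(5, 12), (3, 12), (5, 12), (1, 4), (5, 12), (2, 12), (2, 12), (3, 12)],
    "Qwen 2.5 7B":    [(9, 12), (8, 12), (5, 12), (0, 4), (8, 12), (8, 12), (3, 12), (4, 12)],
}


def rate(cell: tuple[int, int]) -> float:
    correct, total = cell
    return correct / total


# Across a row: how much the OCR method matters for one model.
row_spread = {
    model: max(map(rate, cells)) - min(map(rate, cells))
    for model, cells in GRID.items()
}

# Down a column: how well one OCR method serves the open models on average.
col_mean = {
    method: mean([rate(GRID[model][i]) for model in GRID])
    for i, method in enumerate(METHODS)
}

if __name__ == "__main__":
    for model, spread in row_spread.items():
        print(f"{model:15s} spread across OCR methods: {spread:.2f}")
    for method, avg in sorted(col_mean.items(), key=lambda kv: -kv[1]):
        print(f"{method:26s} mean correct-rate: {avg:.2f}")
```

On these figures Qwen 2.5 7B has the widest row spread (0% to 75%), and Docling + EasyOCR has the highest column mean. Note that the Gemini 2.5 Pro Vision column rests on only 4 questions per cell, so its rates are less comparable.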

Pass-rate heatmap — flagship API models × OCR method (text) + direct vision

Same 12 questions, but answered by closed-tier flagship APIs. The first 5 columns are the same OCR-text setup as above — flagship reads the per-OCR card, no image. The rightmost column No OCR — direct vision skips the OCR step entirely: the flagship sees the raw page image plus the question and answers in one shot.

| Model | Docling + EasyOCR (IBM) | Mistral OCR (Mistral AI) | GPT-4o Vision (OpenAI) | Gemini 2.5 Pro Vision (Google) | Pixtral 12B (Mistral AI) | No OCR (direct vision) (Grok-4 + Gemini 2.5 Pro + GPT-4o) |
| --- | --- | --- | --- | --- | --- | --- |
| Grok-4 (xAI) | 100% (12/12 correct) | 92% (11/12 correct) | 75% (9/12 correct) | 0% (0/4 correct) | 83% (10/12 correct) | 83% (10/12 correct) |
| Gemini 2.5 Pro (Google) | no data | no data | no data | no data | no data | no data |
| GPT-4o (OpenAI) | 67% (8/12 correct) | 67% (8/12 correct) | 58% (7/12 correct) | 0% (0/4 correct) | 67% (8/12 correct) | 75% (9/12 correct, +1 band-only) |

The question this answers. If the "direct vision" column is the greenest, the OCR pipeline is adding more noise than signal even for SOTA models — they'd be better off just looking at the page. If the best OCR column beats direct vision, OCR is structurally useful even for the top of the stack (and the cheapest OCR that hits that ceiling wins on cost).
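The comparison described above (best OCR column versus direct vision) can be sketched per model. The counts are copied from the flagship heatmap; the structure `FLAGSHIP` and the function `verdict` are hypothetical names for illustration, and ties between OCR methods are broken by column order.

```python
# (correct, total) per column, copied from the flagship heatmap in this report.
FLAGSHIP = {
    "Grok-4": {
        "ocr": {
            "Docling + EasyOCR": (12, 12),
            "Mistral OCR": (11, 12),
            "GPT-4o Vision": (9, 12),
            "Gemini 2.5 Pro Vision": (0, 4),
            "Pixtral 12B": (10, 12),
        },
        "direct_vision": (10, 12),
    },
    "GPT-4o": {
        "ocr": {
            "Docling + EasyOCR": (8, 12),
            "Mistral OCR": (8, 12),
            "GPT-4o Vision": (7, 12),
            "Gemini 2.5 Pro Vision": (0, 4),
            "Pixtral 12B": (8, 12),
        },
        "direct_vision": (9, 12),
    },
}


def rate(cell: tuple[int, int]) -> float:
    return cell[0] / cell[1]


def verdict(model: str) -> tuple[str, str]:
    """Return (winner, best_ocr_method) for one flagship model."""
    row = FLAGSHIP[model]
    best_method = max(row["ocr"], key=lambda m: rate(row["ocr"][m]))
    best_ocr = rate(row["ocr"][best_method])
    vision = rate(row["direct_vision"])
    if best_ocr > vision:
        return "ocr", best_method
    if best_ocr < vision:
        return "direct_vision", best_method
    return "tie", best_method


if __name__ == "__main__":
    for model in FLAGSHIP:
        print(model, verdict(model))
```

On the numbers in this grid the answer is split: for Grok-4 the best OCR card (Docling + EasyOCR, 12/12) beats direct vision (10/12), while for GPT-4o direct vision (9/12) edges out its best OCR card (8/12). With 12 questions per cell, a one-question difference should be read cautiously.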

Card-variant ablation — Grok-4 (flagship) vs Qwen 2.5 7B (best open)

Both Grok-4 and Qwen 2.5 7B were tested against every Erschließung card variant + several floor cases (raw EasyOCR with no Docling layout, plain PDF text layer, direct page vision). Same 12 interp questions for each cell. Sorted by Qwen 2.5 7B correct count — the interesting axis is which formats lift the open-tier model above the prior 9/12 ceiling.

| Card variant | Source | Qwen 2.5 7B | Grok-4 | $/page | s/page | What it tests |
| --- | --- | --- | --- | --- | --- | --- |
| Docling + EasyOCR | IBM (Docling) + JaidedAI (EasyOCR) | 9/12 | 12/12 | $0 | 16 | Erschließung pipeline default. Force-full-page OCR mode (–force-reocr). |
| compact-2k | Erschliessung | 9/12 +2 | 12/12 | $0 | 16 | ~2 KB card variant: caption + table + 1 paragraph. |
| table-only | Erschliessung | 9/12 +1 | 12/12 | $0 | 16 | Table + caption only; no surrounding metadata. |
| csv-plus-scope | Erschliessung | 10/12 | 12/12 | $0 | 16 | csv-only + geographic+temporal scope. |
| csv-plus-paragraph | Erschliessung | 10/12 | 12/12 | $0 | 16 | csv-only + one nearby paragraph (≤400 chars). |
| csv-plus-all-context | Erschliessung | 8/12 | 12/12 | $0 | 16 | csv-only + headings + scope + paragraph. |
| csv-demerged | Erschliessung | 7/12 +2 | 12/12 | $0 | 16 | csv-only with merged rows split deterministically. |
| csv-normalized | Erschliessung | 7/12 +2 | 12/12 | $0 | 16 | csv-only with visual-confusable OCR normalization. |
| csv-normalized-rules | Erschliessung | 9/12 | 12/12 | $0 | 16 | csv-normalized + explicit normalization rules in card. |
| table-normalized | Erschliessung | 10/12 | 12/12 | $0 | 16 | table-only + visual-confusable normalization. |
| json-only | Erschliessung | 8/12 | 12/12 | $0 | 16 | Table as JSON instead of CSV. |
| stitched | Erschliessung | 7/12 +1 | 12/12 | $0 | 16 | Multi-page tables stitched into one logical view. |
| micro-1k | Erschliessung | 10/12 +1 | 11/12 +1 | $0 | 16 | ~1.2 KB card variant: caption + table only. |
| compact-4k | Erschliessung | 10/12 | 11/12 | $0 | 16 | ~4 KB card variant. |
| csv-plus-headings | Erschliessung | 10/12 | 11/12 | $0 | 16 | csv-only + section headings. |
| labeled | Erschliessung | 10/12 | 11/12 +1 | $0 | 16 | Explicit per-section provenance labels. |
| prose | Erschliessung | 7/12 +1 | 11/12 +1 | $0 | 16 | Table rendered as English prose instead of structured CSV. |
| Mistral OCR | Mistral AI | 8/12 | 11/12 | $0.0030 | 1.3 | Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast. |
| pdf-text-no-ocr | pdftotext (no OCR) | 9/12 | 11/12 | $0 | 0.1 | Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text. |
| No OCR (direct vision) | Grok-4 + Gemini 2.5 Pro + GPT-4o (avg) | no data | 10/12 | $0.0098 | 15 | Flagship vision LLM sees the raw page image + the question, with no OCR step. Only meaningful for flagship rows (open models can't see images). |
| GPT-4o Vision | OpenAI | 5/12 | 9/12 | $0.0075 | 9 | Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI. |
| Pixtral 12B | Mistral AI | 8/12 | 10/12 | $0.0010 | 25 | Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens. |
| easyocr-raw | JaidedAI (no Docling) | 6/12 | 7/12 | $0 | 3 | EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor. |
| no-frontmatter | Erschliessung | 5/12 +1 | 7/12 +3 | $0 | 16 | Strip the YAML frontmatter. Hurts Grok-4 noticeably. |

What this shows. Grok-4 is roughly format-insensitive: 12 variants, including the default, score 12/12. Qwen 2.5 7B is much more format-sensitive: its peak (10/12 on micro-1k, compact-4k, csv-plus-headings, csv-plus-scope, csv-plus-paragraph, labeled, table-normalized) is one above its score on the project's default docling-easyocr (9/12), while csv-plus-all-context drops it to 8/12. Compact formats with a little context lift the open model by one question. Floor cases (no-frontmatter, easyocr-raw, gpt4o-vision) hurt both models: Grok-4 falls to 7-9/12 and Qwen 2.5 7B to 5-6/12.

OCR computation cost — per-page

Two axes: USD per page (commercial API list price; $0 for local tools) and seconds per page (wall-clock on the project's M-series Mac). The two are not interchangeable — a free local tool can be the slowest of the bunch, and the cheapest paid API can be the fastest.
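The document-level totals follow from the per-page figures by simple multiplication. The sketch below reproduces a few of them for the 874-page V27 document, using the $/page and s/page values listed in this report; the function name `document_cost` and the rounding (cents, whole minutes) are assumptions about how the totals were produced.

```python
PAGES_FULL_V27 = 874

# (USD per page, seconds per page), copied from the per-method figures in this report.
PER_PAGE = {
    "Mistral OCR": (0.0030, 1.3),
    "Gemini 2.5 Pro Vision": (0.0120, 25),
    "Docling + EasyOCR": (0.0, 16),
    "Tesseract 5.5": (0.0, 1),
}


def document_cost(method: str, pages: int) -> tuple[float, int]:
    """Return (total USD, total wall-clock minutes) for OCR-ing `pages` pages."""
    usd_per_page, sec_per_page = PER_PAGE[method]
    total_usd = round(usd_per_page * pages, 2)
    total_min = round(sec_per_page * pages / 60)
    return total_usd, total_min


if __name__ == "__main__":
    for method in PER_PAGE:
        usd, minutes = document_cost(method, PAGES_FULL_V27)
        print(f"{method:24s} ${usd:>6.2f}  {minutes:>4d} min")
```

This makes the point about the two axes explicit: Docling + EasyOCR costs nothing but takes about 233 minutes, while Mistral OCR costs about $2.62 and finishes in about 19 minutes.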

Full V27 document: 874 pages (full V27)

| Method | USD cost | Wall-clock minutes |
| --- | --- | --- |
| olmOCR (Qwen2-VL 7B) | $0 | 6992 min |
| Marker | $0 | 1165 min |
| Gemini 2.5 Pro Vision | $10.49 | 364 min |
| Pixtral 12B | $0.87 | 364 min |
| Docling + EasyOCR | $0 | 233 min |
| Erschliessung card variants, each (micro-1k, compact-2k, compact-4k, table-only, labeled, csv-plus-headings, csv-plus-scope, csv-plus-paragraph, csv-plus-all-context, csv-demerged, csv-normalized, csv-normalized-rules, table-normalized, json-only, no-frontmatter, prose, stitched) | $0 | 233 min |
| No OCR (direct vision) | $8.57 | 219 min |
| Surya | $0 | 219 min |
| GPT-4o Vision | $6.55 | 131 min |
| PaddleOCR / PP-Structure | $0 | 117 min |
| Docling + Tesseract | $0 | 44 min |
| easyocr-raw | $0 | 44 min |
| Mistral OCR | $2.62 | 19 min |
| Tesseract 5.5 | $0 | 15 min |
| pdf-text-no-ocr | $0 | 1 min |

V27 tested-table pages only: ≈87 pages (10% of 874)

| Method | USD cost | Wall-clock minutes |
| --- | --- | --- |
| olmOCR (Qwen2-VL 7B) | $0 | 699 min |
| Marker | $0 | 117 min |
| Gemini 2.5 Pro Vision | $1.05 | 36 min |
| Pixtral 12B | $0.09 | 36 min |
| Docling + EasyOCR | $0 | 23 min |
| Erschliessung card variants, each (micro-1k, compact-2k, compact-4k, table-only, labeled, csv-plus-headings, csv-plus-scope, csv-plus-paragraph, csv-plus-all-context, csv-demerged, csv-normalized, csv-normalized-rules, table-normalized, json-only, no-frontmatter, prose, stitched) | $0 | 23 min |
| No OCR (direct vision) | $0.86 | 22 min |
| Surya | $0 | 22 min |
| GPT-4o Vision | $0.66 | 13 min |
| PaddleOCR / PP-Structure | $0 | 12 min |
| Docling + Tesseract | $0 | 4 min |
| easyocr-raw | $0 | 4 min |
| Mistral OCR | $0.26 | 2 min |
| Tesseract 5.5 | $0 | 1 min |
| pdf-text-no-ocr | $0 | 0 min |

License and access

| Method | Vendor | Family | Access | $/page | s/page | Note |
| --- | --- | --- | --- | --- | --- | --- |
| Docling + EasyOCR | IBM (Docling) + JaidedAI (EasyOCR) | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Erschließung pipeline default. Force-full-page OCR mode (–force-reocr). |
| Docling + Tesseract | IBM (Docling) + Google (Tesseract) | pdf-parser | Open-source · runnable locally for free | $0 | 3 | Faster than EasyOCR variant on this hardware; comparable accuracy on V27 scans. |
| Tesseract 5.5 | Google (now community-maintained) | image-ocr | Open-source · runnable locally for free | $0 | 1 | Plain text + hOCR output. No table-structure recovery; needs a layout layer on top. |
| PaddleOCR / PP-Structure | PaddlePaddle (Baidu) | image-ocr | Open-source · runnable locally for free | $0 | 8 | Includes table structure recognition (SLANet). Strong on born-digital tables. |
| Surya | datalab.to | image-ocr | Open-source · runnable locally for free | $0 | 15 | OCR + layout + reading order + table recognition. Modern transformer-based. |
| Marker | datalab.to | pdf-parser | Open-source · runnable locally for free | $0 | 80 | PDF → markdown with reading-order tables. Slower but high-fidelity. |
| olmOCR (Qwen2-VL 7B) | Allen Institute for AI | multimodal-llm | Open-source · runnable locally for free | $0 | 480 | 7B vision LLM, GPU-recommended. ~5-10 min/page on CPU/MPS. Not yet tested in grid (transformers 5.x compatibility). |
| Mistral OCR | Mistral AI | pdf-parser | Commercial · paid API | $0.0030 | 1.3 | Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast. |
| GPT-4o Vision | OpenAI | vision-llm | Commercial · paid API | $0.0075 | 9 | Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI. |
| Gemini 2.5 Pro Vision | Google | vision-llm | Commercial · paid API | $0.0120 | 25 | Free tier exists but quota-limited; paid tier ~$0.005-0.025/page. Lowest rate limits of the 3 hosted vision LLMs. |
| Pixtral 12B | Mistral AI | vision-llm | Commercial · paid API | $0.0010 | 25 | Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens. |
| No OCR (direct vision) | Grok-4 + Gemini 2.5 Pro + GPT-4o (avg) | vision-llm | Commercial · paid API | $0.0098 | 15 | Flagship vision LLM sees the raw page image + the question, with no OCR step. Only meaningful for flagship rows (open models can't see images). |
| pdf-text-no-ocr | pdftotext (no OCR) | pdf-parser | Open-source · runnable locally for free | $0 | 0.1 | Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text. |
| easyocr-raw | JaidedAI (no Docling) | image-ocr | Open-source · runnable locally for free | $0 | 3 | EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor. |
| micro-1k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~1.2 KB card variant: caption + table only. |
| compact-2k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~2 KB card variant: caption + table + 1 paragraph. |
| compact-4k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~4 KB card variant. |
| table-only | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table + caption only; no surrounding metadata. |
| labeled | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Explicit per-section provenance labels. |
| csv-plus-headings | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + section headings. |
| csv-plus-scope | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + geographic+temporal scope. |
| csv-plus-paragraph | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + one nearby paragraph (≤400 chars). |
| csv-plus-all-context | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + headings + scope + paragraph. |
| csv-demerged | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only with merged rows split deterministically. |
| csv-normalized | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only with visual-confusable OCR normalization. |
| csv-normalized-rules | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-normalized + explicit normalization rules in card. |
| table-normalized | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | table-only + visual-confusable normalization. |
| json-only | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table as JSON instead of CSV. |
| no-frontmatter | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Strip the YAML frontmatter. Hurts Grok-4 noticeably. |
| prose | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table rendered as English prose instead of structured CSV. |
| stitched | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Multi-page tables stitched into one logical view. |