OCR-method grid · 2026-05-27

Same open model, same interpolation question, different OCR method to build the card.
How much of the pass-rate gap is the model, and how much is the table extractor?

For each of the 12 final interpolation candidates (Tier-2 verified on /interpolation), we regenerated the table card using 8 different OCR methods and re-ran 3 open models + 3 closed flagship APIs on each card — plus a "no OCR, direct vision" column where the flagship sees the page image directly. 860 (model × method × question) cells total, 581 correct / 35 partial / 265 incorrect.

correct = asserted value strictly between the source-cell endpoints (real interpolation); partial = in tolerance band but at/outside endpoints (often endpoint-echo); incorrect = miss.
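
The three-way grading rule above can be sketched as a small function. This is a minimal illustration, not the project's grader: the page does not say how wide the tolerance band is, so `tol` (an absolute margin around the endpoints) and the names `Grade` and `grade` are assumptions.

```python
from enum import Enum


class Grade(Enum):
    CORRECT = "correct"
    PARTIAL = "partial"
    INCORRECT = "incorrect"


def grade(value: float, lo: float, hi: float, tol: float) -> Grade:
    """Grade an asserted value against the two source-cell endpoints.

    `tol` is a hypothetical absolute tolerance; the real band width
    is not stated on the page.
    """
    lo, hi = min(lo, hi), max(lo, hi)  # accept endpoints in either order
    if lo < value < hi:
        # Strictly between the endpoints: real interpolation.
        return Grade.CORRECT
    if lo - tol <= value <= hi + tol:
        # Inside the band but at or outside an endpoint (e.g. endpoint echo).
        return Grade.PARTIAL
    return Grade.INCORRECT


if __name__ == "__main__":
    for v in (15.0, 10.0, 20.5, 25.0):
        print(v, grade(v, lo=10.0, hi=20.0, tol=1.0).value)
```

Under this sketch, a model that simply repeats one endpoint lands in `PARTIAL`, which is what the "band-only" counts in the tables below appear to track.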

Pass-rate heatmap — open models × OCR method

Each cell shows correct/total across the 12 interp questions for that (model, OCR method) pair. Cells are tinted by correct-rate (red → amber → green). Local open models can only read text, so they see the OCR-generated card — not the page image.

Model	Docling + EasyOCR (IBM)	Mistral OCR (Mistral AI)	GPT-4o Vision (OpenAI)	Gemini 2.5 Pro Vision (Google)	Pixtral 12B (Mistral AI)	Tesseract 5.5 (Google)	PaddleOCR / PP-Structure (PaddlePaddle)	Surya (datalab.to)	No OCR — direct vision (Grok-4 + Gemini 2.5 Pro + GPT-4o)
Apertus 8B (Swiss AI)	42% 5/12 correct	50% 6/12 correct	58% 7/12 correct	25% 1/4 correct	42% 5/12 correct	42% 5/12 correct	25% 3/12 correct	33% 4/12 correct	no data
ClimateGPT 13B (climate-domain Llama-2)	42% 5/12 correct +2 band-only	25% 3/12 correct +2 band-only	42% 5/12 correct +2 band-only	25% 1/4 correct	42% 5/12 correct +1 band-only	17% 2/12 correct	17% 2/12 correct +1 band-only	25% 3/12 correct +1 band-only	no data
Qwen 2.5 7B (Alibaba)	75% 9/12 correct	67% 8/12 correct	42% 5/12 correct	0% 0/4 correct	67% 8/12 correct	67% 8/12 correct	25% 3/12 correct	33% 4/12 correct	no data
Qwen 2.5 VL 7B (Alibaba, vision)	no data	no data	no data	no data	no data	no data	no data	no data	75% 9/12 correct

What to look for. Variance going across a row tells you how much the OCR method matters for that particular model. Variance going down a column tells you how the same OCR method serves different open models. If a column is roughly green across all three models, that OCR method is producing usable cards for the open-tier. If a row is consistently green, that open model is robust to OCR noise. Qwen 2.5 VL 7B is the first open multimodal model in the grid — it only fills the "direct vision" column (it sees the raw page image instead of a text card).
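
As a rough way to put numbers on "variance across a row" and "variance down a column", the sketch below computes the population standard deviation of the correct-rates for the three text-only open models. The counts are copied from the heatmap above; the function names and the choice of standard deviation as the spread measure are assumptions, not something the page specifies.

```python
from statistics import pstdev

METHODS = [
    "Docling + EasyOCR", "Mistral OCR", "GPT-4o Vision",
    "Gemini 2.5 Pro Vision", "Pixtral 12B", "Tesseract 5.5",
    "PaddleOCR / PP-Structure", "Surya",
]

# (correct, total) per OCR method, in METHODS order, from the heatmap.
# The Gemini 2.5 Pro Vision column only has 4 questions per cell.
GRID = {
    "Apertus 8B":     [(5, 12), (6, 12), (7, 12), (1, 4), (5, 12), (5, 12), (3, 12), (4, 12)],
    "ClimateGPT 13B": [(5, 12), (3, 12), (5, 12), (1, 4), (5, 12), (2, 12), (2, 12), (3, 12)],
    "Qwen 2.5 7B":    [(9, 12), (8, 12), (5, 12), (0, 4), (8, 12), (8, 12), (3, 12), (4, 12)],
}


def rate(cell: tuple[int, int]) -> float:
    correct, total = cell
    return correct / total


def row_spread(model: str) -> float:
    """How much the OCR method matters for one model."""
    return pstdev([rate(c) for c in GRID[model]])


def column_spread(method: str) -> float:
    """How differently one OCR method serves the open models."""
    i = METHODS.index(method)
    return pstdev([rate(cells[i]) for cells in GRID.values()])


if __name__ == "__main__":
    for model in GRID:
        print(f"{model:16s} row spread    {row_spread(model):.3f}")
    for method in METHODS:
        print(f"{method:26s} column spread {column_spread(method):.3f}")
```

With only 12 questions (4 in one column) per cell, these spreads are coarse; they are useful for ranking rows and columns against each other, not as precise estimates.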

Pass-rate heatmap — flagship API models × OCR method (text) + direct vision

Same 12 questions, but answered by closed-tier flagship APIs. The first 5 columns are the same OCR-text setup as above — flagship reads the per-OCR card, no image. The rightmost column No OCR — direct vision skips the OCR step entirely: the flagship sees the raw page image plus the question and answers in one shot.

Model	Docling + EasyOCR (IBM)	Mistral OCR (Mistral AI)	GPT-4o Vision (OpenAI)	Gemini 2.5 Pro Vision (Google)	Pixtral 12B (Mistral AI)	No OCR — direct vision (Grok-4 + Gemini 2.5 Pro + GPT-4o)
Grok-4 (xAI)	100% 12/12 correct	92% 11/12 correct	75% 9/12 correct	0% 0/4 correct	83% 10/12 correct	83% 10/12 correct
Gemini 2.5 Pro (Google)	no data	no data	no data	no data	no data	no data
GPT-4o (OpenAI)	67% 8/12 correct	67% 8/12 correct	58% 7/12 correct	0% 0/4 correct	67% 8/12 correct	75% 9/12 correct +1 band-only

The question this answers. If the "direct vision" column is the greenest, the OCR pipeline is adding more noise than signal even for SOTA models — they'd be better off just looking at the page. If the best OCR column beats direct vision, OCR is structurally useful even for the top of the stack (and the cheapest OCR that hits that ceiling wins on cost).
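
The comparison described above can be written as a small check per flagship row. This is a sketch using the counts from the flagship heatmap; the function name `ocr_vs_direct` and the verdict labels are illustrative assumptions.

```python
from fractions import Fraction

# (correct, total) per OCR-text column, from the flagship heatmap above.
OCR_CELLS = {
    "Grok-4": {
        "Docling + EasyOCR": (12, 12), "Mistral OCR": (11, 12),
        "GPT-4o Vision": (9, 12), "Gemini 2.5 Pro Vision": (0, 4),
        "Pixtral 12B": (10, 12),
    },
    "GPT-4o": {
        "Docling + EasyOCR": (8, 12), "Mistral OCR": (8, 12),
        "GPT-4o Vision": (7, 12), "Gemini 2.5 Pro Vision": (0, 4),
        "Pixtral 12B": (8, 12),
    },
}
DIRECT_VISION = {"Grok-4": (10, 12), "GPT-4o": (9, 12)}


def ocr_vs_direct(model: str) -> tuple[str, str]:
    """Return (best OCR method, verdict) for one flagship model."""
    rates = {m: Fraction(c, t) for m, (c, t) in OCR_CELLS[model].items()}
    best = max(rates, key=rates.get)  # first method listed wins ties
    direct = Fraction(*DIRECT_VISION[model])
    if rates[best] > direct:
        verdict = "best OCR beats direct vision"
    elif rates[best] < direct:
        verdict = "direct vision beats every OCR column"
    else:
        verdict = "tie"
    return best, verdict


if __name__ == "__main__":
    for model in OCR_CELLS:
        print(model, ocr_vs_direct(model))
```

On these counts the two flagships with data point in opposite directions, so the answer to the question is model-dependent rather than a single verdict for the whole tier.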

Card-variant ablation — Grok-4 (flagship) vs Qwen 2.5 7B (best open)

Both Grok-4 and Qwen 2.5 7B were tested against every Erschließung card variant + several floor cases (raw EasyOCR with no Docling layout, plain PDF text layer, direct page vision). Same 12 interp questions for each cell. Sorted by Qwen 2.5 7B correct count — the interesting axis is which formats lift the open-tier model above the prior 9/12 ceiling.

Card variant	Qwen 2.5 7B	Grok-4	$/page	s/page	What it tests
Docling + EasyOCR IBM (Docling) + JaidedAI (EasyOCR)	9/12	12/12	$0	16	Erschließung pipeline default. Force-full-page OCR mode (–force-reocr).
compact-2k Erschliessung	9/12 +2	12/12	$0	16	~2 KB card variant — caption + table + 1 paragraph.
table-only Erschliessung	9/12 +1	12/12	$0	16	Table + caption only; no surrounding metadata.
csv-plus-scope Erschliessung	10/12	12/12	$0	16	csv-only + geographic+temporal scope.
csv-plus-paragraph Erschliessung	10/12	12/12	$0	16	csv-only + one nearby paragraph (≤400 chars).
csv-plus-all-context Erschliessung	8/12	12/12	$0	16	csv-only + headings + scope + paragraph.
csv-demerged Erschliessung	7/12 +2	12/12	$0	16	csv-only with merged rows split deterministically.
csv-normalized Erschliessung	7/12 +2	12/12	$0	16	csv-only with visual-confusable OCR normalization.
csv-normalized-rules Erschliessung	9/12	12/12	$0	16	csv-normalized + explicit normalization rules in card.
table-normalized Erschliessung	10/12	12/12	$0	16	table-only + visual-confusable normalization.
json-only Erschliessung	8/12	12/12	$0	16	Table as JSON instead of CSV.
stitched Erschliessung	7/12 +1	12/12	$0	16	Multi-page tables stitched into one logical view.
micro-1k Erschliessung	10/12 +1	11/12 +1	$0	16	~1.2 KB card variant — caption + table only.
compact-4k Erschliessung	10/12	11/12	$0	16	~4 KB card variant.
csv-plus-headings Erschliessung	10/12	11/12	$0	16	csv-only + section headings.
labeled Erschliessung	10/12	11/12 +1	$0	16	Explicit per-section provenance labels.
prose Erschliessung	7/12 +1	11/12 +1	$0	16	Table rendered as English prose instead of structured CSV.
Mistral OCR Mistral AI	8/12	11/12	$0.0030	1.3	Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast.
pdf-text-no-ocr pdftotext (no OCR)	9/12	11/12	$0	0.1	Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text.
No OCR — direct vision Grok-4 + Gemini 2.5 Pro + GPT-4o (avg)	no data	10/12	$0.0098	15	Flagship vision LLM sees the raw page image + the question — no OCR step. Only meaningful for flagship rows (open models can't see images).
GPT-4o Vision OpenAI	5/12	9/12	$0.0075	9	Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI.
Pixtral 12B Mistral AI	8/12	10/12	$0.0010	25	Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens.
easyocr-raw JaidedAI (no Docling)	6/12	7/12	$0	3	EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor.
no-frontmatter Erschliessung	5/12 +1	7/12 +3	$0	16	Strip the YAML frontmatter. Hurts Grok-4 noticeably.

What this shows. Grok-4 is roughly format-insensitive: 12 variants score 12/12. Qwen 2.5 7B is much more format-sensitive: its peak (10/12 on micro-1k, compact-4k, csv-plus-headings, csv-plus-scope, csv-plus-paragraph, labeled, table-normalized) is one above its score on the project's default docling-easyocr (9/12). Compact-with-context formats help the open model materially. Floor cases (no-frontmatter, easyocr-raw, gpt4o-vision) hurt both models and leave the open model lowest, at 5-6/12 against Grok-4's 7-9/12.

OCR computation cost — per-page

Two axes: USD per page (commercial API list price; $0 for local tools) and seconds per page (wall-clock on the project's M-series Mac). The two are not interchangeable — a free local tool can be the slowest of the bunch, and the cheapest paid API can be the fastest.
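
Whole-document figures follow from the two per-page numbers by simple multiplication. The sketch below shows that arithmetic for the 874-page V27 document, using per-page values from the License and access table; the function name `project` is an assumption, and the page does not say how the displayed totals were rounded.

```python
def project(pages: float, usd_per_page: float, sec_per_page: float) -> tuple[float, float]:
    """Project (total USD, wall-clock minutes) for a document."""
    total_usd = pages * usd_per_page
    total_min = pages * sec_per_page / 60  # seconds to minutes
    return total_usd, total_min


if __name__ == "__main__":
    V27_PAGES = 874
    # (USD per page, seconds per page) as listed on this page.
    methods = {
        "Docling + EasyOCR": (0.0, 16),
        "Mistral OCR": (0.0030, 1.3),
        "Tesseract 5.5": (0.0, 1),
    }
    for name, (usd, sec) in methods.items():
        total_usd, total_min = project(V27_PAGES, usd, sec)
        print(f"{name:20s} ${total_usd:.2f}  {total_min:.0f} min")
```

The same function covers the tested-table subset by passing roughly 10% of the page count.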

Full V27 document: 874 pages

Method	USD cost	Wall-clock minutes
olmOCR (Qwen2-VL 7B)	$0	6992
Marker	$0	1165
Gemini 2.5 Pro Vision	$10.49	364
Pixtral 12B	$0.87	364
Docling + EasyOCR	$0	233
micro-1k	$0	233
compact-2k	$0	233
compact-4k	$0	233
table-only	$0	233
labeled	$0	233
csv-plus-headings	$0	233
csv-plus-scope	$0	233
csv-plus-paragraph	$0	233
csv-plus-all-context	$0	233
csv-demerged	$0	233
csv-normalized	$0	233
csv-normalized-rules	$0	233
table-normalized	$0	233
json-only	$0	233
no-frontmatter	$0	233
prose	$0	233
stitched	$0	233
No OCR — direct vision	$8.57	219
Surya	$0	219
GPT-4o Vision	$6.55	131
PaddleOCR / PP-Structure	$0	117
Docling + Tesseract	$0	44
easyocr-raw	$0	44
Mistral OCR	$2.62	19
Tesseract 5.5	$0	15
pdf-text-no-ocr	$0	1

V27 tested-table pages only: ≈87 pages (10% of 874)

Method	USD cost	Wall-clock minutes
olmOCR (Qwen2-VL 7B)	$0	699
Marker	$0	117
Gemini 2.5 Pro Vision	$1.05	36
Pixtral 12B	$0.09	36
Docling + EasyOCR	$0	23
micro-1k	$0	23
compact-2k	$0	23
compact-4k	$0	23
table-only	$0	23
labeled	$0	23
csv-plus-headings	$0	23
csv-plus-scope	$0	23
csv-plus-paragraph	$0	23
csv-plus-all-context	$0	23
csv-demerged	$0	23
csv-normalized	$0	23
csv-normalized-rules	$0	23
table-normalized	$0	23
json-only	$0	23
no-frontmatter	$0	23
prose	$0	23
stitched	$0	23
No OCR — direct vision	$0.86	22
Surya	$0	22
GPT-4o Vision	$0.66	13
PaddleOCR / PP-Structure	$0	12
Docling + Tesseract	$0	4
easyocr-raw	$0	4
Mistral OCR	$0.26	2
Tesseract 5.5	$0	1
pdf-text-no-ocr	$0	0

License and access

Method	Vendor	Family	Access	$/page	s/page	Note
Docling + EasyOCR	IBM (Docling) + JaidedAI (EasyOCR)	pdf-parser	Open-source · runnable locally for free	$0	16	Erschließung pipeline default. Force-full-page OCR mode (–force-reocr).
Docling + Tesseract	IBM (Docling) + Google (Tesseract)	pdf-parser	Open-source · runnable locally for free	$0	3	Faster than EasyOCR variant on this hardware; comparable accuracy on V27 scans.
Tesseract 5.5	Google (now community-maintained)	image-ocr	Open-source · runnable locally for free	$0	1	Plain text + hOCR output. No table-structure recovery; needs a layout layer on top.
PaddleOCR / PP-Structure	PaddlePaddle (Baidu)	image-ocr	Open-source · runnable locally for free	$0	8	Includes table structure recognition (SLANet). Strong on born-digital tables.
Surya	datalab.to	image-ocr	Open-source · runnable locally for free	$0	15	OCR + layout + reading order + table recognition. Modern transformer-based.
Marker	datalab.to	pdf-parser	Open-source · runnable locally for free	$0	80	PDF → markdown with reading-order tables. Slower but high-fidelity.
olmOCR (Qwen2-VL 7B)	Allen Institute for AI	multimodal-llm	Open-source · runnable locally for free	$0	480	7B vision LLM, GPU-recommended. ~5-10 min/page on CPU/MPS. Not yet tested in grid (transformers 5.x compatibility).
Mistral OCR	Mistral AI	pdf-parser	Commercial · paid API	$0.0030	1.3	Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast.
GPT-4o Vision	OpenAI	vision-llm	Commercial · paid API	$0.0075	9	Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI.
Gemini 2.5 Pro Vision	Google	vision-llm	Commercial · paid API	$0.0120	25	Free tier exists but quota-limited; paid tier ~$0.005-0.025/page. Lowest rate limits of the 3 hosted vision LLMs.
Pixtral 12B	Mistral AI	vision-llm	Commercial · paid API	$0.0010	25	Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens.
No OCR — direct vision	Grok-4 + Gemini 2.5 Pro + GPT-4o (avg)	vision-llm	Commercial · paid API	$0.0098	15	Flagship vision LLM sees the raw page image + the question — no OCR step. Only meaningful for flagship rows (open models can't see images).
pdf-text-no-ocr	pdftotext (no OCR)	pdf-parser	Open-source · runnable locally for free	$0	0.1	Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text.
easyocr-raw	JaidedAI (no Docling)	image-ocr	Open-source · runnable locally for free	$0	3	EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor.
micro-1k	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	~1.2 KB card variant — caption + table only.
compact-2k	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	~2 KB card variant — caption + table + 1 paragraph.
compact-4k	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	~4 KB card variant.
table-only	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Table + caption only; no surrounding metadata.
labeled	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Explicit per-section provenance labels.
csv-plus-headings	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only + section headings.
csv-plus-scope	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only + geographic+temporal scope.
csv-plus-paragraph	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only + one nearby paragraph (≤400 chars).
csv-plus-all-context	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only + headings + scope + paragraph.
csv-demerged	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only with merged rows split deterministically.
csv-normalized	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-only with visual-confusable OCR normalization.
csv-normalized-rules	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	csv-normalized + explicit normalization rules in card.
table-normalized	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	table-only + visual-confusable normalization.
json-only	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Table as JSON instead of CSV.
no-frontmatter	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Strip the YAML frontmatter. Hurts Grok-4 noticeably.
prose	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Table rendered as English prose instead of structured CSV.
stitched	Erschliessung	pdf-parser	Open-source · runnable locally for free	$0	16	Multi-page tables stitched into one logical view.
