Same open model, same interpolation question, different OCR method to build the card.
How much of the pass-rate gap is the model, and how much is the table extractor?
For each of the 12 final interpolation candidates (Tier-2 verified on /interpolation), we regenerated the table card using 8 different OCR methods and re-ran 3 open models + 3 closed flagship APIs on each card — plus a "no OCR, direct vision" column where the flagship sees the page image directly. 860 (model × method × question) cells total, 581 correct / 35 partial / 265 incorrect.
correct = asserted value strictly between the source-cell endpoints (real interpolation); partial = in tolerance band but at/outside endpoints (often endpoint-echo); incorrect = miss.
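The three-way grading rule above can be sketched as a small function. This is a minimal illustration, not the project's grader: the function name `grade`, the `tolerance` parameter, and the reading of the tolerance band as a symmetric absolute margin around the two endpoints are all assumptions.

```python
def grade(value: float, left: float, right: float, tolerance: float) -> str:
    """Grade an asserted value against the two source-cell endpoints.

    Assumption: the tolerance band is [min - tolerance, max + tolerance].
    """
    lo, hi = min(left, right), max(left, right)
    if lo < value < hi:
        return "correct"    # strictly between the endpoints: real interpolation
    if lo - tolerance <= value <= hi + tolerance:
        return "partial"    # inside the band but at/outside an endpoint
    return "incorrect"      # miss


print(grade(12.5, 10.0, 15.0, 0.5))  # strictly inside the endpoints
print(grade(10.0, 10.0, 15.0, 0.5))  # echoes the lower endpoint
print(grade(20.0, 10.0, 15.0, 0.5))  # outside the band
```

Under this reading, an answer that simply repeats one of the source cells never counts as correct, which is what separates real interpolation from endpoint-echo.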
Pass-rate heatmap — open models × OCR method
Each cell shows correct/total across the 12 interp questions for that (model, OCR method) pair. Cells are tinted by correct-rate (red → amber → green). Local open models can only read text, so they see the OCR-generated card — not the page image.
| Model | Docling + EasyOCR (IBM) | Mistral OCR (Mistral AI) | GPT-4o Vision (OpenAI) | Gemini 2.5 Pro Vision (Google) | Pixtral 12B (Mistral AI) | Tesseract 5.5 (Google) | PaddleOCR / PP-Structure (PaddlePaddle) | Surya (datalab.to) | No OCR, direct vision (Grok-4 + Gemini 2.5 Pro + GPT-4o) |
|---|---|---|---|---|---|---|---|---|---|
| Apertus 8B (Swiss AI) | 42% (5/12 correct) | 50% (6/12 correct) | 58% (7/12 correct) | 25% (1/4 correct) | 42% (5/12 correct) | 42% (5/12 correct) | 25% (3/12 correct) | 33% (4/12 correct) | no data |
| ClimateGPT 13B (climate-domain Llama-2) | 42% (5/12 correct, +2 band-only) | 25% (3/12 correct, +2 band-only) | 42% (5/12 correct, +2 band-only) | 25% (1/4 correct) | 42% (5/12 correct, +1 band-only) | 17% (2/12 correct) | 17% (2/12 correct, +1 band-only) | 25% (3/12 correct, +1 band-only) | no data |
| Qwen 2.5 7B (Alibaba) | 75% (9/12 correct) | 67% (8/12 correct) | 42% (5/12 correct) | 0% (0/4 correct) | 67% (8/12 correct) | 67% (8/12 correct) | 25% (3/12 correct) | 33% (4/12 correct) | no data |
| Qwen 2.5 VL 7B (Alibaba, vision) | no data | no data | no data | no data | no data | no data | no data | no data | 75% (9/12 correct) |
What to look for. Variance going across a row tells you how much the OCR method matters for that particular model. Variance going down a column tells you how the same OCR method serves different open models. If a column is roughly green across all three models, that OCR method is producing usable cards for the open-tier. If a row is consistently green, that open model is robust to OCR noise. Qwen 2.5 VL 7B is the first open multimodal model in the grid — it only fills the "direct vision" column (it sees the raw page image instead of a text card).
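To make the row and column reading concrete, the sketch below computes a simple spread (max minus min correct count) along each row and each column of the open-model heatmap. The counts are copied from the table; the choice of spread as the variance measure and the names `CORRECT`, `row_spread`, and `col_spread` are illustrative assumptions. The Gemini 2.5 Pro Vision column (scored on 4 questions) and the direct-vision column (no data for text-only models) are left out so every cell is out of 12.

```python
# Correct counts out of 12, copied from the open-model heatmap.
METHODS = [
    "Docling + EasyOCR", "Mistral OCR", "GPT-4o Vision", "Pixtral 12B",
    "Tesseract 5.5", "PaddleOCR / PP-Structure", "Surya",
]
CORRECT = {
    "Apertus 8B":     [5, 6, 7, 5, 5, 3, 4],
    "ClimateGPT 13B": [5, 3, 5, 5, 2, 2, 3],
    "Qwen 2.5 7B":    [9, 8, 5, 8, 8, 3, 4],
}


def spread(values):
    """Max minus min: a crude stand-in for variance on 12-question cells."""
    return max(values) - min(values)


# Across a row: how much the OCR method matters for one model.
row_spread = {model: spread(counts) for model, counts in CORRECT.items()}

# Down a column: how differently one OCR method serves the three models.
col_spread = {
    method: spread([counts[i] for counts in CORRECT.values()])
    for i, method in enumerate(METHODS)
}

for model, s in row_spread.items():
    print(f"{model:16s} row spread {s}/12")
for method, s in col_spread.items():
    print(f"{method:26s} column spread {s}/12")
```

A small column spread is not automatically good news: PaddleOCR and Surya have the tightest columns here only because all three models score low on them, so the spread should be read alongside the column's level.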
Pass-rate heatmap — flagship API models × OCR method (text) + direct vision
Same 12 questions, but answered by closed-tier flagship APIs. The first 5 columns use the same OCR-text setup as above: the flagship reads the per-OCR card and never sees the image. The rightmost "direct vision" column skips the OCR step entirely: the flagship sees the raw page image plus the question and answers in one shot.
| Model | Docling + EasyOCR (IBM) | Mistral OCR (Mistral AI) | GPT-4o Vision (OpenAI) | Gemini 2.5 Pro Vision (Google) | Pixtral 12B (Mistral AI) | No OCR, direct vision (Grok-4 + Gemini 2.5 Pro + GPT-4o) |
|---|---|---|---|---|---|---|
| Grok-4 (xAI) | 100% (12/12 correct) | 92% (11/12 correct) | 75% (9/12 correct) | 0% (0/4 correct) | 83% (10/12 correct) | 83% (10/12 correct) |
| Gemini 2.5 Pro (Google) | no data | no data | no data | no data | no data | no data |
| GPT-4o (OpenAI) | 67% (8/12 correct) | 67% (8/12 correct) | 58% (7/12 correct) | 0% (0/4 correct) | 67% (8/12 correct) | 75% (9/12 correct, +1 band-only) |
The question this answers. If the "direct vision" column is the greenest, the OCR pipeline is adding more noise than signal even for SOTA models — they'd be better off just looking at the page. If the best OCR column beats direct vision, OCR is structurally useful even for the top of the stack (and the cheapest OCR that hits that ceiling wins on cost).
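The comparison described above can be run directly on the flagship heatmap. The sketch below is one hedged way to do it: the `(correct, total)` cells are copied from the table, while the `verdict` function, the `DIRECT` constant, and the use of correct-rate (so the 4-question Gemini column stays comparable) are assumptions made for illustration. Gemini 2.5 Pro is omitted because its row has no data.

```python
DIRECT = "No OCR, direct vision"

# (correct, total) per cell, copied from the flagship heatmap.
SCORES = {
    "Grok-4": {
        "Docling + EasyOCR": (12, 12), "Mistral OCR": (11, 12),
        "GPT-4o Vision": (9, 12), "Gemini 2.5 Pro Vision": (0, 4),
        "Pixtral 12B": (10, 12), DIRECT: (10, 12),
    },
    "GPT-4o": {
        "Docling + EasyOCR": (8, 12), "Mistral OCR": (8, 12),
        "GPT-4o Vision": (7, 12), "Gemini 2.5 Pro Vision": (0, 4),
        "Pixtral 12B": (8, 12), DIRECT: (9, 12),
    },
}


def rate(cell):
    correct, total = cell
    return correct / total


def verdict(cells):
    """Compare the best OCR-text column with the direct-vision column."""
    ocr_cells = {m: c for m, c in cells.items() if m != DIRECT}
    # On ties, max() keeps the first method in table order.
    best = max(ocr_cells, key=lambda m: rate(ocr_cells[m]))
    best_rate, direct_rate = rate(ocr_cells[best]), rate(cells[DIRECT])
    if best_rate > direct_rate:
        return best, "OCR beats direct vision"
    if best_rate < direct_rate:
        return best, "direct vision beats OCR"
    return best, "tie"


for model, cells in SCORES.items():
    print(model, verdict(cells))
```

On these numbers the two flagships disagree: Grok-4's best OCR column (12/12) is ahead of its direct-vision score (10/12), while GPT-4o's direct-vision score (9/12) is ahead of its best OCR column (8/12). With 12 questions per cell, a one- or two-question gap is a weak signal either way.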
Card-variant ablation — Grok-4 (flagship) vs Qwen 2.5 7B (best open)
Both Grok-4 and Qwen 2.5 7B were tested against every Erschließung card variant plus several floor cases (raw EasyOCR with no Docling layout, plain PDF text layer, direct page vision), with the same 12 interp questions for each cell. Rows are ordered roughly by Grok-4 correct count; the interesting axis is the Qwen 2.5 7B column, which shows which formats lift the open-tier model above the prior 9/12 ceiling.
| Card variant | Qwen 2.5 7B | Grok-4 | $/page | s/page | What it tests |
|---|---|---|---|---|---|
| Docling + EasyOCR (IBM Docling + JaidedAI EasyOCR) | 9/12 | 12/12 | $0 | 16 | Erschließung pipeline default. Force-full-page OCR mode (–force-reocr). |
| compact-2k (Erschliessung) | 9/12 +2 band-only | 12/12 | $0 | 16 | ~2 KB card variant: caption + table + 1 paragraph. |
| table-only (Erschliessung) | 9/12 +1 band-only | 12/12 | $0 | 16 | Table + caption only; no surrounding metadata. |
| csv-plus-scope (Erschliessung) | 10/12 | 12/12 | $0 | 16 | csv-only + geographic+temporal scope. |
| csv-plus-paragraph (Erschliessung) | 10/12 | 12/12 | $0 | 16 | csv-only + one nearby paragraph (≤400 chars). |
| csv-plus-all-context (Erschliessung) | 8/12 | 12/12 | $0 | 16 | csv-only + headings + scope + paragraph. |
| csv-demerged (Erschliessung) | 7/12 +2 band-only | 12/12 | $0 | 16 | csv-only with merged rows split deterministically. |
| csv-normalized (Erschliessung) | 7/12 +2 band-only | 12/12 | $0 | 16 | csv-only with visual-confusable OCR normalization. |
| csv-normalized-rules (Erschliessung) | 9/12 | 12/12 | $0 | 16 | csv-normalized + explicit normalization rules in card. |
| table-normalized (Erschliessung) | 10/12 | 12/12 | $0 | 16 | table-only + visual-confusable normalization. |
| json-only (Erschliessung) | 8/12 | 12/12 | $0 | 16 | Table as JSON instead of CSV. |
| stitched (Erschliessung) | 7/12 +1 band-only | 12/12 | $0 | 16 | Multi-page tables stitched into one logical view. |
| micro-1k (Erschliessung) | 10/12 +1 band-only | 11/12 +1 band-only | $0 | 16 | ~1.2 KB card variant: caption + table only. |
| compact-4k (Erschliessung) | 10/12 | 11/12 | $0 | 16 | ~4 KB card variant. |
| csv-plus-headings (Erschliessung) | 10/12 | 11/12 | $0 | 16 | csv-only + section headings. |
| labeled (Erschliessung) | 10/12 | 11/12 +1 band-only | $0 | 16 | Explicit per-section provenance labels. |
| prose (Erschliessung) | 7/12 +1 band-only | 11/12 +1 band-only | $0 | 16 | Table rendered as English prose instead of structured CSV. |
| Mistral OCR (Mistral AI) | 8/12 | 11/12 | $0.0030 | 1.3 | Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast. |
| pdf-text-no-ocr (pdftotext, no OCR) | 9/12 | 11/12 | $0 | 0.1 | Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text. |
| No OCR, direct vision (Grok-4 + Gemini 2.5 Pro + GPT-4o, avg) | no data | 10/12 | $0.0098 | 15 | Flagship vision LLM sees the raw page image + the question, with no OCR step. Only meaningful for flagship rows (open models can't see images). |
| GPT-4o Vision (OpenAI) | 5/12 | 9/12 | $0.0075 | 9 | Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI. |
| Pixtral 12B (Mistral AI) | 8/12 | 10/12 | $0.0010 | 25 | Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens. |
| easyocr-raw (JaidedAI, no Docling) | 6/12 | 7/12 | $0 | 3 | EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor. |
| no-frontmatter (Erschliessung) | 5/12 +1 band-only | 7/12 +3 band-only | $0 | 16 | Strip the YAML frontmatter. Hurts Grok-4 noticeably. |
What this shows. Grok-4 is roughly format-insensitive: 12 variants score 12/12. Qwen 2.5 7B is much more format-sensitive: its peak (10/12 on micro-1k, compact-4k, csv-plus-headings, csv-plus-scope, csv-plus-paragraph, labeled, and table-normalized) is one above its score on the project's default docling-easyocr (9/12). Compact-with-context formats help the open model materially. Floor cases (no-frontmatter, easyocr-raw, gpt4o-vision) hurt both models and leave the open model lowest, at 5/12 to 6/12 versus 7/12 to 9/12 for Grok-4.
OCR computation cost — per-page
Two axes: USD per page (commercial API list price; $0 for local tools) and seconds per page (wall-clock on the project's M-series Mac). The two are not interchangeable — a free local tool can be the slowest of the bunch, and the cheapest paid API can be the fastest.
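A quick way to see why the two axes are not interchangeable is to scale both to a whole corpus. The sketch below uses per-page figures from the cost table; the 1,000-page corpus size, the `Method` dataclass, and the assumption of strictly sequential processing (no batching or parallel requests) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    usd_per_page: float
    sec_per_page: float


# Per-page figures copied from the cost table.
METHODS = [
    Method("Docling + EasyOCR", 0.0, 16),
    Method("Tesseract 5.5", 0.0, 1),
    Method("Marker", 0.0, 80),
    Method("Mistral OCR", 0.0030, 1.3),
    Method("GPT-4o Vision", 0.0075, 9),
    Method("Pixtral 12B", 0.0010, 25),
]


def corpus_cost(method: Method, pages: int) -> tuple[float, float]:
    """Return (total USD, total hours), assuming pages run one after another."""
    return method.usd_per_page * pages, method.sec_per_page * pages / 3600


PAGES = 1000  # hypothetical corpus size

for m in METHODS:
    usd, hours = corpus_cost(m, PAGES)
    print(f"{m.name:18s} ${usd:7.2f}  {hours:6.2f} h")

paid = [m for m in METHODS if m.usd_per_page > 0]
cheapest_paid = min(paid, key=lambda m: m.usd_per_page)
fastest_paid = min(paid, key=lambda m: m.sec_per_page)
slowest = max(METHODS, key=lambda m: m.sec_per_page)
print("cheapest paid:", cheapest_paid.name)
print("fastest paid: ", fastest_paid.name)
print("slowest:      ", slowest.name)
```

In this subset the cheapest paid API (Pixtral 12B) is also the slowest paid one, and the slowest method overall (Marker) is free, so ranking by price alone says little about throughput.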
License and access
| Method | Vendor | Family | Access | $/page | s/page | Note |
|---|---|---|---|---|---|---|
| Docling + EasyOCR | IBM (Docling) + JaidedAI (EasyOCR) | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Erschließung pipeline default. Force-full-page OCR mode (–force-reocr). |
| Docling + Tesseract | IBM (Docling) + Google (Tesseract) | pdf-parser | Open-source · runnable locally for free | $0 | 3 | Faster than EasyOCR variant on this hardware; comparable accuracy on V27 scans. |
| Tesseract 5.5 | Google (now community-maintained) | image-ocr | Open-source · runnable locally for free | $0 | 1 | Plain text + hOCR output. No table-structure recovery; needs a layout layer on top. |
| PaddleOCR / PP-Structure | PaddlePaddle (Baidu) | image-ocr | Open-source · runnable locally for free | $0 | 8 | Includes table structure recognition (SLANet). Strong on born-digital tables. |
| Surya | datalab.to | image-ocr | Open-source · runnable locally for free | $0 | 15 | OCR + layout + reading order + table recognition. Modern transformer-based. |
| Marker | datalab.to | pdf-parser | Open-source · runnable locally for free | $0 | 80 | PDF → markdown with reading-order tables. Slower but high-fidelity. |
| olmOCR (Qwen2-VL 7B) | Allen Institute for AI | multimodal-llm | Open-source · runnable locally for free | $0 | 480 | 7B vision LLM, GPU-recommended. ~5-10 min/page on CPU/MPS. Not yet tested in grid (transformers 5.x compatibility). |
| Mistral OCR | Mistral AI | pdf-parser | Commercial · paid API | $0.0030 | 1.3 | Dedicated OCR endpoint (mistral-ocr-latest). Returns per-page markdown; very fast. |
| GPT-4o Vision | OpenAI | vision-llm | Commercial · paid API | $0.0075 | 9 | Vision endpoint of GPT-4o. ~$0.005-$0.01/page depending on image size at 300 DPI. |
| Gemini 2.5 Pro Vision | Google | vision-llm | Commercial · paid API | $0.0120 | 25 | Free tier exists but quota-limited; paid tier ~$0.005-0.025/page. Lowest rate limits of the 3 hosted vision LLMs. |
| Pixtral 12B | Mistral AI | vision-llm | Commercial · paid API | $0.0010 | 25 | Open weights (Apache 2.0) but tested here via Mistral API. ~$0.0005-0.001/page at ~1.1K image tokens. |
| No OCR — direct vision | Grok-4 + Gemini 2.5 Pro + GPT-4o (avg) | vision-llm | Commercial · paid API | $0.0098 | 15 | Flagship vision LLM sees the raw page image + the question — no OCR step. Only meaningful for flagship rows (open models can't see images). |
| pdf-text-no-ocr | pdftotext (no OCR) | pdf-parser | Open-source · runnable locally for free | $0 | 0.1 | Embedded PDF text layer via pdftotext. On scanned V27/V35: mostly garbled control characters. On born-digital NOAA: clean text. |
| easyocr-raw | JaidedAI (no Docling) | image-ocr | Open-source · runnable locally for free | $0 | 3 | EasyOCR on the page image, no Docling layout/table reconstruction. Tests the OCR-only floor. |
| micro-1k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~1.2 KB card variant — caption + table only. |
| compact-2k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~2 KB card variant — caption + table + 1 paragraph. |
| compact-4k | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | ~4 KB card variant. |
| table-only | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table + caption only; no surrounding metadata. |
| labeled | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Explicit per-section provenance labels. |
| csv-plus-headings | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + section headings. |
| csv-plus-scope | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + geographic+temporal scope. |
| csv-plus-paragraph | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + one nearby paragraph (≤400 chars). |
| csv-plus-all-context | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only + headings + scope + paragraph. |
| csv-demerged | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only with merged rows split deterministically. |
| csv-normalized | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-only with visual-confusable OCR normalization. |
| csv-normalized-rules | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | csv-normalized + explicit normalization rules in card. |
| table-normalized | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | table-only + visual-confusable normalization. |
| json-only | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table as JSON instead of CSV. |
| no-frontmatter | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Strip the YAML frontmatter. Hurts Grok-4 noticeably. |
| prose | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Table rendered as English prose instead of structured CSV. |
| stitched | Erschliessung | pdf-parser | Open-source · runnable locally for free | $0 | 16 | Multi-page tables stitched into one logical view. |