Input variants — how evidence is shaped for the model
Every form the open model can be handed. Each entry links to inputs, generator script, AI use, OCR tool, processing time, and results.
Pipeline versions
01Per-table card variants
02- Evidence-Preserving Table Normalization Layer— Five-layer deterministic OCR-confusable + column-type + table-context + authority normalizer with per-cell provenance. Repairs 444 cells across 123 cards in the V27/V35 corpus; 0 cells touched on born-digital NOAA.ExperimentalBest 10/13 (Qwen2.5-7B, Granite-3.3-8B, Llama-3 8B)
- CSV-only card— Table data rendered as raw CSV inside a Markdown code block. The most-effective open-tier variant.RecommendedBest 11/13
- CSV-only with deterministic header normalization— CSV cards with column headers cleaned by deterministic rules (no LLM). Addresses cycle-17 CELL_READ_ERROR failures from malformed Docling headers.Experimental—
- CSV with row-de-merge— CSV cards with fused multi-row records split back. Addresses OCR-merged rows like the cyprid 'April May,1491 35,...' case.ExperimentalBest 10/13 (Qwen2.5-7B)
- CSV with multi-page table stitching— Multi-page tables reunited across the PDF page break. Closes Q-NOAA-CALC-001 for frontier-tier reference; open models still need stronger arithmetic to use it.ExperimentalBest 9/13 (Qwen2.5-7B, Granite-3.3-8B)
- Micro card (≤1K tokens)— Smallest viable card: caption + inline Markdown table + 6-line YAML. Targets the 4K-context open-model tier.ExperimentalOpen 40%
- Table-only card— Just the Markdown table + 1-line caption. No frontmatter, no metadata.ExperimentalOpen 52%
- Labeled-faithfulness card— Full v0.6.1 card with every section explicitly tagged by provenance type (verbatim-cells, inherited, summarized, inferred).ExperimentalOpen 49%
Document-level maps
03- Document table-of-contents map— One card per document listing every detected table with caption, page, and dimensions. Enables two-shot retrieval.ExperimentalBest 5/13 (Qwen2.5-7B, Granite-3.3-8B with M3-IDX two-shot mode)
- Enriched document-index map— Per-document TOC with column headers + sample row labels + auto-detected scope. Designed to close the retrieval gap on tables with missing or uninformative captions.Experimental—
- Multi-page continuation map— Per-document map of which tables continue onto the next PDF page.Experimental—
Evaluation modes
04- M3-L4 — Oracle retrieval mode— Model receives exactly one pre-selected card per question. Isolates 'can the model answer given perfect retrieval?'Recommended—
- M3-HYDE — Vector retrieval (HyDE) mode— Vector similarity over pre-embedded cards. A small independent model writes a hypothetical answer; that answer is embedded and matched against the corpus. Works for any model size — no TOC navigation required.Experimental—
- M3-IDX — Two-shot retrieval mode— Model picks a table from a per-document index, then receives that table. Tests retrieval + reading together; isolates the cost of removing the oracle.Experimental—
- M2c — Docling Markdown mode— Model receives Docling's full linearized Markdown for the entire document.Experimental—
- M2a — Raw Docling JSON mode— Model receives the raw decompressed docling.json.gz. Demonstrates why specialized evidence packaging is needed at all.Reference—
- M3-AC — All-cards mode— Model receives every card in a document concatenated. Tests retrieval-without-oracle.Experimental—