GovTools
Evaluation mode · Reference / negative control

M2a — Raw Docling JSON mode

The model receives the raw Docling JSON export (docling.json.gz, decompressed) with no further packaging. This mode demonstrates why specialized evidence packaging is needed at all.

In one paragraph

This mode tests the failure that motivated the project: hand a model the IA-provided default Docling derivative (the JSON) and ask a question. V27's docling.json is 633 MB compressed and about 972 MB (200 M+ tokens) decompressed, which overflows every model's context window, including that of the reference frontier baseline. As the negative control, it defines the floor.
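
The size arithmetic can be sanity-checked with a rough sketch like the one below. It is not part of the pipeline: the function names, the 4-bytes-per-token ratio, and the 1,000,000-token context window are illustrative assumptions. At 4 bytes per token, 972 MB works out to about 243 M tokens, in line with the 200 M+ figure quoted above.

```python
import gzip


def decompressed_size(path, chunk_size=1 << 20):
    """Stream a .gz file and return its decompressed size in bytes.

    Reads in fixed-size chunks so the whole payload never sits in memory.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def estimate_tokens(num_bytes, bytes_per_token=4.0):
    """Rough token estimate; bytes_per_token is an assumed heuristic,
    not a measured tokenizer ratio."""
    return int(num_bytes / bytes_per_token)


def fits_context(num_bytes, context_window_tokens=1_000_000, bytes_per_token=4.0):
    """True if the estimated token count fits the (assumed) context window."""
    return estimate_tokens(num_bytes, bytes_per_token) <= context_window_tokens
```

Running such a check before model inference would flag the payload as oversized without sending the request at all.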

How the inputs are generated

Generation · 01
Generator script
Docling library — produced during the initial pipeline conversion
Input sources
  • Docling docling.json.gz export
AI use
No — pure deterministic transformation
OCR / re-OCR
Inherits from Docling's extraction step
Tool: Docling --force-reocr
Approximate processing time
Docling conversion: 25+ min for OCR'd documents. Model inference: the request errors out on payload size before any response is produced.
Resource intensity
Very high — exceeds typical context windows
Determinism
Deterministic (same input → same output, byte-identical)
Introduced
Cycle 8, 2026-05-21.
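
One way to spot-check the determinism claim is to hash the decompressed payload of two exports of the same document and compare the digests. The sketch below is an assumption about how such a check could look, not the project's own tooling, and the helper names are hypothetical. It hashes the decompressed JSON rather than the .gz file because the gzip header can carry a modification timestamp, so two archives with identical content may still differ at the compressed-byte level.

```python
import gzip
import hashlib


def payload_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a .gz file's decompressed content."""
    digest = hashlib.sha256()
    with gzip.open(path, "rb") as fh:
        # Read in chunks so a very large payload is never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_payload(path_a, path_b):
    """True if both archives decompress to byte-identical content."""
    return payload_sha256(path_a) == payload_sha256(path_b)
```

If the claim is meant to cover the compressed file as well, hashing the raw .gz bytes would be the stricter test.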

Evaluation results

Diagnostic · 02
Relative to v0.6.1 baseline
0% pass rate across all models — establishes the floor that the pipeline's other derivatives must beat.

Related variants

Cross-reference · 06