Our number, not a vendor number
Per-field extraction accuracy, published.
Marketing pages love to quote “95–99% accuracy.” That's a vendor benchmark, and your results might be different. This page shows our own results from our own eval harness, on a varied test set of synthetic POs covering 8+ layout variants. The exact harness and the test set are in the public repo.
Benchmark run pending
We haven't published a number yet. The eval harness lives in src/moa/eval.py; run python -m moa.cli eval --count 200 and copy the resulting report to web/public/eval/latest.json, and this page will render it.
We'd rather show a blank than an invented number.
Methodology
The harness lives in src/moa/eval.py. Scoring rules:
- PO # and dates: exact match after normalization (case and whitespace folded; dates parsed to ISO).
- Customer name, ship-to: rapidfuzz ratio ≥ 0.85 for each field (scores on a 0–1 scale).
- Line items: matched by an SKU-exact pass first, then Hungarian assignment over description fuzzy scores for the remaining items, with a floor of 0.70.
- Per line-item sub-field: SKU exact, quantity and price compared numerically after rounding to 2 decimal places, description fuzzy ≥ 0.80, unit exact.
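
The scoring rules above can be sketched in a few lines of Python. This is an illustrative, standard-library-only sketch, not the code in src/moa/eval.py. All function names, the field keys (sku, description, quantity, price, unit) and the accepted date formats are assumptions made for the example. It also swaps in difflib.SequenceMatcher for rapidfuzz and a brute-force search for the Hungarian algorithm; both stand-ins give an optimal assignment on small inputs, but their similarity scores will not match rapidfuzz exactly.

```python
from datetime import datetime
from difflib import SequenceMatcher
from itertools import permutations
from typing import Optional

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats, not from the repo


def normalize(value: str) -> str:
    """Fold case and collapse runs of whitespace (assumed normalization)."""
    return " ".join(str(value).split()).casefold()


def to_iso_date(value: str) -> Optional[str]:
    """Parse a date string to ISO format, or return None if no format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def fuzzy(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib stand-in for a rapidfuzz ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_line_items(pred, truth, floor=0.70):
    """Pair predicted and ground-truth items as (pred_index, truth_index)."""
    pairs, free_pred, free_truth = [], [], set(range(len(truth)))
    # Pass 1: exact SKU match after normalization.
    for i, p in enumerate(pred):
        hit = next((j for j in sorted(free_truth)
                    if normalize(truth[j]["sku"]) == normalize(p["sku"])), None)
        if hit is None:
            free_pred.append(i)
        else:
            pairs.append((i, hit))
            free_truth.discard(hit)

    # Pass 2: best one-to-one assignment on description similarity.
    # Brute force is fine for a handful of leftovers; a Hungarian solver
    # finds the same optimum in polynomial time.
    def sim(i, j):
        return fuzzy(pred[i]["description"], truth[j]["description"])

    rest_truth = sorted(free_truth)
    swap = len(free_pred) > len(rest_truth)
    short, long_ = (rest_truth, free_pred) if swap else (free_pred, rest_truth)
    best, best_score = [], -1.0
    for perm in permutations(long_, len(short)):
        cand = [(l, s) if swap else (s, l) for s, l in zip(short, perm)]
        score = sum(sim(i, j) for i, j in cand)
        if score > best_score:
            best, best_score = cand, score
    # Discard assigned pairs whose similarity falls below the floor.
    pairs += [(i, j) for i, j in best if sim(i, j) >= floor]
    return sorted(pairs)


def score_line_item(p, t):
    """Per-sub-field pass/fail for one matched pair of line items."""
    return {
        "sku": normalize(p["sku"]) == normalize(t["sku"]),
        "quantity": round(float(p["quantity"]), 2) == round(float(t["quantity"]), 2),
        "price": round(float(p["price"]), 2) == round(float(t["price"]), 2),
        "description": fuzzy(p["description"], t["description"]) >= 0.80,
        "unit": normalize(p["unit"]) == normalize(t["unit"]),
    }


# Made-up example data: the first predicted SKU has an OCR-style error (O for 0).
truth = [
    {"sku": "A-100", "description": "Steel bolt M8 x 40mm", "quantity": 10, "price": 1.5, "unit": "EA"},
    {"sku": "B-200", "description": "Nylon washer 8mm", "quantity": 100, "price": 0.05, "unit": "EA"},
]
pred = [
    {"sku": "B-2OO", "description": "Nylon washer 8 mm", "quantity": "100", "price": "0.050", "unit": "ea"},
    {"sku": "a-100", "description": "Steel bolt M8 x 40mm", "quantity": 10, "price": 1.5, "unit": "EA"},
]
pairs = match_line_items(pred, truth)
scores = [score_line_item(pred[i], truth[j]) for i, j in pairs]
```

In this example the second predicted item pairs with its ground-truth row in the SKU pass, while the first only pairs in the description pass and is then marked wrong on the SKU sub-field. Whether "exact" for SKU and unit is applied before or after case folding is a choice the sketch makes; the repo is the reference for the real behavior.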
Each PO renders in one of 8+ layout variants (clean table, nested header, dense, scan-style, multi-currency, sparse free-text body, handwritten annotations, multi-page long). To reproduce, clone the repo and run the CLI — the test set is checked in to eval/ground_truth.