Catching Duplicate Payments Before the Money Leaves the Building

The Situation

Invoice Reconciliation Engine is an accounts payable control system for finance teams that need purchase orders, goods receipts, and vendor invoices matched before payment leaves the business. It is needed when invoice volume makes manual spreadsheet checks too slow, too inconsistent, and too easy to miss duplicate or inflated payments.

A purchase order says 100 widgets at €9.90 each. The goods receipt confirms 100 widgets arrived on time. The vendor invoice lists 100 widgets at €10.10 each. Total invoice: €1,010. Total PO: €990. Twenty euros of price drift, invisible unless someone opens three spreadsheets and compares them by hand.

That's one line. A mid-market company processing 500 vendor invoices per month, each with 5 to 15 line items, generates 2,500 to 7,500 line-level comparisons every month. Finance teams do this work manually in Excel. They open the PO, open the goods receipt, open the invoice, eyeball the totals, and click approve. The work takes five to eight days per month of a senior AP clerk's time, and even with that effort, small overcharges slip through because the clerk's attention fails before their patience does.

The result is the phenomenon that accounts payable managers call "paid before matched." The invoice went out because the payment deadline was tighter than the review cycle. The overcharge surfaced during quarterly reconciliation. By then the payment had cleared two months earlier, and clawing it back meant a formal dispute inside the vendor's refund window.

The Cost of Doing Nothing

Industry benchmarks put duplicate payment rates at 0.5% to 1% of total accounts payable spend for companies still relying on manual matching. For a mid-market company with €5 million in annual AP spend, that's €25,000 to €50,000 in duplicate and overcharge payments every year, most of which are recoverable only if caught within 90 days. Past that window, most vendors refuse the refund claim.

Labor cost sits on top. Five to eight days per month of manual three-way matching, at a loaded cost of roughly €400 per day for a senior AP clerk in Western Europe, adds €24,000 to €38,000 annually. Plus the supervisor time spent reviewing exceptions, approving escalations, and chasing vendors about wrong amounts.

Combined, a mid-market company with an unautomated AP function carries €50,000 to €90,000 per year in recoverable labor and leaked cash before the cost of delayed close, vendor-dispute overhead, or the audit exposure from an approval trail that lives in email.

The invisible cost: no structured evidence of who approved what. When an auditor asks "why was this invoice paid at €10.10 when the PO was for €9.90?", the answer lives in one person's memory of a Tuesday afternoon conversation.

What I Built

A headless Kotlin service that ingests vendor invoices through three channels (REST API, CSV import, or accounting-platform webhooks from QuickBooks and Xero), matches each invoice line against its purchase order and goods receipt, scores the match with a confidence value, classifies any gaps as typed discrepancies with severity, and routes the result through a three-tier approval queue. What took five days a month happens in under 500 milliseconds per invoice.

The hardest part was not the matching logic. The hardest part was accepting that the three documents will never agree exactly, and building a system that makes the disagreements useful. Purchase orders get issued on Monday, received on Thursday, and invoiced two weeks later with a revised price the buyer agreed to by email. The engine has to hold all three versions of the truth and tell the operator which differences matter.

The first version of the matcher I built was exact-equality only. It handled 60% of real invoices and rejected the rest as "unmatched." Every rejected invoice still needed human review, which defeated the purpose. The rebuild turned matching into a five-strategy cascade, each strategy with its own confidence ceiling, each one catching failures the previous one missed. That change moved coverage from 60% to 98% without weakening the confidence of the easy cases.

System Flow

Architecture diagramScroll on small screens

Data Model

Architecture diagramScroll on small screens

Architecture Layers

Architecture diagramScroll on small screens

The Decision Log

Decision	Alternative Rejected	Why
Confidence-scored matching with a JSONB breakdown per invoice	Binary matched/unmatched flag	A 0.73 score needs to be debuggable. Operators see which component dragged the score down (PO, line, receipt, or price) and fix on the right axis. A boolean hides the reason.
Integer cents (`BIGINT`) for all money, fractional decimals only for quantities	`NUMERIC` or `DECIMAL` money columns	Rounding ambiguity kills financial systems. `19999` as integer cents never drifts. The only place decimals appear is quantity columns, where kilograms and liters need four decimal places.
Five PO resolution strategies with decaying confidence (1.0 to 0.50)	Single fuzzy-match algorithm over all POs	Fuzzy-first occasionally matches the wrong PO with no way to know. Exact-first captures easy cases at confidence 1.0 and constrains the search space before weaker strategies run.
Per-tenant HMAC secrets stored in `tenants.settings.webhook_secret` JSONB	One global `WEBHOOK_HANDLER_SECRET` env var	One shared secret is one blast radius. Per-tenant secrets let any tenant rotate independently and eliminate the cross-tenant leak path entirely.
Two auto-approve paths: small amount (`< €100`) OR high confidence (`≥ 0.95`) with zero discrepancies	Single confidence threshold	Small invoices shouldn't queue for human review. High-confidence large invoices shouldn't wait either. Two paths capture both cases without letting one override the other.
Append-only `audit_log` with no `UPDATE` or `DELETE` permitted	`updated_at` / `updated_by` columns on the main tables	An audit row that can be edited proves nothing to a regulator. Separate immutable events per decision give structural proof the trail was not altered.
Matching run capped at 100 invoices by default, scheduler every 30 minutes	Process everything queued per run	A 10,000-invoice run holds a distributed lock too long and fails halfway through with no recovery. Batched runs make partial progress safe and predictable.

Ecosystem Integration

This service does not run alone. Discrepancy events flow to a compliance ledger over NATS JetStream as invoice.discrepancy.detected messages, which the ledger commits as immutable audit events with cryptographic sequence. Approval requests for standard and escalated invoices trigger workflows on a self-hosted workflow orchestrator that handles routing, retries, and escalation timers. Notification envelopes (approval requested, invoice overdue, high-discrepancy rate, match-run failed) route through a notification hub with per-tenant preferences. Matched invoice and PO pairs feed into a transaction reconciliation engine for cross-system financial verification, and invoice documents push to a RAG platform for natural-language search. Inbound accounting-platform webhooks (QuickBooks, Xero) arrive through the webhook ingestion engine as signed fan-out deliveries.

All five outbound integrations are feature-flagged. The engine runs standalone with no ecosystem dependencies.

Results

The smoke test suite runs 48 endpoint paths against a live server and validates the full lifecycle: tenant registration, vendor and PO creation, goods receipt posting, invoice ingestion, match execution, discrepancy classification, approval routing, and rate-limit behavior. 1,028 unit and integration tests cover the matching algorithm, tenant isolation, duplicate detection, and approval state transitions.

On the business side, the shift looks like this. Manual three-way matching on a 500-invoice-per-month AP function drops from 5 to 8 days of clerk time to a handful of exception reviews. Duplicate payment rates fall from 0.5-1% of AP spend toward the industry-leading band under 0.1%. A €5 million AP spend recovers €20,000 to €40,000 per year that would previously have leaked out as overcharges, with most of the remaining exceptions catching price drift within 24 hours instead of 90 days.

The first version of the matcher shipped in two weeks and matched 60% of real invoices. The rebuild with the five-strategy cascade took another week and moved coverage to 98%. The design choice that separates those two numbers is visible in the JSONB breakdown on every match row: every score comes with a reason, every low-confidence match routes to human review, and no invoice ever disappears into an unmatched state without an operator seeing it. That's the difference between an automated matcher and a trustworthy one.

Invoice Reconciliation Engine

Catching Duplicate Payments Before the Money Leaves the Building

The Situation

The Cost of Doing Nothing

What I Built

System Flow

Data Model

Architecture Layers

The Decision Log

Ecosystem Integration

Results

Put this system in context.

Contents