Transaction Reconciliation Engine: Blueprint
Problem Statement
A payment gateway says a customer paid €49.99 on Tuesday. The bank statement shows €49.99 on Wednesday. The internal ledger has €49.99 with a different reference number. Three sources, one transaction, zero certainty they describe the same event.
Manual reconciliation at scale is an accounting staff problem disguised as a data engineering problem. Humans open spreadsheets, visually scan columns, and match rows by pattern recognition. It works until transaction volumes cross a few hundred per day. After that, unmatched transactions accumulate, reporting deadlines slip, and the finance team stops trusting the numbers entirely.
Core Architectural Decision
The engine treats reconciliation as a scoring problem, not a lookup problem. Instead of rigid exact-match queries that break the moment a date shifts by one day or a reference ID changes format, it evaluates every gateway–ledger pair through a 4-rule cascade and assigns a confidence score between 0.0 and 1.0.
The rules fire in parallel against each pair, and the highest-confidence match wins:
| Rule | Confidence | What It Checks |
|---|---|---|
| Exact Match | 1.00 | Amount + currency + date + counterparty |
| Amount + Date | 0.90 | Amount + currency + date within tolerance |
| Reference Match | 0.80 | External ID substring in the other's description |
| Fuzzy Amount | 0.75 | Amount within percentage tolerance + relaxed date |
A minimum confidence threshold (configurable per deployment) gates what gets persisted as a match. Everything below the threshold becomes a discrepancy.
System Topology
Infrastructure Decisions
- Runtime: Go as a static binary. Chose it over Python or Node because the matching loop is CPU-bound and deploys cleanly as a scratch container with no runtime dependency tree.
- Data layer: PostgreSQL owns transactions, matches, discrepancies, and run history. Chose it over Redis-only storage because reconciliation records need durable audit history, not just fast lookup.
- Cache and locks: Redis handles hot-path deduplication and distributed lock ownership. Chose it over database locks for scheduler coordination because lock state is temporary and TTL-bound.
- Source boundary: Adapter interfaces isolate Stripe, PayPal, and CAMT.053 parsing. Chose adapters over source-specific reconciliation branches because every source must produce the same canonical transaction shape before matching.
- Money representation:
int64cents in the domain model. Chose it over floats or decimal objects because the engine only needs smallest-unit arithmetic and cannot tolerate rounding drift.
Why Go
The engine is CPU-bound during the scoring phase: O(n x m) pair evaluations where n is gateway transactions and m is ledger transactions. Go's goroutine scheduler, zero-cost interface dispatch, and value-type structs keep allocation pressure low during the inner loop. The entire scorer runs without a single heap allocation per pair evaluation.
The binary compiles to a single static executable. No runtime, no dependency tree at deploy time. The Docker image is a scratch container with one file.
Constraints That Shaped the Design
Money in cents, never floats. Every amount is stored as int64 representing the smallest currency unit. The parseCents function in the PayPal adapter uses math.Round(f * 100) to avoid floating-point truncation errors. The domain model enforces this: there is no float64 amount field anywhere in the transaction struct.
Deduplication is a two-tier write. Redis SETNX checks first (24-hour TTL). If Redis is unavailable, PostgreSQL ON CONFLICT DO NOTHING on the dedup_key column handles it. The dedup key is SHA-256 of source_id + external_id. This means the system tolerates Redis outages without admitting duplicates.
Distributed locking is owner-verified. The Redis lock uses SETNX for acquisition and a Lua script for release that atomically checks GET key == owner before DEL. This prevents a slow process from unlocking a lock that was already acquired by another instance after TTL expiry.
Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| Confidence-scored cascade | Exact lookup by reference | Gateway references, settlement dates, and bank descriptions disagree often enough that exact lookup creates false discrepancies. |
| PostgreSQL as correctness layer | Redis-only deduplication | Redis can restart cold. The unique dedup_key constraint is the durable duplicate barrier. |
| Owner-verified Redis release | Plain DEL on unlock |
A slow job can outlive its TTL. The release path must prove it still owns the lock before deleting it. |
| Adapter boundary per source | Source-specific engine branches | Stripe, PayPal, and CAMT.053 differ at the edge. The scoring engine should only see canonical transactions. |
int64 cents |
Float amounts | A reconciliation engine cannot create one-cent differences while trying to resolve one-cent differences. |
What Stays on the Roadmap
The architecture has two open extension points that I designed for but did not implement.
First: streaming ingestion. The current pipeline is pull-based. Each adapter fetches a time window and pushes rows into PostgreSQL. A webhook receiver that ingests Stripe events in real time would eliminate the polling interval entirely. The SourceAdapter interface accepts this without changes because FetchTransactions and a hypothetical HandleWebhookEvent both produce the same IngestRequest.
Second: a dashboard API. The ReconciliationRun record already captures MatchRate, DurationMs, and DiscrepancyCount per run. Exposing these over an HTTP endpoint would give finance teams a live view of reconciliation health instead of waiting for CLI output. The data already exists; the missing piece is a thin HTTP handler.