
Financial Compliance Ledger

7 min read · Kingsley Onoh

Architectural Brief: Financial Compliance Ledger

Financial compliance systems live or die on one property: immutability. If the audit trail can be edited, it proves nothing to an auditor. That requirement shaped every decision in this system, from the database schema (append-only event table, no UPDATE, no DELETE) to the deployment topology (single binary, six concurrent goroutines, feature-flagged integrations that degrade gracefully when downstream services are unavailable).

The Financial Compliance Ledger is an event-sourced backend service that ingests financial discrepancy events from upstream reconciliation systems, enforces tenant-scoped escalation policies, tracks resolution workflows through a six-state machine, and generates compliance reports as PDF or HTML. Built with Go 1.22, Chi router, PostgreSQL 16, and NATS JetStream, it runs as a single binary on a VPS behind Traefik.

System Topology

Infrastructure Decisions

  • Language: Go 1.22 with Chi v5.2.5 router. Chose over Python (FastAPI) and Node.js (NestJS) because the server runs six concurrent background goroutines alongside the HTTP handler: NATS consumer, escalation engine, notification retrier, RAG syncer, metrics collector, and report cleaner. Go handles this as goroutines in a single binary with shared memory. Python would need Celery or multiprocessing with inter-process communication. Node would need worker threads or separate processes behind a queue.

  • Database: PostgreSQL 16 with pgx v5.9.1 driver and golang-migrate v4.19.1 for schema management. Chose over MongoDB because the core requirement is append-only INSERT with ACID transaction guarantees. The ledger_events table must never accept UPDATE or DELETE. PostgreSQL's strong transaction model means the mutable projection (discrepancy status) and the immutable event can be written atomically. MongoDB's document model makes cross-collection transactions more complex and immutability harder to enforce structurally.

  • Event Streaming: NATS JetStream with durable pull consumers and manual acknowledgment. Chose over Kafka because the production deployment allocates 256MB to the message broker. Kafka's minimum viable deployment requires 1-2GB for ZooKeeper plus broker. NATS runs as a single process, provides durable subscriptions with at-least-once delivery, and includes dead-letter handling after 3 failed delivery attempts via msg.Term().

  • Report Generation: wkhtmltopdf with Go's html/template and embed.FS for template embedding. Chose over pure Go PDF libraries (go-pdf, gofpdf) because compliance reports need styled tables, conditional formatting, and section headers. HTML templates give layout flexibility without coordinate-based PDF positioning. The system falls back to serving raw HTML when wkhtmltopdf is unavailable, which simplifies development and testing.

  • Metrics: Prometheus client library v1.23.2 with 5 custom instruments: ledger_events_total (counter by event_type), discrepancies_total (gauge by status and severity), notifications_sent_total (counter by channel and status), escalation_actions_total (counter by action), and reports_generated_total (counter by report_type). Chose over structured log aggregation because Prometheus provides dimensional data natively and integrates with Grafana for real-time compliance dashboards.

  • Authentication: API key in X-API-Key header, SHA-256 hashed and looked up against the tenants table. In-memory cache with 5-minute TTL avoids hitting the database on every request. Chose over JWT because the system is API-to-API (service authentication, not user sessions). API keys are simpler to rotate and revoke.

Constraints That Shaped the Design

  • Input: Discrepancy events arrive via NATS JetStream on the recon.discrepancy.detected subject. Each event carries a tenant_id, external_id, severity, expected and actual amounts as DECIMAL(15,2), and a JSONB metadata payload. The consumer validates the tenant (must exist and be active), deduplicates by (tenant_id, external_id), and creates both a discrepancy record and an immutable discrepancy.received event in a single operation.

  • Output: 22 HTTP endpoints serving discrepancy queries (filtered, cursor-paginated), workflow actions (acknowledge, investigate, resolve, add note), escalation rule CRUD, report generation, tenant registration, health checks, and Prometheus metrics. Reports generate asynchronously (202 Accepted) and serve as PDF or HTML downloads.

  • Scale Handled: Designed for low-to-mid volume compliance operations: hundreds of discrepancies per tenant per day, not tens of thousands. The 15-minute escalation polling cycle processes tenants sequentially. At higher tenant counts, the evaluation would need sharding by tenant or switching to event-driven triggers.

  • Hard Constraints: Immutability on ledger_events (INSERT only, enforced at the application layer). Tenant isolation on every query (middleware injects tenant_id, every SQL WHERE clause includes it). Rate limiting at 60 requests per minute per IP. Request body capped at 1MB. HTTP server timeouts: 15 seconds read/write, 60 seconds idle, 30 seconds graceful shutdown.

  • Deployment Budget: Production containers capped at 512MB for the application, 512MB for PostgreSQL, and 256MB for NATS. Total memory footprint under 1.5GB including Traefik. Multi-stage Docker build produces a stripped binary (CGO_ENABLED=0, -ldflags "-s -w") running as a non-root user on Alpine 3.19.

  • Concurrency Discovery: The escalation engine's concurrent rule evaluation against the same discrepancy produced duplicate closure events under timestamp-based ordering. Moving the transition check inside a row-locked transaction with SELECT ... FOR UPDATE eliminated the race. The surprise: concurrent human analysts operating on the same discrepancy through the API generated more conflicts than the scheduled batch engine, which led to the optional optimistic locking via expected_sequence.

Decision Log

  • Decision: Append-only event table (INSERT only, BIGSERIAL ordering)
    Rejected: Mutable audit columns (updated_at, updated_by)
    Why: An auditor needs to verify that no record was altered post-facto. Mutable columns can't provide this guarantee because the columns themselves are writable. BIGSERIAL ordering makes tampering structurally detectable: any gap in the sequence or any missing event breaks the chain.

  • Decision: Escalation rules as tenant-scoped database rows
    Rejected: Hardcoded escalation logic in Go
    Why: Each tenant has different compliance policies. Storing rules as rows (severity match, time threshold, action) lets tenants configure their own escalation without code deployments. The wildcard severity match (*) supports catch-all rules.

  • Decision: NATS JetStream with dead-letter after 3 attempts
    Rejected: Kafka with infinite retry
    Why: 256MB memory budget. Kafka needs 1-2GB minimum. Dead-letter after 3 attempts via msg.Term() prevents a single malformed event from blocking the consumer indefinitely.

  • Decision: Feature-flagged outbound integrations
    Rejected: Mandatory service dependencies
    Why: Audit trail integrity can't depend on downstream availability. Both integrations sit behind environment flags defaulting to off. If the Notification Hub goes down, escalation rules still fire and events still record. Outbound HTTP calls use retry queues, not synchronous gates.

  • Decision: Single Go binary with 6 goroutines
    Rejected: Microservice split (API + worker services)
    Why: All goroutines share the database connection pool and run in one process. No inter-service communication, no deployment coordination, no message queue between API and workers. One binary to deploy, one process to monitor.

  • Decision: Async report generation (202 Accepted)
    Rejected: Synchronous report endpoint
    Why: Compliance reports can span 10,000 events across 100 discrepancies. HTML rendering and PDF conversion via wkhtmltopdf takes seconds to minutes. A synchronous endpoint would exceed the 15-second write timeout.

  • Decision: Optional optimistic locking via expected_sequence
    Rejected: Mandatory sequence validation on all workflow actions
    Why: Backward compatibility. Simple integrations that process discrepancies one at a time don't need sequence tracking. Clients that handle concurrent access include the expected sequence number and receive 409 Conflict on mismatch. The state machine catches illegal transitions regardless.
#go #postgresql #nats-jetstream #event-sourcing #compliance
