Architectural Brief: SLA Penalty Settlement Engine
Supplier contracts already say what happens when an SLA is missed. The weak point is the handoff from breach evidence to collectible money: rules need to calculate the penalty, ledgers need to mirror both sides, disputes need to unwind the charge, and settlement PDFs need to stay legally stable after tenant details change.
This system keeps calculation pure, persistence explicit, and side effects delayed until the outbox can prove what happened.
System Topology
Infrastructure Decisions
- Compute: ASP.NET Core 8 with Giraffe. Chose it over a separate API and SPA because the operator workflow is form-heavy, tenant-scoped, and better served by one server-rendered process than by a second frontend build pipeline.
- Domain language: F# 8. Chose it over C# because penalty types, ledger directions, breach states, and no-penalty outcomes map cleanly to discriminated unions. The repo proves this in src/Slapen.Domain/Types.fs, where invalid currency, missing units, and mismatched ledger pairs become typed errors.
- Data layer: PostgreSQL 16 with explicit SQL through Npgsql and Dapper.FSharp-style repositories. Chose it over EF Core because the important write path is an append-only ledger and settlement membership table, not object graph persistence.
- Background work: Hangfire 1.8 with PostgreSQL backing plus an in-process outbox processor. Chose it over direct HTTP calls from request handlers because Invoice Recon, Hub, Workflow, and VPI calls need leases, retries, and idempotency keys.
- Cache and rate window: Redis 7. Chose it over storing ephemeral counters in Postgres because rate limits and short-lived identity caches do not belong in the settlement ledger's transaction path.
- PDF rendering: QuestPDF. Chose it over Chromium screenshots because credit notes render from a stored snapshot object, not from a browser page that can change with CSS or tenant settings.
- Deployment boundary: Docker Compose with Traefik labels and a GHCR image. Chose it over a managed platform because the service needs to join the shared ecosystem network and run migrations on container boot. The compose label targets slapen.kingsleyonoh.com, but registry and curl checks show it is not live.
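
The lease, retry, and idempotency-key behaviour named in the background-work decision can be sketched as below. This is a minimal in-memory illustration in Python, not the repo's F# outbox processor: OutboxMessage, claim, process, the 30-second lease, and the five-attempt cap are all hypothetical names and values, and a real processor would take the lease with a database row update rather than a field assignment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class OutboxMessage:
    idempotency_key: str                      # sent on every attempt so the receiver can dedupe
    payload: dict
    attempts: int = 0
    leased_until: Optional[datetime] = None   # None means no worker holds the message
    delivered: bool = False


def claim(msg: OutboxMessage, now: datetime, lease: timedelta) -> bool:
    """Take a lease unless the message is delivered or another lease is still live."""
    if msg.delivered:
        return False
    if msg.leased_until is not None and msg.leased_until > now:
        return False
    msg.leased_until = now + lease
    return True


def process(
    msg: OutboxMessage,
    send: Callable[[str, dict], None],
    now: datetime,
    lease: timedelta = timedelta(seconds=30),   # assumed lease length
    max_attempts: int = 5,                      # assumed retry cap
) -> bool:
    """Attempt one delivery. Returns True only when this call delivered the message."""
    if msg.attempts >= max_attempts or not claim(msg, now, lease):
        return False
    msg.attempts += 1
    try:
        send(msg.idempotency_key, msg.payload)
    except Exception:
        msg.leased_until = None   # release the lease so a later pass can retry
        return False
    msg.delivered = True
    msg.leased_until = None
    return True
```

Because the same idempotency key travels with every retry, a receiver that already applied the first attempt can ignore the second, which is what makes retrying safe.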
Constraints That Shaped the Design
- Input: Breaches arrive through manual entry, fixed-schema CSV upload, Contract Lifecycle NATS/REST ingestion, or HMAC-verified Hub ingress. That mix forced a common ExternalBreachInput shape before any accrual logic runs.
- Output: The system produces paired penalty_ledger rows, settlement records, snapshot JSON, local PDF credit notes, and optional outbox work for ecosystem services.
- Scale Handled: The load-audit script seeds 10000 breaches and 5000 settlements in full-volume mode and asserts the dashboard summary stays under 500ms through one bounded aggregate command.
- Hard Constraints: The ledger is insert-only. Migration 008__penalty_ledger_insert_only_trigger.sql raises on update and delete, while LedgerPair.create rejects mismatched amount, period, context, direction, kind, and compensation references before rows reach SQL.
- Tenant Boundary: API keys resolve through prefix plus SHA-256 hash. Every repository call receives TenantScope, and cross-tenant reads are tested as 404 behavior, not left as convention.
- Integration Boundary: Every ecosystem client is disabled unless its enable flag, base URL, and API key are present. The engine still runs with manual entry, CSV import, and local PDF export.
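
The tenant boundary can be illustrated with a short sketch of prefix-plus-hash key resolution. It is written in Python rather than the repo's F#, and everything except the TenantScope name is an assumption: the 8-character prefix length, the StoredKey record, the dictionary standing in for the key table, and the find_breach helper are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Optional

PREFIX_LEN = 8  # assumed prefix length; the brief does not state one


@dataclass(frozen=True)
class TenantScope:
    tenant_id: str


@dataclass(frozen=True)
class StoredKey:
    prefix: str        # indexed lookup column
    sha256_hex: str    # only the hash is stored, never the raw key
    tenant_id: str


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def resolve_scope(raw_key: str, keys_by_prefix: Dict[str, StoredKey]) -> Optional[TenantScope]:
    """Find the candidate row by prefix, then confirm it with a constant-time hash compare."""
    stored = keys_by_prefix.get(raw_key[:PREFIX_LEN])
    if stored is None:
        return None
    presented = hash_key(raw_key)
    if not hmac.compare_digest(stored.sha256_hex.encode(), presented.encode()):
        return None
    return TenantScope(stored.tenant_id)


def find_breach(scope: TenantScope, breach_id: str, breaches: Dict[str, dict]) -> Optional[dict]:
    """Return the row only when it belongs to the caller's tenant.

    A caller would map None to 404, so another tenant's row looks like a missing row.
    """
    row = breaches.get(breach_id)
    if row is None or row["tenant_id"] != scope.tenant_id:
        return None
    return row
```

Passing the scope into every read is what lets the cross-tenant case be asserted in a test instead of relying on handlers to remember a filter.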
Data Contracts
The core tables are intentionally boring. sla_clauses stores one of five penalty shapes. breach_events records what happened and the current dispute state. penalty_ledger records only positive amounts, with direction and entry kind carrying meaning. settlement_ledger_entries owns settlement membership so ledger rows never need a "settled" flag.
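As an illustration of "one of five penalty shapes", the sketch below models each shape as its own type and calculates a penalty as a pure function of its inputs. It uses Python dataclasses in place of the repo's F# unions, and the field names, the Breach record, and the meaning given to each shape (for example, that a tiered penalty picks the highest tier whose floor the shortfall reaches) are assumptions, not the contents of Types.fs.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple, Union


@dataclass(frozen=True)
class Flat:
    amount: Decimal


@dataclass(frozen=True)
class Percent:
    rate: Decimal            # fraction of the base amount, e.g. Decimal("0.05")


@dataclass(frozen=True)
class Tiered:
    tiers: Tuple[Tuple[Decimal, Decimal], ...]   # (shortfall floor, amount), ascending


@dataclass(frozen=True)
class Daily:
    per_day: Decimal


@dataclass(frozen=True)
class Linear:
    per_unit: Decimal


PenaltyConfig = Union[Flat, Percent, Tiered, Daily, Linear]


@dataclass(frozen=True)
class Breach:
    shortfall: Decimal       # units below the SLA target (assumed measure)
    days: int                # length of the breach in days
    base_amount: Decimal     # amount a percentage penalty applies to


def calculate(config: PenaltyConfig, breach: Breach) -> Decimal:
    """Pure calculation: no I/O, no clock, same inputs give the same amount."""
    if isinstance(config, Flat):
        return config.amount
    if isinstance(config, Percent):
        return breach.base_amount * config.rate
    if isinstance(config, Tiered):
        reached = [amount for floor, amount in config.tiers if breach.shortfall >= floor]
        return reached[-1] if reached else Decimal("0")   # zero stands in for a no-penalty outcome
    if isinstance(config, Daily):
        return config.per_day * breach.days
    if isinstance(config, Linear):
        return config.per_unit * breach.shortfall
    raise TypeError(f"unknown penalty config: {config!r}")
```

Keeping the function free of persistence and clocks is what allows the same calculation to run in tests, in CSV import, and in NATS ingestion without behaving differently.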
Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| Five typed penalty configs | Generic rules DSL | The PRD excludes a generic DSL, and the F# unions cover flat, percent, tiered, daily, and linear penalties without runtime scripting. |
| Paired ledger writer | Two independent inserts from handlers | LedgerPair.create validates both rows before any transaction writes, so a half-ledger cannot be produced by application code. |
| Settlement membership table | Mutating penalty_ledger with settlement status | The ledger trigger blocks update and delete. A join table proves membership without weakening immutability. |
| Snapshot JSON for PDFs | Render credit notes from live tenant rows | Tenant names, addresses, or registrations can change after posting. Stored snapshot JSON keeps reprints stable. |
| Fixed CSV schema | Custom column mapper UI | The PRD excludes custom CSV mapping. Operators adapt the file, and ingestion stays deterministic. |
| Feature-flagged clients | Mandatory ecosystem services | Contract Lifecycle, Invoice Recon, Hub, Workflow, and VPI are useful when present, but the settlement engine must work standalone. |
| HMAC Hub ingress | Trusting tenant IDs in HTTP payloads | The Hub breach ingress verifies X-Hub-Signature before parsing the envelope, then resolves tenant scope from the signed body. |
| One dashboard aggregate command | Multiple UI count queries | Batch 011 moved the dashboard to DashboardRepository.summary, keeping the operational page measurable under seeded load. |
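
The HMAC Hub ingress row describes an ordering: verify the signature over the raw bytes, and only then parse. A minimal Python sketch of that ordering follows. The X-Hub-Signature header name comes from the table; the encoding is an assumption (a hex HMAC-SHA256 digest of the raw body), as are the sign and accept_envelope function names and the envelope fields used in the usage note.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign(secret: bytes, raw_body: bytes) -> str:
    """Hex HMAC-SHA256 over the exact bytes received (assumed header encoding)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def accept_envelope(secret: bytes, raw_body: bytes, signature_header: str) -> Optional[dict]:
    """Verify first, parse second. Unsigned or tampered bodies are never parsed."""
    expected = sign(secret, raw_body)
    # Compare as bytes so a non-ASCII header cannot raise, and in constant time.
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return None
    # Tenant scope is read from the body only after the signature has been checked.
    return json.loads(raw_body)
```

With a hypothetical body such as `{"tenant_id": "t-1", "breach_id": "b-9"}`, a caller would pass the untouched request bytes and the header value, then resolve the tenant from the returned dictionary. A None result is the point at which a handler would reject the request.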
Scaling Limits
The current design is shaped for an operator console and portfolio-scale integration work, not for a public high-volume payments network. The dashboard audit exercises 10000 breaches and 5000 settlements. The PRD target for NATS ingestion is 100 breaches per second, and the success criteria still leave the staged 2s NATS-to-ledger and 30s Invoice Recon posting targets open.
The first limit is not F# calculation. It is settlement grouping and ledger query shape once ledger rows move beyond local audit datasets. At that point, the next change is partitioning or summary projections for penalty_ledger, while keeping the same domain contract: penalties calculate from inputs, rows append in pairs, and corrections write compensation instead of mutation.
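The domain contract in the last sentence (rows append in pairs, corrections write compensation instead of mutation) can be sketched as follows. This is a Python illustration under stated assumptions, not the repo's LedgerPair.create: the LedgerRow fields, the "debit"/"credit" and "accrual"/"compensation" labels, and the compensate and outstanding helpers are hypothetical, and the checks cover only part of what the brief lists.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class LedgerRow:
    entry_id: str
    breach_id: str
    direction: str                      # "debit" or "credit" (assumed labels)
    kind: str                           # "accrual" or "compensation" (assumed labels)
    amount: Decimal                     # always positive; direction and kind carry meaning
    compensates: Optional[str] = None   # entry_id of the row this one offsets


def create_pair(debit: LedgerRow, credit: LedgerRow) -> List[LedgerRow]:
    """Validate both rows together so a half-ledger can never be returned."""
    errors = []
    if debit.amount <= 0 or credit.amount <= 0:
        errors.append("amounts must be positive")
    if debit.amount != credit.amount:
        errors.append("amounts differ")
    if (debit.direction, credit.direction) != ("debit", "credit"):
        errors.append("directions must be debit then credit")
    if debit.kind != credit.kind:
        errors.append("kinds differ")
    if debit.breach_id != credit.breach_id:
        errors.append("context differs")
    if debit.kind == "compensation" and None in (debit.compensates, credit.compensates):
        errors.append("compensation rows must reference the rows they offset")
    if errors:
        raise ValueError("; ".join(errors))
    return [debit, credit]


def compensate(pair: List[LedgerRow], id_prefix: str) -> List[LedgerRow]:
    """Build a new pair that offsets an existing one. The original rows are untouched."""
    debit, credit = pair
    return create_pair(
        LedgerRow(f"{id_prefix}-d", debit.breach_id, "debit", "compensation", debit.amount, credit.entry_id),
        LedgerRow(f"{id_prefix}-c", debit.breach_id, "credit", "compensation", debit.amount, debit.entry_id),
    )


def outstanding(rows: List[LedgerRow]) -> Decimal:
    """Accruals minus compensations, read from the debit side of each pair."""
    total = Decimal("0")
    for row in rows:
        if row.direction == "debit":
            total += row.amount if row.kind == "accrual" else -row.amount
    return total
```

A dispute that unwinds a charge therefore grows the ledger by two rows and brings the outstanding amount back to zero, which is the property that partitioning or summary projections would need to preserve.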