From Tracking Exceptions to Owned Claims

The Situation

Returns & Claims Orchestration Engine is a logistics operations console for e-commerce teams that need to turn failed deliveries, returns, and carrier exceptions into owned claims cases.

Teams need it when tracking tools say what happened but not who owns the next action. A parcel comes back. A carrier marks an exception. A customer asks for a refund. The tracking timeline has facts, but the support team still needs evidence, deadlines, carrier packets, customer actions, and audit history.

The project fills the gap after the Delivery Gateway. That gateway emits normalized tracking:events; this engine decides what those events mean for operations. It also works without that gateway. A tenant can register, create shipments and claims by hand, attach evidence, queue resolution actions, generate CSV or JSON export packets, and close the case with an audit trail.

The operating target is concrete: ingest exception events, keep large queues usable, and open claim detail pages without making operators wait.

The Cost of Doing Nothing

Without a claims owner, exception handling becomes a shared inbox problem. The carrier timeline exists, but ownership is tribal: someone remembers which evidence DHL wants, someone else knows whether DPD needs a customer message, and one support agent keeps a spreadsheet of refund promises.

A small e-commerce operations team spending about 2.5 hours a day chasing carrier exceptions burns roughly €32K a year in labor before counting missed reimbursement windows. That estimate assumes one operations specialist at a conservative European fully loaded cost and normal working days. Taking roughly 250 working days as an assumption, 2.5 hours a day is about 625 hours a year, which puts the implied rate near €51 an hour. The estimate does not include customer churn, double refunds, or disputed carrier reimbursements.

The real loss is not only time. It is drift. An exception without an owner can sit until a customer complains again. Evidence goes missing. Carrier deadlines pass. Refunds happen without reimbursement packets. By the time finance asks why margin leaked, the claim history is scattered across tracking pages, support notes, and exported files.

What I Built

I built a Spring Boot 3.4 system that turns exception signals into tenant-scoped claim cases. The core workflow is simple for an operator: create or ingest a shipment, open the claim, follow the evidence checklist, queue the action, generate the carrier packet, and close the case.

The hard part was making that simple path survive retries and optional integrations. Delivery Gateway events can arrive twice. Redis can hold a pending message after a failed poll. A downstream notification service can be off. A tenant may run the system entirely by hand. The app cannot make any of those conditions fatal.

So the business state lives in PostgreSQL. Redis is an input and coordination surface, not the source of truth. Notification Hub and Workflow Engine dispatch from database-backed queues. Dispute Workbench gets exceptions only when enabled. Carrier policy stays behind a registry so controllers and templates do not learn DHL, DPD, GLS, or manual-carrier rules.
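
The carrier registry idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the CarrierPolicy record, the CarrierPolicyRegistry class, and the manual fallback behavior are hypothetical names and assumptions, and any evidence lists or claim windows passed in would be placeholders rather than real DHL, DPD, or GLS rules.

```java
import java.time.Duration;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical policy shape: evidence a carrier expects and how long a claim may be filed. */
record CarrierPolicy(String carrierCode, List<String> requiredEvidence, Duration claimWindow) {}

/** Single lookup point so controllers and templates never branch on carrier names. */
final class CarrierPolicyRegistry {
    private final Map<String, CarrierPolicy> policies = new ConcurrentHashMap<>();
    private final CarrierPolicy manualFallback;

    CarrierPolicyRegistry(CarrierPolicy manualFallback) {
        this.manualFallback = manualFallback;
    }

    void register(CarrierPolicy policy) {
        policies.put(normalize(policy.carrierCode()), policy);
    }

    /** Unknown or missing carrier codes resolve to the manual-carrier policy (an assumption). */
    CarrierPolicy policyFor(String carrierCode) {
        if (carrierCode == null) {
            return manualFallback;
        }
        return policies.getOrDefault(normalize(carrierCode), manualFallback);
    }

    private static String normalize(String code) {
        return code.trim().toUpperCase(Locale.ROOT);
    }
}
```

A caller asks the registry for policyFor(shipment carrier code) and renders whatever checklist comes back, so adding a carrier means registering one more policy rather than touching a controller.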

System Flow

[Diagram: system flow]

Data Model

[Diagram: data model]

Architecture Layers

[Diagram: architecture layers]

The Decision Log

Decision	Alternative Rejected	Why
Java 21 and Spring Boot 3.4	Node or Python service	The project needed enterprise-style transactions, validation, scheduled jobs, metrics, migrations, and tenant security in one backend.
Server-rendered operations UI	Separate React frontend	The workflow is form-heavy and state-heavy. Server-rendered HTMX kept queue and detail screens close to the domain model.
Carrier export packets	Direct carrier claim submission	The MVP has no reliable carrier sandbox credentials. Export packets give operators usable output without faking carrier portal automation.
Redis ack after database commit	Ack while processing each message	One bad event in a batch could roll back PostgreSQL after Redis had already cleared a good message. Commit first, ack second.
Root-claim detail read model	Controller composing five separate reads	The claim detail page had to stay under 500 ms. One tenant-scoped read model replaced a waterfall.
Feature-flagged ecosystem edges	Required external services	The engine must run standalone. Delivery Gateway, Notification Hub, Workflow Engine, and Dispute Workbench are optional by tenant.
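
The "commit first, ack second" row can be illustrated with a small sketch. It assumes a batch is processed inside one transaction; UnitOfWork and Acker are hypothetical stand-ins for the real transaction boundary and the Redis acknowledge call, and CommitThenAckPoller is an illustrative name, not the project's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

record TrackingMessage(String id, String payload) {}

/** Hypothetical transaction boundary: throws if the work fails, in which case nothing is committed. */
interface UnitOfWork {
    void inTransaction(Runnable work);
}

/** Hypothetical stand-in for the stream client's acknowledge call. */
interface Acker {
    void ack(List<String> messageIds);
}

final class CommitThenAckPoller {
    private final UnitOfWork unitOfWork;
    private final Acker acker;
    private final Consumer<TrackingMessage> handler;

    CommitThenAckPoller(UnitOfWork unitOfWork, Acker acker, Consumer<TrackingMessage> handler) {
        this.unitOfWork = unitOfWork;
        this.acker = acker;
        this.handler = handler;
    }

    /** Returns true when the batch was committed and then acknowledged. */
    boolean processBatch(List<TrackingMessage> batch) {
        List<String> processedIds = new ArrayList<>();
        try {
            unitOfWork.inTransaction(() -> {
                for (TrackingMessage message : batch) {
                    handler.accept(message);
                    processedIds.add(message.id());
                }
            });
        } catch (RuntimeException e) {
            // Rolled back: acknowledge nothing so every message stays pending for a later poll.
            return false;
        }
        // The commit succeeded, so only now is it safe to clear the messages.
        acker.ack(processedIds);
        return true;
    }
}
```

If the process dies between the commit and the ack, the messages are delivered again, so this ordering only works when the handler is idempotent.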

Ecosystem Integration

This project gives the Delivery Tracking Gateway its first downstream operational consumer. The gateway says a parcel failed, returned, or hit an exception; this engine turns that signal into a claim with ownership and deadlines. Claim lifecycle events can flow into the Notification Hub, deadline and resolution actions can trigger the Workflow Engine, and rejected cases can be escalated into the Dispute Workbench. All integrations are feature-flagged per tenant, and the system runs standalone with no ecosystem dependencies.
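
One way to picture the feature-flagged edges is a router that turns a claim event into queue entries only for the integrations a tenant has enabled. The sketch below is an assumption-laden illustration: the enum values, the OutboxEntry record, and the exact event-to-integration mapping are hypothetical and only mirror the routing described above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

enum Integration { NOTIFICATION_HUB, WORKFLOW_ENGINE, DISPUTE_WORKBENCH }

enum ClaimEventType { CLAIM_OPENED, DEADLINE_APPROACHING, RESOLUTION_QUEUED, CLAIM_REJECTED }

/** Hypothetical row for a database-backed dispatch queue. */
record OutboxEntry(String tenantId, String claimId, Integration target, ClaimEventType type) {}

final class IntegrationRouter {
    private final Map<String, Set<Integration>> enabledByTenant;

    IntegrationRouter(Map<String, Set<Integration>> enabledByTenant) {
        this.enabledByTenant = Map.copyOf(enabledByTenant);
    }

    /** Tenants with no flags get an empty list, which is the standalone path. */
    List<OutboxEntry> route(String tenantId, String claimId, ClaimEventType type) {
        Set<Integration> enabled = enabledByTenant.getOrDefault(tenantId, Set.of());
        List<OutboxEntry> entries = new ArrayList<>();
        for (Integration target : targetsFor(type)) {
            if (enabled.contains(target)) {
                entries.add(new OutboxEntry(tenantId, claimId, target, type));
            }
        }
        return entries;
    }

    // Assumed mapping: lifecycle -> hub, deadlines and actions -> workflow, rejections -> workbench.
    private static Set<Integration> targetsFor(ClaimEventType type) {
        return switch (type) {
            case CLAIM_OPENED -> EnumSet.of(Integration.NOTIFICATION_HUB);
            case DEADLINE_APPROACHING, RESOLUTION_QUEUED -> EnumSet.of(Integration.WORKFLOW_ENGINE);
            case CLAIM_REJECTED -> EnumSet.of(Integration.NOTIFICATION_HUB, Integration.DISPUTE_WORKBENCH);
        };
    }
}
```

Because the router only produces entries, the claim change and its queue rows could be written in the same database transaction, and a disabled or offline integration never blocks the operator.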

Results

The shipped system proves the full standalone path: tenant registration, manual shipment and claim creation, auto-case creation from exception events, evidence checklist generation, idempotent resolution actions, carrier CSV and JSON exports, audit history, and operational dashboards.
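
Auto-case creation has to tolerate the same event arriving twice. A minimal sketch of that idempotency, assuming deduplication on a tenant and event id pair: the in-memory map stands in for a table with a unique constraint, and ExceptionEvent, ClaimCase, and IdempotentClaimOpener are hypothetical names rather than the project's types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

record ExceptionEvent(String tenantId, String eventId, String trackingNumber, String status) {}

record ClaimCase(long id, String tenantId, String trackingNumber, String reason) {}

final class IdempotentClaimOpener {
    // In-memory stand-in for a table with a unique constraint on (tenant_id, event_id).
    private final Map<String, ClaimCase> claimsByEventKey = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Returns the existing claim when the same tenant and event id are seen again. */
    ClaimCase openFor(ExceptionEvent event) {
        String key = event.tenantId() + "|" + event.eventId();
        return claimsByEventKey.computeIfAbsent(key, k ->
                new ClaimCase(nextId.getAndIncrement(), event.tenantId(),
                        event.trackingNumber(), event.status()));
    }

    int claimCount() {
        return claimsByEventKey.size();
    }
}
```

Scoping the key by tenant keeps one tenant's event ids from colliding with another's, which matters when the ids come from a shared upstream gateway.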

Validation numbers matter because they describe operations pressure, not benchmark theater. The ingestion test processes 50 tracking events per second locally. The claims queue renders 10,000 cases in under 1 second. The claim detail page stays under 500 ms, excluding downloads, now that the read model no longer materializes bulk history before first paint.
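
The detail-page change can be sketched as a read model that asks for one page of history and a count instead of the full timeline. This is an illustration under assumptions: TimelineSource is a hypothetical interface standing in for tenant-scoped, limited SQL queries, and the page size and field names are invented for the example.

```java
import java.util.List;

record TimelineEntry(long sequence, String description) {}

record ClaimDetailView(String claimId, List<TimelineEntry> firstPage, int totalEntries, boolean hasMore) {}

/** Hypothetical data access: each method would map to a tenant-scoped query with a limit. */
interface TimelineSource {
    List<TimelineEntry> page(String tenantId, String claimId, int offset, int limit);

    int count(String tenantId, String claimId);
}

final class ClaimDetailReadModel {
    private final TimelineSource timeline;
    private final int pageSize;

    ClaimDetailReadModel(TimelineSource timeline, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.timeline = timeline;
        this.pageSize = pageSize;
    }

    /** Loads only what the first paint needs; older history is fetched on demand. */
    ClaimDetailView load(String tenantId, String claimId) {
        List<TimelineEntry> firstPage = List.copyOf(timeline.page(tenantId, claimId, 0, pageSize));
        int total = timeline.count(tenantId, claimId);
        return new ClaimDetailView(claimId, firstPage, total, total > firstPage.size());
    }
}
```

The hasMore flag lets the page show a "load older history" control without the server ever holding the full audit trail in memory for the initial render.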

Business result: a closed loop. A tracking event no longer stops at "something happened." It becomes a case with a deadline, owner, evidence, action, export, and audit record. At higher volume, the worker partitioning and timeline pagination would need more work, but the operating model can stay the same: tracking reports the fact, claims orchestration owns the consequence.
