Explainable Dispatch Optimization Engine

The Situation

Explainable Dispatch Optimization Engine is a dispatch control system for field-service operators that assigns mobile technicians while preserving the commitments a dispatcher has already accepted.

Teams need it when a manual board has to juggle skills, travel distance, working hours, SLA clocks, emergency priority, overtime, churn risk, and schedule changes at the same time. The painful part is not only picking the next technician. It is proving why that technician was chosen, why another one was rejected, and why accepted work did not get moved during a replan.

I built it for the messy middle between manual dispatch and blind optimization. The system imports technicians, skills, jobs, and travel data, builds a tenant-scoped plan, stores explanations, and gives supervisors a review surface for approvals, overrides, freezes, replay, and audit history. A black-box optimizer can look impressive until it silently moves work that a technician already accepted. This engine treats accepted, completed, and frozen assignments as operating promises.

The Cost of Doing Nothing

The repo does not contain customer revenue or labor data, so I will not pretend it does. The operational cost is still visible in the domain model. The system tracks SLA lateness, emergency lateness, travel minutes, overtime minutes, idle time, and churn moves. Those are the places manual dispatch leaks money.

A single extra dispatcher hour per day at an illustrative UK operator-cost band of roughly £28 to £42 per hour would cost about £7.3k to £10.9k per year per seat, assuming about 260 working days (£28 × 260 = £7,280; £42 × 260 = £10,920). That is an assumption for this dispatch model, not a measured project result. The code-backed cost is safer to state: unexplained assignments produce late jobs, overtime, travel waste, churn, and rework nobody can replay later.

What I Built

At the core is a Scala 3.6.4 Akka HTTP application backed by PostgreSQL 16 and Redis 7. OR-Tools 9.11 runs behind a SolverAdapter, not inside the domain services. PostgreSQL stores tenants, users, technicians, skills, jobs, travel entries, plans, assignments, overrides, approvals, replay runs, objective settings, and outbox events.

For this spec, the configured planning boundary is 500 jobs with a normal 90-second solver timeout. The emergency replan timeout is 30 seconds. Replay windows are capped at 31 days. CSV imports are capped at a source-file validation limit of 50,000 rows and 5,242,880 bytes (5 MiB). The benchmark target covers three cases: 500 jobs across 50 technicians under 90 seconds, a 100-job emergency replan under 30 seconds, and the same 50,000-row travel-matrix validation limit under 2 minutes.
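A minimal sketch of how these limits might be held in one place and applied to a CSV upload. Only the numbers come from the spec above; `PlanningLimits`, its field names, and `validateCsv` are hypothetical and not taken from the repo.

```scala
import scala.concurrent.duration.*

// Hypothetical container for the limits quoted in the text.
// Field names are illustrative; only the values are from the spec.
final case class PlanningLimits(
    maxJobsPerPlan: Int = 500,
    solverTimeout: FiniteDuration = 90.seconds,
    emergencyReplanTimeout: FiniteDuration = 30.seconds,
    maxReplayWindowDays: Int = 31,
    maxCsvRows: Int = 50_000,
    maxCsvBytes: Long = 5_242_880L // 5 MiB
)

// Returns the reasons an upload would be rejected.
// An empty list means the file is within both limits.
def validateCsv(
    rows: Int,
    bytes: Long,
    limits: PlanningLimits = PlanningLimits()
): List[String] =
  List(
    Option.when(rows > limits.maxCsvRows)(
      s"row count $rows exceeds ${limits.maxCsvRows}"
    ),
    Option.when(bytes > limits.maxCsvBytes)(
      s"file size $bytes exceeds ${limits.maxCsvBytes} bytes"
    )
  ).flatten
```

Returning a list of reasons rather than a boolean fits the engine's explainability goal: a rejected import can say exactly which limit it broke.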

The first hard lesson came from the solver boundary. OR-Tools could produce a valid optimized answer that assigned fewer jobs than the deterministic dispatcher. That was not acceptable for an operations board. The solver now has to meet the deterministic feasible assignment count before its cost improvements matter.
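The rule can be sketched as a small selection function. This is an illustration under assumptions, not the repo's code: `PlanResult` and `choosePlan` are hypothetical names, a missing solver result stands in for a timeout or native failure, and the tie-break on a single `totalCost` value is a simplification of the real objective.

```scala
// Hypothetical summary of a candidate plan.
final case class PlanResult(assignedJobs: Int, totalCost: BigDecimal)

// Assignment count is compared first. Cost is only consulted when the
// solver matches the deterministic count. `None` models a solver that
// timed out or failed, in which case the deterministic plan is kept.
def choosePlan(deterministic: PlanResult, solver: Option[PlanResult]): PlanResult =
  solver match
    case Some(s) if s.assignedJobs > deterministic.assignedJobs => s
    case Some(s)
        if s.assignedJobs == deterministic.assignedJobs &&
          s.totalCost < deterministic.totalCost =>
      s
    case _ => deterministic
```

A cheaper solver plan that drops even one job loses to the deterministic plan here, which is the behavior the paragraph above describes.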

System Flow

[Architecture diagram: system flow]

Data Model

[Architecture diagram: data model]

Architecture Layers

[Architecture diagram: architecture layers]

This layering matters because the project is not selling an invisible AI promise. It proves a bounded operating loop: import the work, build constraints, solve, explain, freeze, approve, replay, and audit.

The Decision Log

Decision	Alternative Rejected	Why
Standalone CSV import and local solving first	Mandatory workforce-suite integration	A team can prove dispatch value before attaching the system to the rest of its stack.
OR-Tools behind `SolverAdapter`	Domain code coupled directly to OR-Tools	Native solver failure, timeout, fallback, and replay metadata remain visible and testable.
Frozen assignments as hard constraints	Re-optimizing the whole board every run	Automation cannot undo dispatcher-approved work without an explicit operator action.
Persisted explanations and replay metrics	Trusting solver output as a black box	Supervisors can inspect chosen, rejected, and unscheduled work after the plan is created.
Feature-flagged Notification Hub and Workflow Engine	Mandatory outbound services	Dispatch planning runs when notifications or approval workflows are disabled.
Dispatcher, supervisor, and auditor roles	One shared admin role	Execution, approval, and review stay separated inside the same tenant.
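The frozen-assignment decision in the table can be illustrated with a short sketch. It assumes a simplified assignment record. `BoardStatus`, `BoardAssignment`, and the helper names are hypothetical, and a real solver would encode the pinned set as constraints rather than check it afterwards.

```scala
// Hypothetical status model. Accepted, completed, and frozen work is
// treated as pinned; only proposed work may be moved by a replan.
enum BoardStatus:
  case Proposed, Accepted, Completed, Frozen

final case class BoardAssignment(jobId: String, technicianId: String, status: BoardStatus)

def isPinned(a: BoardAssignment): Boolean = a.status match
  case BoardStatus.Accepted | BoardStatus.Completed | BoardStatus.Frozen => true
  case BoardStatus.Proposed => false

// Splits the current board into (pinned, movable) before a replan.
def splitForReplan(board: List[BoardAssignment]): (List[BoardAssignment], List[BoardAssignment]) =
  board.partition(isPinned)

// Job IDs where a candidate plan moved or dropped a pinned assignment.
// An empty result means the candidate respected every pinned job.
def pinnedViolations(board: List[BoardAssignment], candidate: List[BoardAssignment]): List[String] =
  val proposed = candidate.map(a => a.jobId -> a.technicianId).toMap
  board.filter(isPinned).collect {
    case a if !proposed.get(a.jobId).contains(a.technicianId) => a.jobId
  }
```

A post-solve check like `pinnedViolations` is a cheap guard that a replan did not undo dispatcher-approved work, whichever solver produced the candidate.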

Why the approvals exist

Some plans need a human gate. The config includes a high-cost replan threshold of 100.00, and the domain model records approval reasons such as overtime, high-cost replan, and manual override. The system does not hide expensive change behind a green checkmark. It routes it into an approval path that can be local or sent to the Workflow Automation Engine when that integration is enabled.
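A sketch of how a change might be mapped to approval reasons. The three reasons and the 100.00 threshold come from the text above. Everything else is an assumption for illustration: the `PlanChange` shape, the names, the inclusive comparison against the threshold, and treating any overtime at all as a trigger.

```scala
// The reasons named in the text.
enum ApprovalReason:
  case Overtime, HighCostReplan, ManualOverride

// Hypothetical summary of what a proposed change would do.
final case class PlanChange(overtimeMinutes: Int, replanCost: BigDecimal, manualOverride: Boolean)

// Threshold value from the config described above; its unit is not stated.
val HighCostReplanThreshold: BigDecimal = BigDecimal("100.00")

// An empty set means no human gate is needed.
// Assumption: the threshold is inclusive and any overtime triggers review.
def approvalReasons(change: PlanChange): Set[ApprovalReason] =
  Set(
    Option.when(change.overtimeMinutes > 0)(ApprovalReason.Overtime),
    Option.when(change.replanCost >= HighCostReplanThreshold)(ApprovalReason.HighCostReplan),
    Option.when(change.manualOverride)(ApprovalReason.ManualOverride)
  ).flatten
```

Returning every matching reason, rather than the first one, keeps the approval record complete when a change is both expensive and overtime-inducing.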

That is the kind of product choice buyers feel later. A solve that looks cheaper on paper may be expensive if it churns technician schedules, creates overtime, or moves a customer into SLA risk. Approval gates make those tradeoffs visible before the board becomes an operating promise.

Replay as the trust layer

The replay service compares optimizer output against a greedy baseline using the same input snapshot. It reports SLA hit rate, travel minutes, overtime minutes, churn count, unscheduled jobs, and solve time. That gives operators a way to ask a better question than "did the solver run?" They can ask whether the solver improved the board in the dimensions that matter.
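The comparison can be sketched as a per-metric delta plus a list of regressions. The six metrics are the ones named above. `ReplayMetrics`, `delta`, and `regressions` are hypothetical names, and the field types are assumptions.

```scala
// The six replay metrics named in the text; types are assumed.
final case class ReplayMetrics(
    slaHitRate: Double,
    travelMinutes: Int,
    overtimeMinutes: Int,
    churnCount: Int,
    unscheduledJobs: Int,
    solveTimeMillis: Long
)

// Optimizer minus baseline for each metric. A positive slaHitRate delta
// is better; for every other field a negative delta is better.
def delta(baseline: ReplayMetrics, optimized: ReplayMetrics): ReplayMetrics =
  ReplayMetrics(
    slaHitRate = optimized.slaHitRate - baseline.slaHitRate,
    travelMinutes = optimized.travelMinutes - baseline.travelMinutes,
    overtimeMinutes = optimized.overtimeMinutes - baseline.overtimeMinutes,
    churnCount = optimized.churnCount - baseline.churnCount,
    unscheduledJobs = optimized.unscheduledJobs - baseline.unscheduledJobs,
    solveTimeMillis = optimized.solveTimeMillis - baseline.solveTimeMillis
  )

// Names of the metrics where the optimizer did worse than the baseline.
def regressions(baseline: ReplayMetrics, optimized: ReplayMetrics): List[String] =
  val d = delta(baseline, optimized)
  List(
    Option.when(d.slaHitRate < 0)("slaHitRate"),
    Option.when(d.travelMinutes > 0)("travelMinutes"),
    Option.when(d.overtimeMinutes > 0)("overtimeMinutes"),
    Option.when(d.churnCount > 0)("churnCount"),
    Option.when(d.unscheduledJobs > 0)("unscheduledJobs"),
    Option.when(d.solveTimeMillis > 0)("solveTimeMillis")
  ).flatten
```

A regression list does not decide whether the plan is acceptable. It only makes the tradeoff explicit, for example a plan that cuts travel but adds churn.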

Replay also protects future changes. If objective weights change, a team can compare the new behavior against a baseline instead of discovering late jobs in the field.

Ecosystem Integration

Dispatch events can move through the Event-Driven Notification Hub, while overtime and high-cost approvals can run through the Workflow Automation Engine. The dispatch core emits local outbox events such as dispatch, assignment, and replay activity, then background jobs can publish or poll external systems.

Those connections are useful but deliberately non-blocking. The Notification Hub and Workflow Engine integrations are feature-flagged, and the dispatch engine runs standalone with no ecosystem services enabled.
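One way to picture the non-blocking design: events are always written to the local outbox, and a background publisher decides per event whether any external target is enabled. The sketch below assumes that routing. `IntegrationFlags`, `OutboxEventKind`, `DeliveryTarget`, and `deliveryTargets` are hypothetical names, and the mapping of approval requests to the Workflow Engine is inferred from the description above.

```scala
// Hypothetical feature flags; both integrations default to off.
final case class IntegrationFlags(
    notificationHubEnabled: Boolean = false,
    workflowEngineEnabled: Boolean = false
)

// Illustrative event kinds for the local outbox.
enum OutboxEventKind:
  case Dispatch, Assignment, Replay, ApprovalRequested

enum DeliveryTarget:
  case NotificationHub, WorkflowEngine

// External targets a background publisher would deliver this event to.
// An empty set means the event simply stays in the local outbox.
def deliveryTargets(kind: OutboxEventKind, flags: IntegrationFlags): Set[DeliveryTarget] =
  val wanted: Set[DeliveryTarget] = kind match
    case OutboxEventKind.ApprovalRequested => Set(DeliveryTarget.WorkflowEngine)
    case _ => Set(DeliveryTarget.NotificationHub)
  wanted.filter {
    case DeliveryTarget.NotificationHub => flags.notificationHubEnabled
    case DeliveryTarget.WorkflowEngine => flags.workflowEngineEnabled
  }
```

With the default flags every event resolves to an empty target set, so planning never waits on an external service.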

Results

The shipped repository proves a bounded, explainable dispatch core rather than a live production claim. The validation evidence is before-and-after replay, not customer revenue. Before optimization, the deterministic path establishes the feasible assignment count and baseline metrics. After optimization, the OR-Tools plan must match or exceed that assignment count before travel, overtime, SLA risk, churn, and solve time are allowed to matter.

It also proves the operating envelope: 500 jobs across 50 technicians under the 90-second benchmark target, 100-job emergency replans under the 30-second target, the source-file travel-matrix validation limit checked under 2 minutes, frozen work treated as a hard constraint, and replay metrics stored for comparison against the deterministic baseline.

The business value is not that the optimizer is clever. The value is that a dispatcher can trust what it did, a supervisor can approve expensive changes, and an auditor can replay the decision later.
