Architectural Brief: Trade Compliance Classification Engine

Customs classification fails when a decision looks final but the evidence behind it cannot be reconstructed. I built this system around replayable proof: the product facts, rule pack version, reviewer action, and audit payload stay tied to the exact tenant and jurisdiction that produced the recommendation.

System Topology

Architecture diagram

Infrastructure Decisions

Compute: Rust 1.78 with Axum and Tokio, chosen over a dynamic web framework because the classification path has to make failure modes explicit. A Python or Node service would have been faster to sketch, but the codebase needed typed config, typed errors, async route boundaries, and worker behavior that could be checked in tests rather than inferred from convention.
Data layer: PostgreSQL 16, chosen over a document store because customs evidence is relational. Tenants own users, API keys, products, rule packs, runs, jobs, overrides, exports, and integration settings. The schema also enforces uniqueness for product SKUs per tenant, rule pack versions per tenant and jurisdiction, and a single active pack for each tenant and jurisdiction. A document database could store the same payloads, but it would not enforce those constraints at the row boundary.
Job execution: PostgreSQL job leasing with FOR UPDATE SKIP LOCKED, chosen over Redis for this build because the job state, retry attempts, classification run, and audit payload already live in Postgres. Redis would add another moving part before the workload proves it needs separate queue throughput. The current worker defaults to 25 classification jobs per batch, a 30 second lease, a 1 second tick, and 10 audit exports per tick.
Rule runtime boundary: A deterministic rule-runtime contract, chosen over direct database heuristics because classification must produce selected and rejected candidates, confidence, risk, and explanation as evidence. The repository does not show a shipped wasmtime dependency, so I would not describe this as live WebAssembly execution. The implemented boundary still matters: validation runs with fuel and timeout limits, and classification evaluation uses bounded settings instead of unbounded application logic.
Search: In-memory Tantivy, chosen over SQL text search because product lookup needs tenant-filtered search documents while the source of truth stays in Postgres. SQL LIKE would be simpler, but earlier project evidence found drift from the intended search behavior. Tantivy gives a local index for reviewer workflow without turning search into the primary data store.
Review interface: Server-rendered reviewer workbench, chosen over a polished single-page app because the architecture proof is import, classify, review, and export. A SPA would add build tooling and client state before the compliance workflow is proven. The current UI boundary keeps reviewer actions behind the same tenant auth and policy matrix as the API.
Integration posture: RAG evidence lookup, Notification Hub, and Workflow Engine adapters are optional and disabled by default. I chose that over required ecosystem calls because customs classification must still run when an auxiliary service is absent or misconfigured. Adapter health is surfaced, but core import, review, and export are not held hostage by those dependencies.
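The Job execution choice above can be sketched as a config struct that carries the stated worker defaults, plus the kind of leasing statement that FOR UPDATE SKIP LOCKED implies. This is a sketch under assumptions. Only the four default values come from the brief. The `WorkerConfig` struct, the `classification_jobs` table, and its columns (`status`, `lease_expires_at`, `attempts`, `created_at`) are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical worker settings; the defaults mirror the brief
/// (25 jobs per batch, 30 s lease, 1 s tick, 10 audit exports per tick).
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerConfig {
    pub batch_size: i64,
    pub lease: Duration,
    pub tick: Duration,
    pub exports_per_tick: i64,
}

impl Default for WorkerConfig {
    fn default() -> Self {
        Self {
            batch_size: 25,
            lease: Duration::from_secs(30),
            tick: Duration::from_secs(1),
            exports_per_tick: 10,
        }
    }
}

/// Hypothetical leasing statement. SKIP LOCKED makes concurrent workers
/// pass over rows another transaction has locked, so each job is claimed
/// by one worker at a time. Rows whose lease has expired become eligible
/// again, which is how a crashed worker's jobs get retried.
/// $1 binds the batch size; $2 binds the lease length in seconds.
pub const LEASE_JOBS_SQL: &str = "\
UPDATE classification_jobs AS j
SET status = 'leased',
    lease_expires_at = now() + make_interval(secs => $2),
    attempts = j.attempts + 1
FROM (
    SELECT id
    FROM classification_jobs
    WHERE status = 'pending'
       OR (status = 'leased' AND lease_expires_at < now())
    ORDER BY created_at
    LIMIT $1
    FOR UPDATE SKIP LOCKED
) AS picked
WHERE j.id = picked.id
RETURNING j.id";

fn main() {
    let cfg = WorkerConfig::default();
    println!("batch={} lease={}s", cfg.batch_size, cfg.lease.as_secs());
    println!("{}", LEASE_JOBS_SQL);
}
```

Keeping the lease inside the same UPDATE as the claim means the retry counter and lease expiry live beside the job row, which matches the brief's reason for not adding Redis yet.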

Constraints That Shaped the Design

Input: Product rows arrive through CSV or API paths, but the import path refuses incomplete facts. The required shape includes SKU, name, description, country, jurisdiction, product type, materials, and intended use. That validation pushes missing evidence to the edge before a classifier can turn weak inputs into confident-looking outputs.
Rule source: Rule packs are versioned by tenant and jurisdiction. Activation is not a YAML upload alone. The gate checks syntax, rules, golden cases, runtime safety, and output coverage. At least 10 golden cases are required before activation, which is a guard against promoting a rule pack that parses but cannot defend its own behavior.
Classification run size: A request is capped at 100 product IDs. I would keep that limit until benchmark evidence proves larger batches keep review latency and worker lease behavior inside acceptable margins. Bigger requests look convenient to an API client, but they increase the blast radius of a bad rule pack or malformed import.
Outcome safety: A run can end as classified, needs_review, or blocked. The code-backed routing sends cases with no candidate to blocked, cases with confidence below 0.82 to needs_review, and close ties, where the leading candidates sit within 0.05 confidence of each other, to needs_review. That choice rejects the tempting alternative of always returning the top candidate, because customs work needs preserved doubt more than false certainty.
Audit output: Audit exports freeze tenant, product, classification, rule pack, candidates, overrides, and timestamps into a payload snapshot. Rendering from mutable rows would be easier, but it would make an old export change when a product record or rule pack changes later. The snapshot is the contract.
Deployment boundary: The shipped evidence supports a local Postgres-backed app and a production container that expects an external database and secrets. The registry lists the service as not deployed, and the production compose file has no public host label, so there is no live URL to claim.
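The routing rule in the Outcome safety constraint is small enough to sketch directly. The sketch makes these assumptions: candidates arrive as (code, confidence) pairs, the tie check compares only the two highest-confidence candidates, and a gap strictly smaller than 0.05 counts as a tie. The brief states the thresholds but not those details. The type names, function names, and example codes are hypothetical.

```rust
/// Hypothetical candidate produced by rule evaluation.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub code: String,
    pub confidence: f64,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Classified(String),
    NeedsReview,
    Blocked,
}

const MIN_CONFIDENCE: f64 = 0.82; // below this, route to needs_review
const TIE_MARGIN: f64 = 0.05; // top two closer than this, route to needs_review

/// Routes a run instead of always returning the top candidate.
pub fn route(candidates: &[Candidate]) -> Outcome {
    // Rank a copy by descending confidence so input order does not matter.
    let mut ranked: Vec<&Candidate> = candidates.iter().collect();
    ranked.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));

    let Some(top) = ranked.first() else {
        return Outcome::Blocked; // no candidate at all
    };
    if top.confidence < MIN_CONFIDENCE {
        return Outcome::NeedsReview;
    }
    if let Some(second) = ranked.get(1) {
        if top.confidence - second.confidence < TIE_MARGIN {
            return Outcome::NeedsReview;
        }
    }
    Outcome::Classified(top.code.clone())
}

fn main() {
    let c = |code: &str, confidence| Candidate { code: code.into(), confidence };
    println!("{:?}", route(&[])); // Blocked
    println!("{:?}", route(&[c("code-a", 0.91), c("code-b", 0.60)])); // Classified("code-a")
    println!("{:?}", route(&[c("code-a", 0.90), c("code-b", 0.88)])); // NeedsReview
}
```

Persisting the `Outcome` value rather than only the winning code is what keeps doubt visible: a stored `NeedsReview` cannot be mistaken for a successful classification later.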

Decision Log

Decision	Alternative Rejected	Why
Immutable active rule packs	Editing active packs in place	Historical classifications and audit exports need the exact rule version that produced them. The migration trigger blocks mutation of active pack fields.
Active pack selected by tenant and jurisdiction	Newest active pack for the tenant	Customs codes depend on jurisdiction. Earlier project evidence found wrong-jurisdiction drift, so the active boundary is scoped to both tenant and jurisdiction.
Frozen audit snapshots	Re-reading current product and rule rows during export	Mutable rows would rewrite history. A frozen payload keeps the product facts, rule pack, candidates, overrides, and timestamps aligned to the classification run.
Outcome router with `blocked` and `needs_review`	Persisting every worker result as `classified`	Batch 010 exposed the risk: no-candidate, low-confidence, and tie cases can look successful unless uncertainty is stored as a first-class result.
Optional adapters disabled by default	Mandatory RAG, notification, and workflow calls	The core workflow must run offline. External services can enrich evidence, notify reviewers, or trigger workflow, but they cannot block classification and audit export.
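The first two decision rows can be illustrated with an in-memory model. In this model, packs are keyed by tenant and jurisdiction together, only one pack per key is active, and nothing past draft accepts edits. This is a sketch of the invariants only. The brief places the real enforcement in Postgres, through a uniqueness rule and a migration trigger. Every name here is hypothetical. The `Retired` state, which also freezes a pack after it is replaced, is an assumption that goes one step beyond the brief's "active pack" wording.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PackState { Draft, Active, Retired }

#[derive(Debug, Clone)]
pub struct RulePack {
    pub version: u32,
    pub yaml: String,
    pub state: PackState,
}

#[derive(Debug, PartialEq)]
pub enum PackError { UnknownPack, NotDraft }

/// Packs keyed by (tenant, jurisdiction): one jurisdiction's pack can
/// never be selected for another jurisdiction of the same tenant.
#[derive(Default)]
pub struct PackRegistry {
    packs: HashMap<(String, String), Vec<RulePack>>,
}

impl PackRegistry {
    fn slot_mut(&mut self, tenant: &str, jur: &str) -> Result<&mut Vec<RulePack>, PackError> {
        self.packs
            .get_mut(&(tenant.to_string(), jur.to_string()))
            .ok_or(PackError::UnknownPack)
    }

    pub fn add_draft(&mut self, tenant: &str, jur: &str, version: u32, yaml: &str) {
        self.packs
            .entry((tenant.to_string(), jur.to_string()))
            .or_default()
            .push(RulePack { version, yaml: yaml.to_string(), state: PackState::Draft });
    }

    /// Retires the current active pack, so at most one is active per key.
    /// (The activation gate checks would run before this point.)
    pub fn activate(&mut self, tenant: &str, jur: &str, version: u32) -> Result<(), PackError> {
        let packs = self.slot_mut(tenant, jur)?;
        let idx = packs.iter().position(|p| p.version == version).ok_or(PackError::UnknownPack)?;
        if packs[idx].state != PackState::Draft {
            return Err(PackError::NotDraft);
        }
        for p in packs.iter_mut().filter(|p| p.state == PackState::Active) {
            p.state = PackState::Retired;
        }
        packs[idx].state = PackState::Active;
        Ok(())
    }

    /// Mirrors the trigger's intent: only drafts can be edited.
    pub fn edit(&mut self, tenant: &str, jur: &str, version: u32, yaml: &str) -> Result<(), PackError> {
        let pack = self
            .slot_mut(tenant, jur)?
            .iter_mut()
            .find(|p| p.version == version)
            .ok_or(PackError::UnknownPack)?;
        if pack.state != PackState::Draft {
            return Err(PackError::NotDraft);
        }
        pack.yaml = yaml.to_string();
        Ok(())
    }

    pub fn active(&self, tenant: &str, jur: &str) -> Option<&RulePack> {
        self.packs
            .get(&(tenant.to_string(), jur.to_string()))?
            .iter()
            .find(|p| p.state == PackState::Active)
    }
}

fn main() {
    let mut reg = PackRegistry::default();
    reg.add_draft("acme", "EU", 1, "rules: []");
    reg.add_draft("acme", "US", 1, "rules: []");
    reg.activate("acme", "EU", 1).unwrap();
    println!("{:?}", reg.active("acme", "US").map(|p| p.version)); // None
    println!("{:?}", reg.edit("acme", "EU", 1, "rules: [x]")); // Err(NotDraft)
}
```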

Scaling Limits

The first scaling limit is the in-memory search index. It keeps the reviewer path simple, but it is not a shared index across app replicas. If this service moved to several app processes, search freshness would need a persisted index strategy, a rebuild plan, or a separate search service.
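What makes an in-memory index acceptable on a single process is that it is derived data. Assuming it is rebuilt from Postgres rows, losing it costs only rebuild time, and every query is scoped to one tenant before any term lookup. The plain-Rust stand-in below shows that shape. It is deliberately not Tantivy's API, and `ProductRow` and `TenantIndex` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical row as loaded from Postgres, the source of truth.
pub struct ProductRow {
    pub tenant_id: u64,
    pub sku: String,
    pub description: String,
}

/// Inverted index partitioned by tenant: tenant -> term -> SKUs.
#[derive(Default)]
pub struct TenantIndex {
    by_tenant: HashMap<u64, HashMap<String, HashSet<String>>>,
}

impl TenantIndex {
    /// Rebuild from scratch; the index holds nothing the rows cannot regenerate.
    pub fn rebuild(rows: &[ProductRow]) -> Self {
        let mut idx = Self::default();
        for row in rows {
            let terms = idx.by_tenant.entry(row.tenant_id).or_default();
            for term in row.description.split_whitespace() {
                terms.entry(term.to_lowercase()).or_default().insert(row.sku.clone());
            }
        }
        idx
    }

    /// The tenant filter is the first lookup, so a query never crosses tenants.
    pub fn search(&self, tenant_id: u64, term: &str) -> Vec<String> {
        let mut hits = self
            .by_tenant
            .get(&tenant_id)
            .and_then(|terms| terms.get(&term.to_lowercase()))
            .map(|skus| skus.iter().cloned().collect::<Vec<String>>())
            .unwrap_or_default();
        hits.sort(); // deterministic order for reviewers and tests
        hits
    }
}

fn main() {
    let rows = vec![
        ProductRow { tenant_id: 1, sku: "A-1".into(), description: "Steel bolt".into() },
        ProductRow { tenant_id: 2, sku: "B-7".into(), description: "steel hinge".into() },
    ];
    let idx = TenantIndex::rebuild(&rows);
    println!("{:?}", idx.search(1, "steel")); // ["A-1"]
}
```

With several replicas, each would hold its own copy built at a different moment. That is the freshness gap the paragraph above describes.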

The second limit is Postgres-backed polling. FOR UPDATE SKIP LOCKED is a sound choice for an MVP queue because it avoids duplicate work and keeps job evidence beside classification rows. At higher volume, the worker batch size, tick interval, connection pool, and row indexes become the tuning surface. Only if tuning those stopped being enough would I split the queue into a dedicated system.

The third limit is runtime proof. The code has fuel and timeout contracts, but not a shipped WebAssembly engine dependency. If untrusted tenant-authored rules became part of the product promise, the runtime boundary would need stronger sandbox evidence before the architecture could claim that guarantee.
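Without an engine enforcing it, a fuel-and-timeout contract can look like the following sketch. Each rule step spends one unit of fuel, and evaluation also stops at a wall-clock deadline. The names, the one-unit-per-rule cost, the keyword rule shape, and the limits are all hypothetical. The sketch also shows why such a contract is weaker than a sandbox: it only bounds code that cooperates by calling `spend`.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum LimitError {
    OutOfFuel,
    TimedOut,
}

/// Hypothetical budget checked by the evaluator at every step.
pub struct Budget {
    fuel: u64,
    deadline: Instant,
}

impl Budget {
    pub fn new(fuel: u64, timeout: Duration) -> Self {
        Self { fuel, deadline: Instant::now() + timeout }
    }

    /// Charge one step; fail once either limit is exhausted.
    pub fn spend(&mut self) -> Result<(), LimitError> {
        if Instant::now() >= self.deadline {
            return Err(LimitError::TimedOut);
        }
        self.fuel = self.fuel.checked_sub(1).ok_or(LimitError::OutOfFuel)?;
        Ok(())
    }
}

/// Hypothetical rule: a keyword that, if present, votes for a code.
pub struct Rule {
    pub keyword: &'static str,
    pub code: &'static str,
}

/// Evaluates rules under the budget and returns the codes that matched.
pub fn evaluate(rules: &[Rule], description: &str, budget: &mut Budget)
    -> Result<Vec<&'static str>, LimitError> {
    let mut matched = Vec::new();
    for rule in rules {
        budget.spend()?; // every rule costs fuel, so work per run is bounded
        if description.contains(rule.keyword) {
            matched.push(rule.code);
        }
    }
    Ok(matched)
}

fn main() {
    let rules = [
        Rule { keyword: "steel", code: "code-a" },
        Rule { keyword: "cotton", code: "code-b" },
    ];
    let mut ample = Budget::new(10, Duration::from_secs(5));
    println!("{:?}", evaluate(&rules, "steel bolt", &mut ample)); // Ok(["code-a"])
    let mut tight = Budget::new(1, Duration::from_secs(5));
    println!("{:?}", evaluate(&rules, "steel bolt", &mut tight)); // Err(OutOfFuel)
}
```

A rule that loops internally without calling `spend` would escape both limits. Closing that gap requires an engine that meters execution itself, which is the stronger sandbox evidence the paragraph above asks for.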

The shape is intentionally conservative. Customs software earns trust by showing what it knew, which rule pack it used, why it hesitated, and who overrode it later. The architecture keeps those questions answerable without pretending that every imported product deserves an automatic code.
