Architectural Brief: Sensor Telemetry Engine
A solar farm with 200 panels generates a reading every second per panel per metric. Voltage, temperature, power. That's 600 messages per second from one site. Add five sites and you're at 3,000 messages per second before a single line of business logic runs. The constraint that shaped every decision in this system: sustain 5,000 readings per second through a single binary, with anomaly detection running inline, without dropping messages.
System Topology
Infrastructure Decisions
- Language: Rust (stable, 2024 edition). Chose over Go and Python because at 5,000 messages/second, Go's garbage-collection pauses and Python's GIL introduce latency spikes that compound with batch flush timing. Rust's zero-cost async via Tokio keeps the NATS consumer and HTTP server on the same runtime without contention. The binary runs in a 512MB container.
- Message Broker: NATS 2.x. Chose over Kafka because Kafka's partition model and ZooKeeper dependency are overkill for a single-node deployment. NATS gives sub-millisecond pub/sub, automatic reconnection, and fits in a 256MB container. JetStream is available if persistence is needed later, but the current design inserts before acknowledging, so message loss means the batch failed, not that NATS dropped it.
- Time-Series Database: TimescaleDB 2.x on PostgreSQL 16. Chose over InfluxDB because InfluxDB means a proprietary query language (Flux/InfluxQL) and losing JOIN capability for alert rule lookups. TimescaleDB provides continuous aggregates (automatic 1-minute and 1-hour rollups), retention policies, and standard SQL. The rest of the stack already speaks PostgreSQL.
- HTTP Framework: Axum 0.8. Chose over Actix-web because Axum's extractor model makes authentication composable. The AuthenticatedTenant extractor resolves API keys on every protected handler without a global middleware layer. Tower's service model gives built-in CORS and body size limiting.
- API Key Auth with DashMap + argon2. Chose over JWT because machine-to-machine IoT traffic doesn't need token refresh or expiry management. The TenantResolver caches resolved keys in a DashMap with a 5-minute per-entry TTL. argon2 verification only runs on cache miss, so the hot path is an O(1) hash map lookup, not a password hash computation.
- Batch INSERT with UNNEST arrays. Chose over per-row inserts because it means one round-trip to TimescaleDB per 100 readings instead of 100 round-trips. The BatchBuffer swaps the internal Vec before releasing the Mutex, so the INSERT runs without holding the lock.
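The cache-then-verify flow behind the API key decision can be sketched in std-only Rust. This is a simplification: the real resolver uses DashMap for lock-free reads and argon2 for verification; here a Mutex-wrapped HashMap and a stub `slow_verify` function (both hypothetical stand-ins) illustrate the hot-path/cold-path split.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Stand-in for argon2 verification (assumption: the real resolver compares
/// against stored argon2 hashes). Pretend keys of the form "tenant-<id>" are valid.
fn slow_verify(api_key: &str) -> Option<u64> {
    api_key.strip_prefix("tenant-")?.parse().ok()
}

/// Cache of resolved API keys with a per-entry TTL. Production uses DashMap;
/// std's Mutex<HashMap> keeps this sketch dependency-free.
struct TenantResolver {
    cache: Mutex<HashMap<String, (u64, Instant)>>,
    ttl: Duration,
}

impl TenantResolver {
    fn new(ttl: Duration) -> Self {
        Self { cache: Mutex::new(HashMap::new()), ttl }
    }

    fn resolve(&self, api_key: &str) -> Option<u64> {
        let mut cache = self.cache.lock().unwrap();
        // Hot path: O(1) lookup, no hash computation.
        if let Some((tenant_id, inserted)) = cache.get(api_key) {
            if inserted.elapsed() < self.ttl {
                return Some(*tenant_id);
            }
            cache.remove(api_key); // entry expired; fall through to re-verify
        }
        // Cold path: expensive verification, then cache the result.
        let tenant_id = slow_verify(api_key)?;
        cache.insert(api_key.to_string(), (tenant_id, Instant::now()));
        Some(tenant_id)
    }
}

fn main() {
    let resolver = TenantResolver::new(Duration::from_secs(300));
    assert_eq!(resolver.resolve("tenant-42"), Some(42)); // cold: verifies, then caches
    assert_eq!(resolver.resolve("tenant-42"), Some(42)); // hot: served from cache
    assert_eq!(resolver.resolve("bogus"), None);
    println!("resolved");
}
```

Note the design consequence: at steady state, argon2 cost is amortized to once per key per five minutes rather than once per request.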
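The swap-before-release trick in the batch buffer is the part worth seeing in code. A minimal sketch, assuming simplified `Reading` and `BatchBuffer` shapes (the real buffer also flushes on a timer, which is omitted here):

```rust
use std::mem;
use std::sync::Mutex;

/// One reading destined for the UNNEST batch INSERT (fields simplified).
#[derive(Debug)]
struct Reading {
    device_id: String,
    metric: String,
    value: f64,
}

/// Accumulates readings; when full, push() swaps the Vec out under the lock
/// so the (slow) database INSERT never runs while the Mutex is held.
struct BatchBuffer {
    inner: Mutex<Vec<Reading>>,
    capacity: usize,
}

impl BatchBuffer {
    fn new(capacity: usize) -> Self {
        Self { inner: Mutex::new(Vec::with_capacity(capacity)), capacity }
    }

    /// Returns Some(full_batch) when the buffer reaches capacity.
    fn push(&self, reading: Reading) -> Option<Vec<Reading>> {
        let mut buf = self.inner.lock().unwrap();
        buf.push(reading);
        if buf.len() >= self.capacity {
            // Swap in a fresh Vec; the full one leaves the critical section
            // and is handed to the INSERT outside the lock.
            Some(mem::replace(&mut *buf, Vec::with_capacity(self.capacity)))
        } else {
            None
        }
    }
}

fn main() {
    let buf = BatchBuffer::new(2);
    let r = |m: &str, v| Reading { device_id: "d1".into(), metric: m.into(), value: v };
    assert!(buf.push(r("voltage", 48.1)).is_none());
    let batch = buf.push(r("temperature", 31.0)).unwrap();
    // Caller would now build per-column arrays for
    // `INSERT ... SELECT * FROM unnest($1, $2, $3)`:
    let values: Vec<f64> = batch.iter().map(|x| x.value).collect();
    assert_eq!(values, vec![48.1, 31.0]);
    println!("flushed {}", batch.len());
}
```

The swap keeps the critical section to a push and a pointer exchange, so ingestion never stalls behind a round-trip to TimescaleDB.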
Constraints That Shaped the Design
- Input: JSON messages published to NATS subjects matching sensors.>. Each message carries a device UUID, metric name, value, unit, optional RFC 3339 timestamp, and a tenant API key. Server receive time fills in when the timestamp is missing.
- Output: REST API with 18 endpoints serving device management, time-series queries with automatic resolution selection, alert lifecycle management, and Prometheus metrics. Alert events are published to NATS on alerts.{tenant_id}.{device_id}.{metric} and optionally forwarded to the Notification Hub and Workflow Engine.
- Scale Handled: 5,000 readings/second sustained. At 50,000 readings/second, the single-writer batch buffer would need partitioning by tenant or device type, and the Mutex-based buffer would need a lock-free ring buffer. The 20-connection PostgreSQL pool would need bumping, and NATS would benefit from JetStream for backpressure.
- Hard Constraints: No foreign key constraints on the readings hypertable; TimescaleDB performs better without FK enforcement at 5,000 inserts/second. Continuous aggregates refresh on 1-minute and 1-hour schedules. Deviation detection requires at least 5 aggregate samples in the window before it fires, preventing false positives during cold start.
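The minimum-sample guard on deviation detection can be sketched as a small function. A hedged sketch: the 5-sample floor comes from the constraint above, but the z-score-style deviation formula here is an illustrative assumption, not the engine's documented math.

```rust
/// Returns Some(deviation score) only when the window holds at least
/// `min_samples` aggregate points; otherwise the detector stays quiet,
/// which is what suppresses cold-start false positives.
/// The z-score computation is an assumed stand-in for the real check.
fn deviation(window: &[f64], latest: f64, min_samples: usize) -> Option<f64> {
    if window.len() < min_samples {
        return None; // not enough history to trust a verdict
    }
    let n = window.len() as f64;
    let mean = window.iter().sum::<f64>() / n;
    let var = window.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    if std == 0.0 {
        return None; // flat window: any deviation formula degenerates
    }
    Some((latest - mean).abs() / std)
}

fn main() {
    // Only two samples: no verdict, even for a wildly high value.
    assert_eq!(deviation(&[10.0, 10.0], 100.0, 5), None);
    // Five samples: a deviation score is produced.
    let window = [10.0, 10.2, 9.8, 10.1, 9.9];
    assert!(deviation(&window, 14.0, 5).unwrap() > 3.0);
    println!("ok");
}
```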
Decision Log
Decisions not covered in Infrastructure Decisions above:
| Decision | Alternative Rejected | Why |
|---|---|---|
| No FK on hypertable | Foreign keys on readings | TimescaleDB hypertable INSERT throughput drops significantly with FK enforcement. Device existence is verified at ingestion time via auto-registration instead. |
| Two-phase tenant resolution | Full table scan with argon2 | First try prefix-based lookup (1 row), then fall back to scanning up to 50 legacy rows. Prevents unbounded argon2 iteration on every request. |
| Insert-before-acknowledge for NATS | Acknowledge-then-insert | If the batch INSERT fails after acknowledgment, the readings are lost. Inserting first means a failed batch is retried by NATS redelivery. |
| 15-minute default cooldown window | Per-reading alert suppression | A sensor stuck at a high value would generate 120 alerts/minute without cooldown. Database-backed cooldown survives process restarts, unlike in-memory TTL caches. |
| Auto-resolution for time-range queries | Client-selected resolution | Raw readings for ranges under 1 hour, minute aggregates for 1-24 hours, hour aggregates beyond. Prevents clients from accidentally scanning months of raw data. |
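The auto-resolution rule from the last row reduces to a threshold match. A sketch using the ranges stated above; the behavior at the exact 1-hour and 24-hour boundaries is an assumption, since the table only gives the ranges:

```rust
#[derive(Debug, PartialEq)]
enum Resolution {
    Raw,    // raw readings table
    Minute, // 1-minute continuous aggregate
    Hour,   // 1-hour continuous aggregate
}

/// Picks the table to query from the requested time range, per the
/// decision log: raw under 1 hour, minute aggregates for 1-24 hours,
/// hour aggregates beyond. Boundary handling here is an assumption.
fn pick_resolution(range_secs: u64) -> Resolution {
    const HOUR: u64 = 3_600;
    match range_secs {
        s if s < HOUR => Resolution::Raw,
        s if s <= 24 * HOUR => Resolution::Minute,
        _ => Resolution::Hour,
    }
}

fn main() {
    assert_eq!(pick_resolution(600), Resolution::Raw);          // 10 minutes
    assert_eq!(pick_resolution(6 * 3_600), Resolution::Minute); // 6 hours
    assert_eq!(pick_resolution(72 * 3_600), Resolution::Hour);  // 3 days
    println!("ok");
}
```

Keeping this server-side is what prevents a client from accidentally scanning months of raw rows with one wide query.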