Architectural Brief: Centralized Property Intelligence Hub
Four microservices, four separate Supabase databases, one AI agent that needed to query all of them at once. The application scraped UK property listings, analyzed floorplans, scored images for condition defects, and assigned unique identifiers across all sources. Each service worked fine in isolation. But the RAG agent powering the frontend couldn't join data across four databases without making four separate API calls per question, and those calls were slow, inconsistent, and impossible to filter in combination.
This is the replication layer that consolidated everything into a single queryable surface.
System Topology
Infrastructure Decisions
- Replication Method: PostgreSQL Native Logical Replication (Publication/Subscription). Chose over CDC tools like Debezium or custom API-based sync because the source and target are both PostgreSQL on Supabase. Logical replication is built into the engine, requires no middleware, and streams changes in near-real-time. Debezium would have added a Kafka cluster and connector management for a problem that PostgreSQL solves natively.
- Data Layer: Supabase (managed PostgreSQL) for all five databases. Chose over self-hosted RDS because the client's existing microservices already ran on Supabase. Adding a fifth Supabase instance for the Central DB kept the operational tooling consistent. The original Micro instance (4 worker slots) couldn't run 4 subscriptions and handle initial syncs simultaneously. I hit that wall on the first large onboarding and had to upgrade to Small (2GB RAM, 2 vCPU, `max_worker_processes = 20`).
- Consumption Layer: Materialized views in a dedicated `rag_api` schema. Chose over live views or direct table queries because the RAG agent needs pre-joined data with consistent read performance. Materialized views give indexed reads against a snapshot. Live views would re-execute the 4-way join on every query. The two views (`unified_property_master`, `rag_property_images`) collapse 79 raw tables into two queryable surfaces.
- Refresh Strategy: `pg_cron` running `REFRESH MATERIALIZED VIEW CONCURRENTLY` every 10 minutes. Chose over webhook-triggered refreshes because the source services don't emit events on data change. Concurrent refresh means the views remain readable during the rebuild. The `rag_api.refresh_data()` function wraps both view refreshes in a single call for manual triggers.
- Schema Isolation: Each source service replicates into its own schema on the Central DB (`super_id`, `floorplan`, `rightmove`). Chose over dumping everything into `public` because the Image Condition service already uses `public` (Django default), and collisions would be inevitable. The one service that does use `public` required selective table replication to avoid pulling in Django framework tables (`auth_*`, `django_*`).
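The decisions above can be sketched as DDL. This is a minimal illustration, not the production scripts: the publication name, connection string, and the body of `rag_api.refresh_data()` are assumptions; only the schema, view, and function names come from the brief. `FOR TABLES IN SCHEMA` requires PostgreSQL 15+.

```sql
-- On a source database (e.g. the Rightmove scraper): publish the whole schema.
-- For the Image Condition service, the publication would instead list its
-- application tables explicitly, so auth_* and django_* are never replicated.
CREATE PUBLICATION central_pub FOR TABLES IN SCHEMA public;

-- On the Central DB: one subscription per source, each permanently holding
-- one background worker slot. Connection string is illustrative.
CREATE SUBSCRIPTION rightmove_sub
    CONNECTION 'host=db.example.supabase.co port=5432 dbname=postgres user=replicator'
    PUBLICATION central_pub;

-- Wrap both refreshes so manual triggers are a single call.
-- CONCURRENTLY keeps the views readable during the rebuild, but requires
-- a UNIQUE index on each materialized view.
CREATE OR REPLACE FUNCTION rag_api.refresh_data() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY rag_api.unified_property_master;
    REFRESH MATERIALIZED VIEW CONCURRENTLY rag_api.rag_property_images;
END;
$$;

-- pg_cron: refresh every 10 minutes.
SELECT cron.schedule('refresh-rag-views', '*/10 * * * *',
                     $$SELECT rag_api.refresh_data()$$);
```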
Constraints That Shaped the Design
- Input: Four PostgreSQL databases on separate Supabase instances, each with its own schema conventions. The Rightmove scraper alone has 58 tables. The Image Condition service shares the `public` schema with Django internals. Super ID provides the `super_id` UUID that links all four services.
- Output: Two materialized views. `unified_property_master` produces one row per property with address, price, condition score, and floorplan status. `rag_property_images` produces multiple rows per property with per-image condition analysis. Both are indexed, including a GIN index for full-text search on property descriptions.
- Scale Handled: ~5,000 properties in the unified view, linked from ~13,000 raw scraped records. The gap exists because properties without a `super_id` assignment are excluded by design. At 50,000 properties, the materialized view refresh would need partitioning or an incremental refresh strategy.
- Hard Constraints: `max_worker_processes = 20` on the Central DB. Each active subscription consumes one background worker slot permanently. Initial table copies for large schemas (58 tables) spawn temporary sync workers that compete for the same pool. The Micro instance tier (4-worker limit) was physically incapable of running 4 subscriptions. Direct IPv6 connections to the Supabase source databases were unreachable from local development; all admin operations required the Session Mode Pooler (IPv4).
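Because the worker pool is the binding constraint, it is worth watching directly. A sketch of two diagnostic queries against the Central DB, using standard PostgreSQL statistics views (the interpretation comments are mine, not from the brief):

```sql
-- Which background worker types are occupying slots right now.
-- 'logical replication apply worker' rows are the permanent per-subscription
-- workers; tablesync workers appear only during an initial copy.
SELECT backend_type, count(*)
FROM pg_stat_activity
WHERE backend_type <> 'client backend'
GROUP BY backend_type
ORDER BY count(*) DESC;

-- Per-subscription workers on the subscriber. A non-null relid means a
-- temporary sync worker is still copying that table.
SELECT subname, pid, relid::regclass AS syncing_table,
       received_lsn, latest_end_lsn
FROM pg_stat_subscription;
```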
Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| PostgreSQL Logical Replication | Debezium CDC, custom ETL scripts | Replication lag measured at <1 second for single-row changes. Debezium would have required a Kafka cluster (minimum 3 brokers for production), Schema Registry, and Kafka Connect workers. Four additional services to manage for a data sync problem between 5 PostgreSQL instances that the database engine solves natively. |
| Materialized Views over live views | Direct table queries, real-time views | Benchmarked both: the 4-way LEFT JOIN with LATERAL subqueries takes 280-400ms as a live view. The materialized view serves the same data in 2-8ms with a GIN index on property descriptions enabling full-text search. The 10-minute refresh lag is acceptable because property listings don't change faster than that. |
| Selective table replication for Image Condition | Full schema replication | Publishing all tables from the Image Condition service would have replicated 30+ Django framework tables (auth_user, django_session, django_content_type) alongside the 15 application tables. Each replicated table consumes a background worker slot during initial sync. The 30 framework tables would have tripled the worker load for zero data value. |
| "Traffic Control" protocol for large initial syncs | Increasing instance size, parallel sync | The Rightmove subscription (58 tables) exhausted all 6 worker slots during initial copy. Disabling 3 existing subscriptions freed their worker slots instantly. The replication slots on each publisher held WAL position during the 40-minute downtime. Zero data loss, zero cost increase, repeatable for any future high-volume onboarding. |
| Central DB First ordering for DDL changes | Source DB First, or automated migration sync | Tested it the wrong way first: a developer added a column on the Rightmove source. Replication crashed immediately, not on the next data change but on the next WAL message. The column count mismatch kills the apply worker with no graceful fallback. "Central DB First" is enforced by documentation, not automation. A schema drift detector (comparing information_schema.columns across publishers and subscriber) would catch this before it breaks, but doesn't exist yet. |
| Small instance over Micro | Micro with manual worker management | Worker slot math: 4 permanent subscription workers + temporary sync workers during initial copy + autovacuum workers + pg_cron worker. Micro's limit of 4 means zero headroom. One subscription retry during a transient network issue would stall because no worker slot is available. Small at 20 leaves 16 slots for everything else. The cost difference is $10/month. |
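The schema drift detector mentioned above doesn't exist yet; one possible shape for it, run on the Central DB, is a `dblink` diff of `information_schema.columns` against each publisher. This is a sketch under assumptions: the `dblink` extension, the connection string, and restricting the check to `public` on the source are all illustrative choices, not the brief's design.

```sql
CREATE EXTENSION IF NOT EXISTS dblink;

-- Columns that exist on the Rightmove source but not in the Central DB's
-- rightmove schema. Any row returned here is drift that will crash the
-- apply worker on the next WAL message touching that table.
SELECT *
FROM dblink('host=db.example.supabase.co dbname=postgres user=replicator',
            $$SELECT table_name::text, column_name::text
              FROM information_schema.columns
              WHERE table_schema = 'public'$$)
     AS src(table_name text, column_name text)
EXCEPT
SELECT table_name::text, column_name::text
FROM information_schema.columns
WHERE table_schema = 'rightmove';
```

Scheduled via `pg_cron` alongside the view refreshes, an alerting wrapper around this query would turn "Central DB First" from a documentation rule into an enforced invariant.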