How a Property Platform Stopped Querying Four Databases to Answer One Question
The Situation
A UK property technology company had built four separate microservices to handle different aspects of their data pipeline: one scraped Rightmove listings, one processed floorplans, one ran AI-powered image condition analysis, and one generated a universal identifier linking all three together. Each service had its own Supabase PostgreSQL database. Each worked well in isolation.
The problem showed up when the company built a RAG agent for their frontend. The agent needed to answer questions like "Show me 3-bedroom detached houses in good condition with completed floorplans." That single question required data from all four databases. The agent was making four separate API calls per query, merging the results in application code, and returning answers that were slow, inconsistent, and impossible to filter in combination.
The Cost of Doing Nothing
Every user query hit four databases. Response latency was 800ms to 1.2 seconds for a simple property lookup. Cross-service joins were handled in application code, which meant the AI agent carried the complexity of understanding four different schemas, handling missing data across services, and reconciling different data types for the same identifier. The development team spent roughly a third of their sprint cycles on data integration bugs rather than product features, equivalent to an estimated £30,000-40,000 per year in analyst and developer time lost to manual data stitching. And the agent couldn't perform full-text search across property descriptions combined with condition filters, because the data lived in separate databases with no shared index.
What I Built
A replication layer that streams data from all four source databases into a single Central PostgreSQL instance using native logical replication. On top of the replicated raw data, a dedicated rag_api schema provides two materialized views that collapse 79 raw tables into two queryable surfaces: one row per property (address, price, condition score, floorplan status), and a detail table for per-image analysis.
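The wiring follows native PostgreSQL logical replication: a publication on each source database, a subscription on the Central instance. A minimal sketch, assuming illustrative table, publication, and connection names (the real ones are not shown in this write-up):

```sql
-- On a source database (publisher), e.g. the floorplan service.
-- Table and publication names here are hypothetical.
CREATE PUBLICATION floorplan_pub
    FOR TABLE floorplans, floorplan_pages;

-- On the Central PostgreSQL instance (subscriber).
-- The connection string is a placeholder.
CREATE SUBSCRIPTION floorplan_sub
    CONNECTION 'host=floorplan-db.example.com port=5432 dbname=postgres user=replicator'
    PUBLICATION floorplan_pub;
```

Creating the subscription triggers an initial table copy, after which changes stream continuously from the publisher's WAL with no middleware in between.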
The first version worked for three services. Then I added the Rightmove scraper with its 58 tables and the entire replication cluster stalled. PostgreSQL was trying to spawn parallel sync workers for the initial table copy, but only six background worker slots were available, and three were permanently occupied by the existing subscriptions. I had to develop a "Traffic Control" protocol: disable all existing subscriptions, let the large sync complete in isolation, then re-enable everything. No data loss: the replication slot on each publisher held the WAL position during the downtime.
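The protocol itself is a handful of statements. A sketch with hypothetical subscription names, showing the pause, the isolated sync, and the resume:

```sql
-- 1. Pause the existing subscriptions to free background worker slots.
ALTER SUBSCRIPTION floorplan_sub       DISABLE;
ALTER SUBSCRIPTION image_condition_sub DISABLE;
ALTER SUBSCRIPTION super_id_sub        DISABLE;

-- 2. Let the 58-table initial copy run with the slots to itself.
--    Poll until every table leaves the copy states and reaches
--    'r' (ready, i.e. streaming).
SELECT srrelid::regclass AS table_name, srsubstate
FROM   pg_subscription_rel
WHERE  srsubid = (SELECT oid FROM pg_subscription
                  WHERE subname = 'rightmove_sub');

-- 3. Resume. Each publisher's replication slot retained the WAL
--    position while its subscription was disabled, so the paused
--    streams catch up with no data loss.
ALTER SUBSCRIPTION floorplan_sub       ENABLE;
ALTER SUBSCRIPTION image_condition_sub ENABLE;
ALTER SUBSCRIPTION super_id_sub        ENABLE;
```

The catch-up in step 3 is safe precisely because a disabled subscription does not drop its slot; dropping the subscription instead would release the slot and lose the WAL position.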
The other surprise was type inconsistency. Four services, four different opinions on how to store the super_id identifier. UUID in two services, text in one, varchar(1028) in another. The materialized view joins silently produced zero matches without explicit ::uuid casts. No errors, no warnings. Every row just showed NULL for condition data, which looked exactly like "analysis hasn't run yet." I caught it by comparing row counts manually.
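The fix was to normalise both sides of every join key, plus a row-count sanity check of the kind that exposed the bug. A sketch with illustrative schema and column names:

```sql
-- Normalise the join keys: the same identifier arrives as uuid,
-- text, and varchar(1028) from different services, so both sides
-- are cast to uuid explicitly.
SELECT l.super_id::uuid AS super_id,
       c.condition_score
FROM   rightmove.listings         l
LEFT JOIN image_condition.results c
       ON l.super_id::uuid = c.super_id::uuid;

-- The manual check that caught the silent zero-match join:
-- properties carrying condition data should roughly track the
-- analysis table's row count, not sit at zero.
SELECT
    (SELECT count(*) FROM image_condition.results)  AS analysed_rows,
    (SELECT count(*) FROM rag_api.properties
     WHERE  condition_score IS NOT NULL)            AS joined_rows;
```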
Diagrams (not reproduced here): System Flow, Data Model, Architecture Layers.
The Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| PostgreSQL logical replication | Debezium CDC, custom ETL scripts | Source and target are both PostgreSQL on Supabase. Logical replication is engine-native, requires zero middleware, and streams changes without a message broker. |
| Materialized views for the RAG layer | Live views, direct table queries | The agent needs indexed, pre-joined reads. A live view re-executing a 4-way LEFT JOIN with LATERAL subqueries would add 200-400ms per call. Materialized views serve single-digit millisecond reads. |
| Selective table replication for Image Condition | Full schema replication | The service uses Django's public schema. Replicating everything would pull in 30+ framework tables that are irrelevant and could cause namespace collisions with future services. |
| "Traffic Control" protocol for large syncs | Upgrading instance tier | Disabling 3 subscriptions freed worker slots for the 58-table initial copy. Zero cost increase. Zero data loss. Documented as a repeatable procedure for future high-volume onboarding. |
| Supabase Small instance over Micro | Micro with tuned parameters | Micro caps max_logical_replication_workers at 4. With exactly 4 services, there's zero headroom for retries or temporary sync workers. Small instance supports max_worker_processes = 20. |
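The materialized-view layer in the table above can be sketched as follows. Schema, table, and column names other than rag_api are assumptions for illustration; the real view collapses 79 replicated tables:

```sql
CREATE SCHEMA IF NOT EXISTS rag_api;

-- One row per property, pre-joined across all four services.
CREATE MATERIALIZED VIEW rag_api.properties AS
SELECT  s.super_id,
        l.address,
        l.price,
        l.description,
        c.condition_score,
        f.status AS floorplan_status
FROM    super_id.links             s
LEFT JOIN rightmove.listings       l ON l.super_id::uuid = s.super_id
LEFT JOIN image_condition.results  c ON c.super_id::uuid = s.super_id
LEFT JOIN floorplans.floorplans    f ON f.super_id::uuid = s.super_id;

-- A unique index is mandatory for REFRESH ... CONCURRENTLY,
-- which rebuilds the view without blocking agent reads.
CREATE UNIQUE INDEX ON rag_api.properties (super_id);
```

This is what turns the agent's 4-way join into a single indexed read: the join cost is paid once per refresh instead of once per query.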
Results
Before the system, the RAG agent queried four separate databases per user question, with response latencies between 800ms and 1.2 seconds. Cross-service data integration was handled in application code, consuming roughly a third of the development team's sprint capacity for maintenance and bug fixes.
After deployment, the agent queries a single materialized view with indexed reads returning in under 10ms. Property data from all four services is pre-joined and refreshed every 10 minutes via pg_cron. Full-text search on property descriptions runs against a GIN index, which wasn't possible when the data was split across databases. The entire replication infrastructure was built and deployed in one week. The replication topology itself is the part that scales without redesign: adding a fifth or sixth source service means adding one more subscription and extending the materialized view SQL. The constraint is the view refresh. Beyond 50,000 properties, the concurrent refresh will need partitioning or an incremental strategy to keep the 10-minute cycle from drifting.
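The refresh schedule and the full-text index are both one statement each. A sketch assuming the hypothetical rag_api.properties view and column names used above:

```sql
-- GIN expression index for full-text search over descriptions.
-- Agent queries must use the same to_tsvector('english', ...)
-- expression to hit the index.
CREATE INDEX properties_description_fts
    ON rag_api.properties
    USING gin (to_tsvector('english', description));

-- pg_cron: refresh every 10 minutes, concurrently so reads
-- are never blocked during the rebuild.
SELECT cron.schedule(
    'refresh-rag-properties',
    '*/10 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY rag_api.properties$$
);
```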