How a Property Platform Stopped Querying Four Databases to Answer One Question
The Situation
A UK property technology company had built four separate microservices to handle different aspects of their data pipeline: one scraped Rightmove listings, one processed floorplans, one ran AI-powered image condition analysis, and one generated a universal identifier linking all three together. Each service had its own Supabase PostgreSQL database. Each worked well in isolation.
The problem showed up when the company built a RAG agent for their frontend. The agent needed to answer questions like "Show me 3-bedroom detached houses in good condition with completed floorplans." That single question required data from all four databases. The agent was making four separate API calls per query, merging the results in application code, and returning answers that were slow, inconsistent, and impossible to filter in combination.
The Cost of Doing Nothing
Every user query hit four databases. Response latency was 800ms to 1.2 seconds for a simple property lookup. Cross-service joins were handled in application code, which meant the AI agent carried the complexity of understanding four different schemas, handling missing data across services, and reconciling different data types for the same identifier. The development team spent roughly a third of their sprint cycles on data integration bugs rather than product features, equivalent to an estimated £30,000-40,000 per year in analyst and developer time lost to manual data stitching. And the agent couldn't perform full-text search across property descriptions combined with condition filters, because the data lived in separate databases with no shared index.
What I Built
A replication layer that streams data from all four source databases into a single Central PostgreSQL instance using native logical replication. On top of the replicated raw data, a dedicated rag_api schema provides two materialized views that collapse 79 raw tables into two queryable surfaces: one row per property (address, price, condition score, floorplan status), and a detail table for per-image analysis.
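The wiring follows native PostgreSQL logical replication: a publication on each source database, a subscription on the Central instance. A minimal sketch, assuming illustrative table, publication, and connection names (the real ones are not shown in this write-up):

```sql
-- On a source database (publisher), e.g. the floorplan service.
-- Table and publication names here are hypothetical.
CREATE PUBLICATION floorplan_pub
    FOR TABLE floorplans, floorplan_pages;

-- On the Central PostgreSQL instance (subscriber).
-- The connection string is a placeholder.
CREATE SUBSCRIPTION floorplan_sub
    CONNECTION 'host=floorplan-db.example.com port=5432 dbname=postgres user=replicator'
    PUBLICATION floorplan_pub;
```

Creating the subscription triggers an initial table copy, after which changes stream continuously from the publisher's WAL with no middleware in between.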
The first version worked for three services. Then I added the Rightmove scraper with its 58 tables and the entire replication cluster stalled. PostgreSQL was trying to spawn parallel sync workers for the initial table copy, but only six background worker slots were available, and three were permanently occupied by the existing subscriptions. I had to develop a "Traffic Control" protocol: disable all existing subscriptions, let the large sync complete in isolation, then re-enable everything. No data loss: the replication slot on each publisher held the WAL position during the downtime.
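The protocol itself is a handful of statements. A sketch with hypothetical subscription names, showing the pause, the isolated sync, and the resume:

```sql
-- 1. Pause the existing subscriptions to free background worker slots.
ALTER SUBSCRIPTION floorplan_sub       DISABLE;
ALTER SUBSCRIPTION image_condition_sub DISABLE;
ALTER SUBSCRIPTION super_id_sub        DISABLE;

-- 2. Let the 58-table initial copy run with the slots to itself.
--    Poll until every table leaves the copy states and reaches
--    'r' (ready, i.e. streaming).
SELECT srrelid::regclass AS table_name, srsubstate
FROM   pg_subscription_rel
WHERE  srsubid = (SELECT oid FROM pg_subscription
                  WHERE subname = 'rightmove_sub');

-- 3. Resume. Each publisher's replication slot retained the WAL
--    position while its subscription was disabled, so the paused
--    streams catch up with no data loss.
ALTER SUBSCRIPTION floorplan_sub       ENABLE;
ALTER SUBSCRIPTION image_condition_sub ENABLE;
ALTER SUBSCRIPTION super_id_sub        ENABLE;
```

The catch-up in step 3 is safe precisely because a disabled subscription does not drop its slot; dropping the subscription instead would release the slot and lose the WAL position.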
The other surprise was type inconsistency. Four services, four different opinions on how to store the super_id identifier. UUID in two services, text in one, varchar(1028) in another. The materialized view joins silently produced zero matches without explicit ::uuid casts. No errors, no warnings. Every row just showed NULL for condition data, which looked exactly like "analysis hasn't run yet." I caught it by comparing row counts manually.
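The fix was to normalise both sides of every join key, plus a row-count sanity check of the kind that exposed the bug. A sketch with illustrative schema and column names:

```sql
-- Normalise the join keys: the same identifier arrives as uuid,
-- text, and varchar(1028) from different services, so both sides
-- are cast to uuid explicitly.
SELECT l.super_id::uuid AS super_id,
       c.condition_score
FROM   rightmove.listings         l
LEFT JOIN image_condition.results c
       ON l.super_id::uuid = c.super_id::uuid;

-- The manual check that caught the silent zero-match join:
-- properties carrying condition data should roughly track the
-- analysis table's row count, not sit at zero.
SELECT
    (SELECT count(*) FROM image_condition.results)  AS analysed_rows,
    (SELECT count(*) FROM rag_api.properties
     WHERE  condition_score IS NOT NULL)            AS joined_rows;
```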
Diagrams (not reproduced here): System Flow, Data Model, Architecture Layers.
The Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| PostgreSQL logical replication | Debezium CDC, custom ETL scripts | Source and target are both PostgreSQL on Supabase. Logical replication is engine-native, requires zero middleware, and streams changes without a message broker. |
| Materialized views for the RAG layer | Live views, direct table queries | The agent needs indexed, pre-joined reads. A live view re-executing a 4-way LEFT JOIN with LATERAL subqueries would add 200-400ms per call. Materialized views serve single-digit millisecond reads. |
| Selective table replication for Image Condition | Full schema replication | The service uses Django's public schema. Replicating everything would pull in 30+ framework tables that are irrelevant and could cause namespace collisions with future services. |
| "Traffic Control" protocol for large syncs | Upgrading instance tier | Disabling 3 subscriptions freed worker slots for the 58-table initial copy. Zero cost increase. Zero data loss. Documented as a repeatable procedure for future high-volume onboarding. |
| Supabase Small instance over Micro | Micro with tuned parameters | Micro caps max_logical_replication_workers at 4. With exactly 4 services, there's zero headroom for retries or temporary sync workers. Small instance supports max_worker_processes = 20. |
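The materialized-view layer in the table above can be sketched as follows. Schema, table, and column names other than rag_api are assumptions for illustration; the real view collapses 79 replicated tables:

```sql
CREATE SCHEMA IF NOT EXISTS rag_api;

-- One row per property, pre-joined across all four services.
CREATE MATERIALIZED VIEW rag_api.properties AS
SELECT  s.super_id,
        l.address,
        l.price,
        l.description,
        c.condition_score,
        f.status AS floorplan_status
FROM    super_id.links             s
LEFT JOIN rightmove.listings       l ON l.super_id::uuid = s.super_id
LEFT JOIN image_condition.results  c ON c.super_id::uuid = s.super_id
LEFT JOIN floorplans.floorplans    f ON f.super_id::uuid = s.super_id;

-- A unique index is mandatory for REFRESH ... CONCURRENTLY,
-- which rebuilds the view without blocking agent reads.
CREATE UNIQUE INDEX ON rag_api.properties (super_id);
```

This is what turns the agent's 4-way join into a single indexed read: the join cost is paid once per refresh instead of once per query.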
Results
Before the system, the RAG agent queried four separate databases per user question, with response latencies between 800ms and 1.2 seconds. Cross-service data integration was handled in application code, consuming roughly a third of the development team's sprint capacity for maintenance and bug fixes.
After deployment, the agent queries a single materialized view with indexed reads returning in under 10ms. Property data from all four services is pre-joined and refreshed every 10 minutes via pg_cron. Full-text search on property descriptions runs against a GIN index, which wasn't possible when the data was split across databases. The entire replication infrastructure was built and deployed in one week. The replication topology itself is the part that scales without redesign: adding a fifth or sixth source service means adding one more subscription and extending the materialized view SQL. The constraint is the view refresh. Beyond 50,000 properties, the concurrent refresh will need partitioning or an incremental strategy to keep the 10-minute cycle from drifting.
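The refresh schedule and the full-text index are both one statement each. A sketch assuming the hypothetical rag_api.properties view and column names used above:

```sql
-- GIN expression index for full-text search over descriptions.
-- Agent queries must use the same to_tsvector('english', ...)
-- expression to hit the index.
CREATE INDEX properties_description_fts
    ON rag_api.properties
    USING gin (to_tsvector('english', description));

-- pg_cron: refresh every 10 minutes, concurrently so reads
-- are never blocked during the rebuild.
SELECT cron.schedule(
    'refresh-rag-properties',
    '*/10 * * * *',
    $$REFRESH MATERIALIZED VIEW CONCURRENTLY rag_api.properties$$
);
```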