How a Seller Stopped Guessing at Idealo Prices and Started Winning Rank 1
The Situation
A German e-commerce seller was listing products on Idealo.de, Germany's largest comparison shopping platform. This is the same retailer whose supplier-to-store pricing pipeline I automated separately. That system handles what the store charges. This one handles how the store competes: scraping live competitor data from Idealo and recommending prices that win the buy box.
Idealo shows every seller's price side by side for the same product. The cheapest offer gets the prominent "Buy" button. Every other seller is listed below it in a smaller format.
The seller was setting prices manually in a spreadsheet, checking a handful of competitors by opening Idealo product pages in the browser, eyeballing the cheapest offer, and adjusting prices by feel. The process covered about 40 products per session. Most products were repriced once a week. Some slipped to once a month.
The result: on any given day, roughly half of the seller's listed products were priced above the cheapest competitor. For those products, click-through rates from Idealo dropped by an estimated 60-70%. The seller was paying Idealo's listing fees on products that nobody was clicking on.
The Cost of Doing Nothing
Manual repricing at 40 products per session, roughly 45 minutes of work each time, covered only a fraction of the catalog. Products not reviewed in a given week drifted further from competitive pricing as other sellers adjusted their prices daily.
The direct cost is small: maybe 4-5 hours per week of staff time, or around €8,000-10,000 annually. The indirect cost is larger. Every product listed above the cheapest competitor bleeds click-through rate. On Idealo, a product at rank 2 or lower converts at a fraction of the rate of the rank 1 listing, based on the seller's own analytics. For a catalog with 200+ products, the lost revenue from non-competitive pricing on even a third of them adds up to tens of thousands in missed sales per quarter.
What I Built
A pricing intelligence platform that scrapes Idealo offer pages for live competitor data, runs it through a ranking-aware optimization algorithm, and returns recommended prices via a REST API. The seller submits product IDs, the system scrapes every competing offer for those products, identifies the seller's current rank, and calculates an optimized price based on competitor positions, shop reputation, delivery speed, and margin constraints.
The system has two operating modes. The first is on-demand: the seller sends a batch of product IDs through the API (or the companion Excel add-in), and the system scrapes and optimizes in real time. The second is inventory sync: the system pulls the seller's entire Idealo catalog from the Idealo Business API's offer report endpoint, extracts all active product IDs, and scrapes every competing offer for every listed product.
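Since the offer pages are static HTML (see the scraping decision below), the parsing step can be sketched with BeautifulSoup. The CSS classes here are placeholders, not Idealo's real markup, which differs and changes over time; the German price format handling is the one genuinely fiddly part:

```python
from decimal import Decimal

from bs4 import BeautifulSoup


def parse_offers(html):
    """Extract (shop, price) pairs from a static offer-list page.
    Selectors are illustrative placeholders, not Idealo's actual markup."""
    soup = BeautifulSoup(html, "html.parser")
    offers = []
    for row in soup.select("li.offer"):
        shop = row.select_one(".shop-name").get_text(strip=True)
        raw = row.select_one(".price").get_text(strip=True)
        # German price format: "1.299,00 €" -> Decimal("1299.00")
        cleaned = raw.replace("€", "").strip().replace(".", "").replace(",", ".")
        offers.append((shop, Decimal(cleaned)))
    return offers
```

Parsing straight to `Decimal` at the boundary keeps float arithmetic out of the pipeline entirely, which matters for the rounding decision discussed later.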
The optimization algorithm was the hardest part to get right. The first version simply undercut the cheapest competitor by 5%. It worked for products where the seller was losing. But for products where the seller was already rank 1, the algorithm kept cutting the price against itself. Every run made rank-1 products cheaper for no reason. I split the algorithm into two branches: one that undercuts to gain rank 1, and one that raises the price to maximize margin while defending rank 1.
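The core insight is that both branches share one target: a cent below the cheapest competing offer. When the seller is above that target, moving to it undercuts; when the seller is already rank 1, moving to it raises the price toward the second-cheapest offer. A minimal sketch, leaving out the reputation and delivery-speed weighting the full algorithm applies:

```python
from decimal import Decimal

CENT = Decimal("0.01")


def recommend_price(competitor_prices, floor):
    """Target one cent below the cheapest competing offer.

    If the seller currently holds rank 1, this *raises* the price toward
    the second-cheapest offer (margin defense); otherwise it undercuts
    (rank capture). `floor` is the minimum margin-preserving price.
    Illustrative sketch only -- the production algorithm also weighs
    shop reputation and delivery speed.
    """
    target = min(competitor_prices) - CENT
    return max(target, floor)
```

Clamping to the floor means the system will knowingly concede rank 1 on products where winning would mean selling below margin, rather than joining a race to the bottom.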
System Flow
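The original flow diagram isn't reproduced here, but the on-demand batch path can be sketched in plain Python. Production fans the scrapes out as a Celery group feeding a chord callback; a thread pool stands in for that below, and the scrape itself is stubbed with fake data. All names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal


def scrape_offers(product_id):
    # Stand-in for the live Idealo scrape; returns competitor prices.
    FAKE_PAGES = {"P1": [Decimal("19.99"), Decimal("21.50")],
                  "P2": [Decimal("8.49"), Decimal("8.99")]}
    return FAKE_PAGES[product_id]


def optimize(product_id, competitor_prices, floor):
    # Price one cent below the cheapest rival, never below the margin floor.
    target = min(competitor_prices) - Decimal("0.01")
    return product_id, max(target, floor)


def reprice_batch(product_ids, floors):
    # 1. Fan out one scrape per product (a Celery group in production).
    with ThreadPoolExecutor(max_workers=8) as pool:
        scraped = list(pool.map(scrape_offers, product_ids))
    # 2. Optimize each product once its offers are in (the chord callback),
    #    and return the recommendations, which the REST API serializes.
    return dict(optimize(pid, offers, floors[pid])
                for pid, offers in zip(product_ids, scraped))
```

The inventory-sync mode follows the same flow, except the product ID list comes from the Idealo Business API's offer report endpoint instead of the request body.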
Data Model
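The entities follow from the narrative: a product with the seller's current price and margin floor, and a set of scraped competitor offers carrying the optimizer's inputs (price, delivery speed, shop reputation). Sketched here as plain dataclasses; the production system persists them behind Django's ORM, and the field names are assumptions:

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Offer:
    shop_name: str
    price: Decimal
    delivery_days: int   # delivery speed, one of the optimizer's inputs
    shop_rating: float   # shop reputation, likewise


@dataclass
class Product:
    idealo_id: str
    our_price: Decimal
    min_price: Decimal   # margin floor for the optimizer
    offers: list[Offer] = field(default_factory=list)

    @property
    def rank(self) -> int:
        # 1-based position of our offer among all scraped competitor offers.
        return 1 + sum(1 for o in self.offers if o.price < self.our_price)
```

`rank == 1` routes a product into the margin-defense branch of the algorithm; anything higher routes it into the undercut branch.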
Architecture Layers
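The published diagram isn't reproduced here; reconstructed from the decisions and flow described elsewhere in this piece, the layers break down roughly as:

- API layer: Django REST endpoints, with the companion Excel add-in as a client
- Task layer: Celery workers with retry logic, chord/group fan-out, and the 450MB memory kill switch
- Scraping layer: BeautifulSoup parsers over Idealo's static offer-list HTML
- Integration layer: the Idealo Business API's offer report endpoint for inventory sync
- Persistence layer: parsed offers and recommended prices stored per product
- Hosting: Render's free tier, which drives the memory constraints above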
The Decision Log
| Decision | Alternative Rejected | Why |
|---|---|---|
| Django + Celery over FastAPI | FastAPI with BackgroundTasks | Celery provides real task isolation, retry logic, and chord/group patterns for parallel scraping. FastAPI's BackgroundTasks run in-process with no failure recovery. With 20+ products scraped in parallel, one memory leak or failed request would contaminate the entire batch. |
| BeautifulSoup scraping over Playwright | Playwright headless browser | Idealo's offer list pages return static HTML. The scraper needs to parse table rows, not execute JavaScript. Adding a headless browser would have inflated the Docker image from ~200MB to ~600MB and introduced browser process management inside Celery workers that already run at the memory limit. |
| Ranking-aware dual strategy over simple undercut | Fixed 5% undercut for all products | A flat undercut destroys margin on products where the seller already holds rank 1. The dual strategy captures the price gap between the seller and the second-cheapest competitor, recovering margin that would otherwise be left on the table. |
| Decimal arithmetic over float | Python float for price calculation | On a €2,800 product with sequential delivery and reputation adjustments, float and Decimal diverge by €0.03 per calculation. Across 200 repriced products daily, the drift compounds into visible pricing inconsistency. |
| 450MB memory kill switch in Celery | Letting the OS handle OOM | Render's free tier has limited memory. BeautifulSoup parsing and DataFrame conversion leak small amounts per task. The kill switch triggers a clean restart before the OS OOM-kills the process without cleanup. |
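The Decimal row is easy to demonstrate. The €17.90 price and 0.85 factor below are illustrative, not the seller's actual adjustment values; they're chosen because the product lands exactly on a half cent, where binary float representation rounds down while `Decimal` with explicit `ROUND_HALF_UP` rounds up:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def adjust_decimal(price, factor):
    # Exact decimal arithmetic with explicit half-up rounding to cents.
    return (Decimal(price) * Decimal(factor)).quantize(CENT, rounding=ROUND_HALF_UP)


def adjust_float(price, factor):
    # Binary float arithmetic with Python's built-in round().
    return round(price * factor, 2)


# 17.90 * 0.85 = 15.215 exactly -- a half-cent tie.
print(adjust_decimal("17.90", "0.85"))  # 15.22 (half-up on the exact value)
print(adjust_float(17.90, 0.85))        # 15.21 (float stores 15.214999...)
```

One cent per calculation looks harmless, but with several sequential adjustments per product and hundreds of products repriced daily, the two code paths produce visibly different price lists.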
Results
Before the system, the seller repriced about 40 products per week using manual browser checks, covering less than 20% of the active catalog in a given cycle. Competitive pricing was checked once a week for most products, monthly for others. Roughly half of listed products were priced above the cheapest Idealo competitor at any given time. The manual process cost roughly €8,000-10,000 per year in staff time, but the larger cost was the click-through rate bleed on every non-competitive listing.
After deployment, the entire active catalog can be scraped and optimized in a single API call, with each product's offer data scraped, parsed, stored, and priced in under 30 seconds. Based on the seller's first 6 weeks of usage, rank-1 coverage across the active catalog rose from roughly 45% to over 70%. The 4-5 hours per week of manual repricing was eliminated. The dual-strategy algorithm captures margin on rank-1 products that the old process left on the table: instead of holding at whatever price happened to win, the system raises it to just below the second-cheapest competitor.
The constraint that would need attention at scale is the single-product scraping model. Each product ID triggers a fresh HTTP scrape of the Idealo offer page. At 1,000+ products, the total scraping time (even parallelized) pushes into minutes, and Idealo's rate limiting becomes the bottleneck. A persistent scraping queue with incremental updates would replace the current on-demand model.