INTRODUCTION
Informing over 70 million monthly shoppers
Few organizations process as much data as idealo, Europe’s largest price-comparison service. idealo consolidates the latest offers from 50,000 merchants, including billions of offers from Amazon and eBay alone, to inform over 70 million monthly shoppers of the most affordable places to buy products. On the surface, getting up-to-date offers from merchants is quite simple: merchants send mass updates to idealo every hour as large CSV or XML feeds. However, the sheer volume, the regularity, and the need for both speed and accuracy make it a massively intensive data exercise.
The first step for the developer team in charge of the offer store is to check which incoming updates are relevant for idealo. All the data then flows to the next components, which apply any discounts to the offers’ base prices. Finally, everything must be stored in the offer store and passed to idealo’s downstream consumers, so they can see exactly what has changed. With 60 fields on each offer and frequent updates, data moves rapidly. The growing popularity of idealo, with traffic increasing sixfold in four years, meant the company was running the biggest on-premises MongoDB database in Europe.
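The three stages described above (filter relevant updates, apply discounts, store and report changes) can be pictured with a minimal in-memory sketch. Everything here is hypothetical: the `Offer` fields, the rule that an offer is relevant only if its merchant is known, the percentage-discount formula, and the use of a plain dict in place of the MongoDB-backed offer store. It is meant to show the shape of the flow, not idealo’s implementation.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    # Hypothetical, heavily simplified offer; real offers carry about 60 fields.
    offer_id: str
    merchant_id: str
    base_price: float
    discount_pct: float = 0.0
    price: float = 0.0


def is_relevant(offer: Offer, known_merchants: set[str]) -> bool:
    # Stage 1 (assumed rule): keep only offers from merchants that are listed.
    return offer.merchant_id in known_merchants


def apply_discount(offer: Offer) -> Offer:
    # Stage 2: derive the final price from the base price and any discount.
    offer.price = round(offer.base_price * (1 - offer.discount_pct / 100), 2)
    return offer


def store_and_diff(store: dict[str, dict], offer: Offer) -> dict:
    # Stage 3: write the offer and return only the fields that changed,
    # which is what downstream consumers would be told about.
    new = vars(offer).copy()
    old = store.get(offer.offer_id, {})
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    store[offer.offer_id] = new
    return changed


def process_feed(feed, store: dict[str, dict], known_merchants: set[str]) -> dict:
    # Run every incoming offer through the three stages in order and
    # collect the per-offer changes for downstream consumers.
    changes = {}
    for offer in feed:
        if not is_relevant(offer, known_merchants):
            continue
        diff = store_and_diff(store, apply_discount(offer))
        if diff:
            changes[offer.offer_id] = diff
    return changes
```

Re-sending an unchanged offer yields an empty diff, so downstream consumers only hear about real changes; a price update to offer `o1` would surface as just its `base_price` and `price` fields.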

THE CHALLENGE
Reaching the limits
Processing 160,000 requests per second on an on-premises system required a 21-shard cluster with 72 CPU cores per node for the offer store alone. With traffic expected to keep growing at roughly 20% per year, idealo was fast approaching the limits of its on-premises infrastructure.
“Importing and filtering the data from the shops is complex,” said Jens Lippmann, a senior software developer working as part of the team responsible for idealo’s offer store. “There are also huge databases involved in this process to keep fingerprints and timestamps because we get different feeds with different generation times and delays, and we have to keep everything in sync.
“That's quite challenging. With billions of offers to keep up to date and traffic growing quickly, it became clear we would not be able to grow any further without expanding our data center. All the existing cages were filled with hardware and scaling the self-hosted MongoDB environments to meet demand was taking several months. To reduce the cost of acquiring and running hardware, and the maintenance and administrative burden on staff, we turned to the cloud.”
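The fingerprint-and-timestamp bookkeeping Lippmann describes can be pictured with a minimal sketch. It assumes, hypothetically, that each offer update arrives with the generation time of its feed, that a SHA-256 hash of the offer’s canonical JSON serves as its fingerprint, and that an update is applied only if it is newer than the last one seen and actually changes the content. The `SyncState` and `should_apply` names are illustrative, not part of idealo’s system.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class SyncState:
    # Hypothetical per-offer bookkeeping: content hash and feed generation time.
    fingerprint: str
    generated_at: float


def fingerprint(offer: dict) -> str:
    # Hash a canonical JSON form so field order does not affect the result.
    canonical = json.dumps(offer, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_apply(state: dict[str, SyncState], offer_id: str,
                 offer: dict, generated_at: float) -> bool:
    """Return True if this update is new and not superseded; record it if so."""
    fp = fingerprint(offer)
    prev = state.get(offer_id)
    if prev is not None:
        if generated_at < prev.generated_at:
            # Stale: a delayed feed generated before the last applied update.
            return False
        if fp == prev.fingerprint:
            # Unchanged content: advance the timestamp but skip downstream work.
            prev.generated_at = generated_at
            return False
    state[offer_id] = SyncState(fp, generated_at)
    return True
```

Comparing generation times rather than arrival times is what lets feeds with different delays be reconciled: a late-arriving but older feed cannot overwrite newer data.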
THE SOLUTION
Rapid read and write
Although idealo was already a MongoDB customer, it evaluated alternative managed cloud services. MongoDB Atlas, MongoDB’s multi-cloud database, proved the most suitable for idealo’s central use case: getting data out, updating it, and writing it back as fast as possible.
“We have other teams using advanced features like AI and machine learning, but the most important thing for us is the raw power to read and write data extremely fast,” said Lippmann. “We don’t have a huge number of queries, maybe 20,000 to 40,000 per second, but the work behind every single one is enormous. With MongoDB Atlas we can read thousands of documents in a single batch, perform a bulk update operation, and write all the updates back in one operation.”
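The read-modify-write pattern in the quote above (read a batch of documents, compute the changes, write them all back in one call) can be sketched as follows. This is a minimal sketch under stated assumptions: the `reprice` rule and the field names are hypothetical, and the operations are built as plain dicts in the `updateOne` shape used by MongoDB’s `bulkWrite` so the example runs without a database. With PyMongo, a batch would typically be fetched with `collection.find({"_id": {"$in": ids}})`, and each operation turned into `UpdateOne(filter, update)` and sent with `collection.bulk_write(ops, ordered=False)`.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable, size: int) -> Iterator[list]:
    # Split an iterable into lists of at most `size` items (one read per batch).
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def reprice(doc: dict) -> dict:
    # Hypothetical update rule: recompute the discounted price and return
    # only the fields that need to change (empty dict if nothing changed).
    price = round(doc["base_price"] * (1 - doc.get("discount_pct", 0) / 100), 2)
    return {} if doc.get("price") == price else {"price": price}


def build_bulk_updates(docs: list[dict],
                       adjust: Callable[[dict], dict]) -> list[dict]:
    """Turn one batch of fetched documents into a single list of updates.

    The operations follow the shape of MongoDB's bulkWrite `updateOne` model.
    """
    ops = []
    for doc in docs:
        changes = adjust(doc)
        if changes:  # skip documents that need no write at all
            ops.append({"updateOne": {"filter": {"_id": doc["_id"]},
                                      "update": {"$set": changes}}})
    return ops
```

Skipping no-op documents and sending one unordered bulk request per batch keeps the number of round trips low, which matters more here than the raw query count.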
idealo set a target of completing the cloud migration before Black Friday last year, and met it. The company is now in an optimization phase, making sure it uses the new infrastructure as efficiently as possible and addressing other areas such as security.
OUTCOME
Greater flexibility to innovate
idealo is now 100% in the cloud. Following the successful migration, it has decommissioned its data center, significantly reducing the resources required to operate idealo’s service. The number of shards needed has dropped from 25 to 12, and the size of each node has been reduced by two-thirds. idealo can now support up to 200,000 queries per second and 60,000 updates per second, and it has sustained over 150,000 queries per second for 14 consecutive hours.
Beyond the core benefit of handling the same number of offers with better performance at lower cost, Lippmann’s team of developers is enjoying greater flexibility to adjust and optimize systems and to develop new features quickly.
“It is completely different from the situation before in the data center where we had to apply for new databases and then wait for weeks or months until it got provided to us. We can now innovate in minutes rather than weeks or months,” Lippmann said. “Even if we want to just change different parameters on our clusters, it’s easy to use and we have never had a situation where something didn't perform as expected. From a technical perspective it is a very good setup.”
Lippmann concluded: “The training was excellent and the support engineers allocated to our team are great. We are in close contact with a success manager who knows our use cases and usage patterns for the various databases, so can help us best. We are really happy with the relationship and support. It is fast and professional, and we can rely on it. The biggest compliment I can give is that it simply works. It’s a great feeling for us as developers when we can rely on support when we need it.”

