
Bazaarvoice Manages One of the World's Largest Product Catalogs with MongoDB Atlas





MongoDB Atlas
Atlas Search
MongoDB Flex Consulting





The voice of the marketplace

For those of us who like to shop online, there are a number of critical things to consider before we make a purchase: the price, availability, delivery time and the experience of the customers who came before us. In fact, checking out authentic customer reviews has become so integral that many people wouldn’t dream of parting with their cash without reliable information on whether the product is as described and worth the money.

Founded in 2005, Bazaarvoice is known as the voice of the marketplace. Thousands of leading brands and retailers trust Bazaarvoice to help them drive revenue, grow, gain actionable insights, and turn customers into loyal advocates. Its products and services are designed to create a smarter shopper experience across the customer journey and to harness the power of user-generated content (UGC). These authentic ratings, reviews, and photos are displayed next to product listings to help shoppers make the right decision when considering a purchase.

“Bazaarvoice is the world’s largest independent source of authentic UGC. We work with some of the world's largest brands and retailers to connect them to their consumer feedback,” explains Eamon Scullion, Staff Software Engineer at Bazaarvoice. “By joining our syndication network, retailers and brands can deliver more content to customers, in the places that matter.”


Ingesting data on an astronomical scale

Bazaarvoice deals with data on a scale that’s difficult to comprehend. The Bazaarvoice catalog stores and distributes data about billions of client products, with hundreds of millions of updates ingested every day.

Product data is one of the driving forces of Bazaarvoice. Teams at Bazaarvoice use this data for experiences such as powering the syndication network, enabling clients to collect and share content like ratings and reviews, visual and social content, and content from sampling campaigns.

But the enterprise isn’t solely focused on maintaining the platform. It also wanted to continue delivering more customer value, move towards serverless development, and leverage managed services to lower the operational burden. This required a high-performing database that could handle an enormous volume of data without compromising functionality, such as the ability to search or index.

“In our industry, customer expectations change all the time. TikTok didn’t exist a few years ago. Now it has 834 million monthly users and we need to be able to tap into that market,” adds Scullion. “We need a data solution that is flexible, which allows us to evolve our data over time as we discover new and more challenging needs while avoiding risky and costly migrations.”

To do that, Bazaarvoice went to market to find a database with a flexible schema, strong search capabilities, and the ability to scale seamlessly to handle massive volumes of data.


Sharding the MongoDB cluster to scale horizontally

Bazaarvoice is one of the world’s largest users of MongoDB Atlas. It adopted MongoDB in 2012, rolled it out as a self-hosted solution for the product catalog team in 2017, and started using MongoDB Atlas and Atlas Search in 2019. This enabled it to tap into its in-house NoSQL skill set and support company growth while improving operational efficiency, getting richer insights into database performance, and reducing management overhead. Leveraging managed services also helps the team focus on innovation and on differentiating Bazaarvoice from its competitors.

Across Bazaarvoice, MongoDB underpins activities such as curating social content, product matching across catalogs to enable content-sharing, and running sampling campaigns to collect more content.

MongoDB Atlas is also the primary data store for the catalog team, which uses Atlas to ingest and manage all of Bazaarvoice's product data, making it easily searchable for customers and other product teams through Atlas Search. However, as product data volumes grew by 1 billion documents in the past two years alone, Bazaarvoice realized it needed to optimize its MongoDB configuration or risk hitting capacity limits, such as the 2.1 billion document limit for a single Lucene search index.
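To make the capacity concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that each shard maintains its own search index, that documents are spread evenly across shards, and that some headroom is kept below the limit. The function name, the 80% headroom figure, and the rounded limit constant are illustrative assumptions, not Bazaarvoice's actual planning method.

```python
# Approximate per-index ceiling quoted in the text (rounded; an assumption
# here is that it applies per shard once the collection is sharded).
LUCENE_INDEX_DOC_LIMIT = 2_100_000_000


def min_shards_for(doc_count: int,
                   limit: int = LUCENE_INDEX_DOC_LIMIT,
                   headroom_pct: int = 80) -> int:
    """Smallest shard count that keeps each shard's index under
    headroom_pct percent of the limit, assuming an even distribution."""
    if doc_count < 0:
        raise ValueError("doc_count must be non-negative")
    if not 0 < headroom_pct <= 100:
        raise ValueError("headroom_pct must be in (0, 100]")
    usable = limit * headroom_pct // 100  # integer math avoids float rounding
    shards = -(-doc_count // usable)      # ceiling division
    return max(1, shards)


if __name__ == "__main__":
    for docs in (1_000_000_000, 2_000_000_000, 10_000_000_000):
        print(docs, "documents ->", min_shards_for(docs), "shard(s)")
```

In practice, data is rarely distributed perfectly evenly, so a real estimate would also account for skew between shards.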

“We were running out of disk space and dealing with more and more traffic. When we’d exhausted all methods for vertical scaling, we engaged MongoDB Consulting to guide us through scaling horizontally by sharding our environment,” explains Scullion.

Sharding involves moving from a single replica set to many servers (or ‘shards’), balancing the data so no single server is overloaded. It also means the team can scale the exact resources needed, which is more cost-effective. In the run-up to Black Friday, for example, Bazaarvoice had previously been scaling up multiple instance tiers for months at a time to handle the spikes in traffic.

Working together, the team designed the shard key, cluster configuration, and disaster recovery and backup strategies, then executed the plan with no downtime under tight timescales. When deciding how to divide the data, the team chose to group each retailer's data on the same shard. This promotes greater efficiency by targeting queries to a single shard rather than fetching data from all of them, which would increase latency. Bazaarvoice can now process workloads three times faster than before the sharding project.
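The difference between a targeted query and one that fans out to every shard can be sketched with a small simulation. This is a simplified illustration only: the `retailer_id` field, the shard count, and the hash-based placement are hypothetical, and MongoDB itself routes queries using chunk ranges tracked in cluster metadata rather than the modulo scheme shown here.

```python
import hashlib
from typing import List

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(retailer_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a retailer to one shard so that all of its data stays together."""
    digest = hashlib.md5(retailer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_to_query(query: dict, num_shards: int = NUM_SHARDS) -> List[int]:
    """Return the shards a router would need to contact for this query.

    A query that pins the shard key (retailer_id) is targeted at one shard;
    any other query has to be broadcast to all shards.
    """
    retailer_id = query.get("retailer_id")
    if isinstance(retailer_id, str):
        return [shard_for(retailer_id, num_shards)]
    return list(range(num_shards))


if __name__ == "__main__":
    print(shards_to_query({"retailer_id": "acme", "sku": "123"}))  # one shard
    print(shards_to_query({"sku": "123"}))                         # all shards
```

Because every query that includes the retailer touches a single shard, adding shards spreads the load instead of multiplying the work per query.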

Before sharding, the catalog team upgraded to the latest version of MongoDB on Atlas, v6.0, to make use of its newest features. These included improvements to how sharded clusters balance data, as well as the ability to refine a shard key or reshard a collection live, which will allow the team to iterate on its approach in the future if the need arises.
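The two options mentioned above, refining a shard key and live resharding, correspond to the MongoDB admin commands `refineCollectionShardKey` and `reshardCollection`. The sketch below only builds the command documents; the namespace `catalog.products` and the field names are hypothetical, and nothing here reflects Bazaarvoice's actual shard key.

```python
def extends_key(current_key: dict, new_key: dict) -> bool:
    """Refining only appends fields: the current key must be a strict
    prefix of the new key, in the same order."""
    cur, new = list(current_key.items()), list(new_key.items())
    return len(new) > len(cur) and new[:len(cur)] == cur


def refine_command(namespace: str, current_key: dict, new_key: dict) -> dict:
    """Command document for adding suffix fields to an existing shard key."""
    if not extends_key(current_key, new_key):
        raise ValueError("refined key must extend the current shard key")
    return {"refineCollectionShardKey": namespace, "key": new_key}


def reshard_command(namespace: str, new_key: dict) -> dict:
    """Command document for moving a collection to a different shard key."""
    return {"reshardCollection": namespace, "key": new_key}


if __name__ == "__main__":
    current = {"retailer_id": 1}                    # hypothetical current key
    refined = {"retailer_id": 1, "product_id": 1}   # hypothetical refinement
    print(refine_command("catalog.products", current, refined))
    print(reshard_command("catalog.products", {"product_id": 1}))
```

With a driver such as PyMongo, documents like these would typically be passed to `client.admin.command(...)` against a sharded cluster. Resharding rewrites the collection's data, so it calls for careful capacity planning before being run in production.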

“We sharded our MongoDB cluster in November 2022 in time for the Black Friday sales, which every year drives a huge surge in demand. The most impressive thing about the project was that to our clients and their customers, nothing changed, but in reality, everything had,” recalls Scullion. “Sharding is notoriously complex, but it went without a hitch.”

In addition to averting the risk of running out of disk space, the project freed up the team to optimize search replication, reducing lag from as much as several hours, depending on traffic, to near zero. This is important because it shortens the time it takes for product data to become discoverable and enables customers to add content sooner.


Unlimited scalability without impacting performance

MongoDB has become integral to how Bazaarvoice does business. In addition to scalability, Bazaarvoice can add more complex search functionality, including autocomplete, which will help customers find the right product faster and will save costs by replacing alternative search systems.
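As an illustration of what autocomplete looks like in Atlas Search, the sketch below builds an aggregation pipeline that uses the `$search` stage with the `autocomplete` operator. The index name, the `name` field, and the result limit are hypothetical choices for this example and are not taken from Bazaarvoice's implementation.

```python
from typing import List

# Hypothetical search index definition: the queried field must be mapped
# with the "autocomplete" type for the autocomplete operator to work.
AUTOCOMPLETE_INDEX_DEFINITION = {
    "mappings": {
        "dynamic": False,
        "fields": {"name": {"type": "autocomplete"}},
    }
}


def autocomplete_pipeline(prefix: str,
                          index_name: str = "default",
                          path: str = "name",
                          limit: int = 10) -> List[dict]:
    """Build an aggregation pipeline that suggests documents whose `path`
    field matches the partial text a shopper has typed so far."""
    if not prefix:
        raise ValueError("prefix must be a non-empty string")
    return [
        {"$search": {
            "index": index_name,
            "autocomplete": {"query": prefix, "path": path},
        }},
        {"$limit": limit},
        {"$project": {"_id": 0, path: 1}},
    ]


if __name__ == "__main__":
    for stage in autocomplete_pipeline("runn"):
        print(stage)
```

A pipeline like this would be passed to a collection's `aggregate` method on an Atlas cluster where a matching search index exists.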

“MongoDB provides a stable store for our product data,” Scullion comments. “We can now support even greater data sets and develop more complex features.”

Since moving to MongoDB Atlas, the volume of data Bazaarvoice handles has grown by more than 1 billion documents. By sharding the production clusters, Scullion is confident the catalog team could easily handle 10 billion documents by simply changing the schema. Where changes once took hours to reflect in search results, sharding supports real-time updates.

Both query performance and data ingestion have also improved and the team is free to focus on innovation and developing new features. “Productivity is much higher on the developer team as we now have the time to innovate. Sharding means there are fewer hardware spikes that can lead to performance bottlenecks,” Scullion adds. “We love the resources and support available from the MongoDB community. If we come across a problem, there’s always someone there who can help.”

With scalable, high-performing infrastructure, Bazaarvoice can continue to grow, streamline acquisitions, and develop next-generation services for both sellers and shoppers on the digital marketplace. The voice of the consumer is ever-changing and as technologies like augmented reality reshape the way we shop online, Bazaarvoice is well placed to protect its position as the platform of choice for UGC.

