Rangespan

Consumers demand choice, but delivering all possible options at all times is a costly proposition for traditional retailers. To compete with online merchants such as Amazon.com, retailers need to deliver a wider array of products at a reasonable cost. Launched in 2011 by a team of ex-Amazon.com senior execs and engineers, Rangespan provides an automated supply chain service that enables retailers to massively expand their range without risk, complexity or cost.

Instead of taking the traditional route of supplier aggregators who license and resell multiple catalogues or deliver incomplete catalogue data, Rangespan leveraged MongoDB to create a comprehensive multi-supplier catalogue. The company used proprietary algorithms on a combination of supplier and non-supplier data to deliver the broadest category of products with the greatest attributes at the lowest price.

The Problem

Traditional retailers and their supply chains have difficulty matching the scope of products delivered by online retailers. The unique properties of proprietary supplier data make it challenging for retailers to present a single authoritative view of their product catalogue when integrating multiple suppliers. First, each supplier provides different attributes for the same product – e.g. any combination of images, barcodes, string-based and numeric attributes. Second, it is difficult to aggregate product data from dozens of suppliers, each carrying a subset of overlapping products at different price points.

Rangespan considered developing an e-catalogue on a relational database; however, this would adversely impact time to market, code complexity and the addition of new partners. A relational database solution would require data scientists to normalize data involving millions of objects (SKUs) on an ongoing basis with the addition of each new supplier. Hundreds of sparsely populated tables would result in inefficient use of storage, higher costs and slower response times when executing complex joins. Having fought this battle at Amazon, the Rangespan team was determined to build a non-tabular system that would save them money and accelerate time to market.
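The sparsity argument can be made concrete with a small sketch. It assumes a naive relational design with one wide table and one column per distinct attribute; the product records and attribute names below are hypothetical, not Rangespan data.

```python
def empty_cell_fraction(records):
    """Fraction of cells left empty if every record became a row in one
    wide table with a column for each attribute seen in any record."""
    columns = set()
    for record in records:
        columns.update(record)
    total_cells = len(columns) * len(records)
    if total_cells == 0:
        return 0.0
    # An attribute absent from a record would be a NULL cell in the table.
    filled_cells = sum(len(record) for record in records)
    return 1 - filled_cells / total_cells


# Hypothetical products from different categories share few attributes.
records = [
    {"title": "Kettle", "wattage": 3000},
    {"title": "Novel", "isbn": "0000000000"},
    {"title": "T-shirt", "size": "M", "colour": "blue"},
]
fraction = empty_cell_fraction(records)  # 8 of 15 cells are empty
```

Three records already leave more than half of the table empty. The fraction grows as more categories, each with its own attributes, are added, whereas a document stores only the attributes a product actually has.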

Demand variability, retail seasonality and the rapid increase in objects stored necessitated a database that could deliver consistently fast response times even as it scaled horizontally in the cloud, something many monolithic databases struggle to do.

The Solution

MongoDB powers Rangespan’s business. Rangespan began work with MongoDB in early 2011 and within six months deployed their first customer, a Top 10 retailer in the UK. After evaluating various SQL and NoSQL options – including Cassandra and Couch – Rangespan chose MongoDB for its dynamic schema design, horizontal scalability, high performance and active community support. MongoDB’s dynamic schema design allows them to marry disparate structured and unstructured data from multiple suppliers into a single authoritative data store. Plus, Rangespan’s expertise in Python development and Python-based natural language processing (NLP) tools allowed developers to program in their preferred language and tools without having to learn new skills. MongoDB removes object relational mapping complexity and streamlines the development process while increasing Rangespan’s agility to create and maintain an online catalogue with less manual intervention.

Thanks to MongoDB, Rangespan serves as the trusted partner and supplier to multiple retailers, who wouldn’t otherwise be able to achieve economies of scale and scope. “Everyone in big retail wants to beat out Amazon,” said Rangespan Technical Director James Summerfield. “Being able to offer retailers the breadth of selection and the best catalogue data is a tremendous advantage.”

Agile Catalogue Creation

With supplier data, MongoDB is a “vacuum for sucking up data;” Rangespan can pull data feeds from suppliers and from the web without having to pre-define products, taxonomy and other attributes. Rangespan can leave the data unstructured and determine how to use it later. MongoDB’s document structure allows Rangespan to persist the corpus of largely unstructured data – such as data for products which may have more than 100 varying attributes – then use NLP tools to cleanse, format and append to the same document. As Rangespan adds new suppliers and runs more analytics jobs over time, their catalogue becomes more comprehensive.
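As a rough illustration of this pattern, the sketch below folds two supplier feeds with different attributes into one product document. It uses plain Python dictionaries in place of MongoDB documents. The field names (`raw`, `attributes`), the supplier names and the "first value wins" rule are assumptions made for the example, not Rangespan's actual schema or algorithm.

```python
from copy import deepcopy


def merge_supplier_record(product, supplier, record):
    """Return a new product document with one supplier's record folded in."""
    merged = deepcopy(product)
    # Keep the raw feed as delivered, keyed by supplier, so it can be
    # re-analysed later. No schema is imposed on it.
    merged.setdefault("raw", {})[supplier] = dict(record)
    # Copy an attribute into the shared view only if no value exists yet
    # (a deliberately simple rule chosen for this sketch).
    attributes = merged.setdefault("attributes", {})
    for key, value in record.items():
        attributes.setdefault(key, value)
    return merged


product = {"_id": "sku-001"}  # hypothetical identifier
product = merge_supplier_record(
    product,
    "supplier_a",
    {"title": "Kettle", "barcode": "5000000000012", "colour": "red"},
)
product = merge_supplier_record(
    product,
    "supplier_b",
    {"title": "Electric kettle", "capacity_litres": 1.7},
)
```

After both calls the document holds each supplier's original record plus a combined attribute view, and neither supplier had to conform to a predefined set of columns.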

For retailers, MongoDB’s document model facilitates on-demand data transformation, enabling Rangespan to work with multiple retailers without having to constantly adapt their data store. MongoDB makes data easily consumable by retailers and offers Rangespan endless ways to derive data from the catalogue. The flexibility of the MongoDB-based architecture enables Rangespan to grow their business by onboarding in one week the number of suppliers added by traditional retailers in one year.
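One way to picture on-demand transformation is as a per-retailer projection applied when the catalogue is read, leaving the stored document unchanged. The sketch below assumes a hypothetical `attributes` sub-document and a hypothetical mapping from catalogue field names to one retailer's field names.

```python
def retailer_view(product, field_map):
    """Project a catalogue document onto one retailer's field names.

    field_map maps catalogue attribute names to the names the retailer
    expects. Attributes the product lacks are simply omitted.
    """
    attributes = product.get("attributes", {})
    return {
        target: attributes[source]
        for source, target in field_map.items()
        if source in attributes
    }


# Hypothetical catalogue document and retailer mapping.
product = {
    "_id": "sku-001",
    "attributes": {"title": "Kettle", "colour": "red", "capacity_litres": 1.7},
}
retailer_fields = {"title": "name", "colour": "color", "weight_kg": "weight"}
view = retailer_view(product, retailer_fields)
```

Supporting another retailer in this sketch means adding another mapping rather than changing how products are stored.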

High Scalability and Availability

Rangespan currently stores 400 million product records in MongoDB – a 100% increase over the course of 12 months. MongoDB delivers predictably fast performance as they add new suppliers weekly and continue to scale the business.

To minimize the impact of this growth on their customers, as well as meet the exponentially higher seasonal loads during the holidays, Rangespan deployed MongoDB replica sets. According to Summerfield, adding replication with MongoDB to prepare for increased order volume over Christmas was easier than anticipated: “a simple, brain dead exercise.” MongoDB delivered the redundancy and reliability on Large EC2 instances (a pair of data-bearing replica set members and an arbiter) without having to set up an expensive data center architecture. In one case of lost database connectivity, Rangespan was reassured by MongoDB’s automatic hand-off between the primary and secondary members of the replica set, which took place within seconds.

Deep Data Analytics with Map Reduce

Being a business partner means more than simply supplying catalogue data. Thanks to MongoDB, Rangespan is able to go the extra distance, providing analytics to help retailers find the optimal price for existing products and identify new products of interest, thereby increasing retailer efficiency for expanding offerings.

Rangespan runs 10-15 Map Reduce jobs a day on MongoDB to analyze catalogue metadata. “It’s like Lego for adults,” said Summerfield, who can slice and dice the data to view products by manufacturer or category, for example. Additionally, Rangespan can leverage Hadoop Map Reduce to tease out unstructured data, such as competitive pricing culled from a web spider, or product data scraped from a supplier’s site.
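The map and reduce steps behind a "products by manufacturer" view can be sketched in a few lines of plain Python. This illustrates the general pattern only. It is not MongoDB's or Hadoop's API, and the catalogue documents and field names are hypothetical.

```python
from collections import defaultdict


def map_product(doc):
    """Map step: emit one (manufacturer, 1) pair per product."""
    yield doc.get("manufacturer", "unknown"), 1


def reduce_counts(key, values):
    """Reduce step: total the values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group the mapper's output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


catalogue = [
    {"sku": "a1", "manufacturer": "Acme", "category": "kettles"},
    {"sku": "a2", "manufacturer": "Acme", "category": "toasters"},
    {"sku": "b1", "manufacturer": "Bolt", "category": "kettles"},
    {"sku": "c1", "category": "kettles"},  # manufacturer not supplied
]
counts = map_reduce(catalogue, map_product, reduce_counts)
```

Emitting `doc["category"]` in the map step instead would give the per-category view without changing the reduce step.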

Reduced Complexity

Leveraging a document database with a JSON data structure decreases machine and human touch points, reducing system complexity and time to market. In a relational database, hundreds of tables would have to be created, joined and updated, which would require DBA time and investment.

While MongoDB effectively enables Rangespan to persist and mine structured and unstructured data, ElasticSearch makes deep and complex content – especially unstructured data that can be difficult to query – easy to find. “It took longer for the data to sync than for us to integrate,” said Summerfield. “The proof of concept was 10 lines of code.”

Results

“If we didn’t have Mongo… instead of operating in single digit margins we’d have to charge more and staff the team with information architects and DBAs. We’d also have an inflated catalogue,” said Summerfield. “Too often companies have ambition that exceeds their time and technology wherewithal. MongoDB helped us accomplish everything we hoped to achieve over the past years.”

This includes:

Accelerated Time to Value

  • Built application and deployed first customer within 6 months
  • Can now on-board in less than a month the number of suppliers added by traditional retailers in one year

Reduced Complexity

  • Developing application on SQL database would have resulted in 90% empty columns and required complex joins
  • Defining products with only relevant information in a single document saved space, time and cost

Reduced Costs

  • Saved 12 FTE months for time that would have been spent normalizing catalogue product data and taxonomy for multiple retailers
  • Continued savings of over $400K annually with MongoDB: lower storage requirements on EBS; elimination of storing large amounts of empty data; elimination of inefficient/non-value-add use of staff time