A few years ago, RJ Jain moved to San Francisco and wanted to buy a couch for his new apartment. Shortly after purchasing one online, he found a listing for the exact same couch, previously owned, on another site for half the retail price.
That's when RJ had his “ah-ha” moment. “If I bought this used item, I would have saved so much money. Plus, buying the used couch would have been responsible shopping—much better for the environment, he explains.
And with that, the idea of Price.com was born.
Price.com is building a platform that helps users save time and maximize savings when purchasing products online. On the platform, users can compare prices across product conditions (e.g. new, used, refurbished, rental) and leverage coupons, price alerts, and a cash-back rewards program.
Price.com has grown quickly - the platform showcases over one billion product listings across 2,000 retail partnerships, and is experiencing 30% month-over-month user growth. The company has raised funding from Founders Fund; Social Capital; and angels including former execs at Twitter, Priceline, Microsoft, and Pinterest.
For this episode of #BuiltWithMongoDB, we spoke with Vasco Morais, Director of Engineering at Price.com, about the company's tech and his experience using MongoDB (for the first time!).
Siya Raj Purohit: Your team provides so many cool options for shoppers. How does Price.com function on the back end?
Vasco Morais: Price.com's proprietary algorithms and deep learning models match both structured and unstructured data, allowing quick product matching and discovery across several product types. This enables fun product features - for example, users just have to take a picture of a product they want to buy, and Price.com tells them the best place to buy it.
To help provide this seamless service, we ingest and process data around the clock, using a sophisticated data pipeline. To keep bad data and retailer pricing errors out of our database, we established a standard schema and put a lot of effort (around the clock!) into ensuring everything adheres to it.
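Price.com hasn't shared its actual schema, but as a rough sketch of how such a standard can be enforced, MongoDB collections accept a JSON Schema validator at creation time. The database, collection, and field names below are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["price"]  # hypothetical database name

# Reject malformed listings at write time by attaching a validator.
db.create_collection(
    "listings",
    validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["title", "price", "condition", "retailer"],
            "properties": {
                "title": {"bsonType": "string"},
                "price": {"bsonType": "number", "minimum": 0},  # no negative prices
                "condition": {"enum": ["new", "used", "refurbished", "rental"]},
                "retailer": {"bsonType": "string"},
            },
        }
    },
)
```

With this in place, an insert that is missing a required field or carries a negative price fails with a write error instead of polluting the database.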
SRP: How did the team decide to have Price.com #BuiltWithMongoDB?
VM: From the beginning, the team knew that down the line, we would want to provide full support for all listings, including geospatial queries (which MongoDB has native support for). We also wanted to have the ability to easily create new indices as new functionality was added. That way, we could continuously query any product in our database and simultaneously update new data into our system without having to overcome read/write conflicts.
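As an illustration of the native geospatial support Vasco mentions, a 2dsphere index plus a `$near` query in pymongo might look like the following sketch (collection name and coordinates are hypothetical):

```python
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")
listings = client["price"]["listings"]  # hypothetical names

# New indices can be added at any time as functionality grows.
listings.create_index([("location", GEOSPHERE)])

# Find listings within 10 km of downtown San Francisco.
nearby = listings.find({
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-122.4194, 37.7749]},
            "$maxDistance": 10_000,  # meters
        }
    }
})
```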
We also wanted a platform that would scale with us. We're processing billions of listings and price points, and hosting on MongoDB gives us confidence. Finally, several team members had experience with MongoDB and felt close to MongoDB's architecture — so it was an easy choice.
SRP: When you joined Price.com as Director of Engineering, it was your first time using MongoDB. How was the onboarding process for you?
VM: I had previously worked only with relational databases, which accept longer query construction as a trade-off for familiar syntax and arguments. For example, something as simple as filtering and sorting by timestamp can easily turn into a multi-line query in SQL, and it's nice to see how simple it remains in MongoDB. Similarly, setting up a new collection in MongoDB was instantaneous compared to setting up and defining a schema for a new table in a relational database.
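For readers unfamiliar with MongoDB's query style, here is a minimal sketch of the kind of query Vasco describes, using pymongo; the collection and field names are hypothetical:

```python
from datetime import datetime, timedelta
from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017")
events = client["mydb"]["events"]  # hypothetical names

# Filter to the last 24 hours and sort newest-first: one chained expression.
cutoff = datetime.utcnow() - timedelta(hours=24)
recent = events.find({"timestamp": {"$gte": cutoff}}) \
               .sort("timestamp", DESCENDING) \
               .limit(20)
```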
I first looked at MongoDB documentation the night before I started at Price.com and felt fine working on the platform the next day. Every now and then, I would run into something that I would have to resolve with a Google search, but it definitely didn’t feel like the arduous use-it-or-lose-it skill set that accompanies other databases. Overall, for me and my team, MongoDB significantly cuts down the amount of time we spend on development when compared to other databases.
Part 1: The Modernization Journey with Exafluence and MongoDB
Welcome to the first in a series of conversations between Exafluence and MongoDB about how our partnership can use open source tools and the application of data, artificial intelligence/machine learning, and natural language processing to power your business's digital transformation. In this installment, MongoDB Senior Partner Solutions Architect Paresh Saraf and Director for WW Partner Presales Prasad Pillalamarri sit down with Exafluence CEO Ravikiran Dharmavaram and exf Insights Co-Founder Richard Robins to discuss how to start the journey to build resilient, agile, and quick-to-market applications.

From Prasad Pillalamarri: I first met Richard Robins, MD & Co-Founder of exf Insights at Exafluence, back in June 2016 at a MongoDB World event. Their approach to building data-driven applications was fascinating to me. Since then, Exafluence has grown by leaps and bounds in the system integration space, and MongoDB has outperformed its peers in the database market. So Paresh and I decided to interview Richard to deep-dive into their perspective on modernization with MongoDB.

Prasad & Paresh: We first met the Exafluence team in 2016. Since then, MongoDB has created the Atlas cloud data platform, which now supports multi-cloud clusters, and Exafluence has executed multiple mainframe and legacy modernization projects. Could you share your perspective on the growth and synergies of both companies from a modernization point of view?

Richard Robins: Paresh and Prasad, I'm delighted to share our views with you. We've always focused on what happens after you successfully offload read traffic from mainframes and legacy RDBMS to the cloud. That's digital transformation and legacy app modernization. Early on, Exafluence made a bet that if the development community embraces something, we should, too. That's how we locked in on MongoDB when we formed our company.

Having earned our stripes in the legacy data world, we knew that getting clients to MongoDB would mean mining the often poorly documented IP contained in legacy code. That code is often where the knowledge of long-retired subject matter experts (SMEs) resides. To capture it, we built tools that scan COBOL/DB2 and stored procedures to reverse engineer the current state. This helps us move clients to a modern, cloud-native application, and it's an effective way to merge, migrate, and retire the legacy data stores all of our clients contend with.

Once we'd mined the IP with those tools, we needed to provide forward-engineered transformation rules to reach the new MongoDB Atlas endpoint. Using a metadata-driven approach, we built a rules catalog that includes a full audit trail and a REST API, which keeps data governance programs and catalogs up to date as an additional benefit of our modernization efforts. We've curated these tools as exf Insights, and we bring them to each modernization project. Essentially, we applied NLP, ML, and AI to data transformation to improve modernization analysts' efficiency, and we added a low-to-no-code transformation rule builder, complete with version control and rollback capabilities. All this has resulted in our clients getting world-class, resilient capabilities at lower cost and in less time.

We're delighted to say that our modernization projects have been successful by following simple tenets: embrace what the development community embraces, and offer as much help as possible through the accelerator tools we've built. That's why we are so confident we'll continue our rapid growth.
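Exafluence hasn't published exf Insights' internal schema. Purely as a sketch of what a versioned, audited entry in a MongoDB-backed rules catalog could look like, the following uses pymongo with hypothetical names and fields:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
rules = client["exf_insights"]["rules_catalog"]  # hypothetical database/collection

# One versioned transformation rule; earlier versions stay in place for rollback.
rules.insert_one({
    "ruleId": "cust-addr-001",                    # hypothetical rule identifier
    "version": 3,
    "source": {"system": "DB2", "table": "CUSTMAST", "column": "ADDR_LN1"},
    "target": {"collection": "customers", "field": "address.line1"},
    "transform": "TRIM then title-case",          # mined business logic, human-readable
    "status": "approved",                         # analyst accepted the ML/NLP match
    "audit": {"author": "analyst@example.com",
              "updatedAt": datetime.now(timezone.utc)},
})
```

A catalog along these lines is what lets a REST API feed downstream data governance tools with the latest transforms.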
P&P: How do you think re-architecting legacy applications with MongoDB as the core data layer will add value to your business?

RR: We believe that MongoDB Atlas will continue to be developers' go-to document database, and that we'll see our business grow 200-300% over the next three years. With MongoDB Atlas and Realm we can provide clients with resilient, agile applications that scale, are easily upgraded, and can run on any cloud as well as on popular iOS and Android mobile devices. Digital transformation is key to remaining competitive and agile going forward. With MongoDB Atlas, we can give our clients the same capabilities we all take for granted in our mobile apps: they're resilient, easy to upgrade, usually real-time, scale via Kubernetes clusters, and can be rolled back quickly if necessary. Most importantly, they save our clients money and can be deployed automatically.

P&P: At a high level, how will Exafluence help customers take this journey?

RR: We're unusual as a services firm in that we spend 20% of gross revenue on R&D, so our platform and approach are proven. Relatively small teams for our healthcare, financial services, and Industry 4.0 clients can therefore leverage our approach, platform, and tools to deliver advanced analytical systems that combine structured and unstructured data across multiple domains. We built our exf Insights accelerator platform using MongoDB and designed it for interoperability, too. On projects we often encounter legacy ETL and messaging tools. To show how easy integration is, we recently integrated exf Insights with SAP HANA and the SAP Data Intelligence platform. Further, we can publish JSON code blocks and provide Python code for integration into ETL platforms like Informatica and Talend.

Our approach is to reverse engineer by mining IP from legacy data estates and then forward engineer the target data estate, using these steps and tools (a purely illustrative sketch of a generated code block follows this list):

Reverse engineer:
- Extract stored procedures, business logic, and technical data from the legacy estate and load them into our platform.
- Use our AI/ML/NLP algorithms to analyze business transformation logic and metadata, with outliers identified for cleansing.
- Run DB scans to assess legacy data quality, cleanse and correct outliers, and provide tools for DB-level data reconciliation.

Forward engineer. To produce a clean set of metadata and business transformation logic, baselined with version control, we:
- Extract, transform, and load metadata to the target state.
- Score metadata via NLP and ML to recommend matches to the analyst, who accepts, rejects, or overrides each recommendation. Analysts can then add further transformations, which are catalogued.
- Deploy and load cleansed data to the target-state platform so transformations and gold copies may be built.
- Automate data governance via a REST API and code block generation (Python/JSON) to keep enterprise catalogs current with the latest transforms.
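The generated code blocks themselves aren't public either; as a hypothetical illustration only, a Python code block generated for the single rule sketched earlier might resemble:

```python
# Hypothetical output only: real exf Insights code generation is not public.
def transform_custmast_addr(row: dict) -> dict:
    """Apply rule cust-addr-001: trim and title-case the legacy address line."""
    return {
        "address": {"line1": row["ADDR_LN1"].strip().title()},
        "sourceSystem": "DB2.CUSTMAST",   # lineage tag for governance catalogs
    }

# An ETL platform such as Informatica or Talend would invoke this per record.
print(transform_custmast_addr({"ADDR_LN1": "  12 MAIN STREET  "}))
# {'address': {'line1': '12 Main Street'}, 'sourceSystem': 'DB2.CUSTMAST'}
```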
P&P: What are your keys to a successful transformation journey?

RR: Over the past several years we've identified these elements and observations:
- Subject matter experts and technologists must work together to provide new solutions.
- There's a shortage of skilled technologists able to write, deploy, and securely manage next-generation solutions. Using accelerators and transferring skills are vital to mitigating that shortage.
- Existing IP that's buried in legacy applications must be understood and mined for a modernization program to succeed.
- A data-driven approach that combines reverse and forward engineering speeds migration and also provides new data governance and data science catalog capabilities.
- Building, caring for, and feeding new, open source-enabled applications is markedly different from the way monolithic legacy applications were built.
- The document model enables analytics and interoperability.
- Cybersecurity and data consumption patterns must be articulated and made part of the process, not afterthoughts.
- Even with aggressive transformation plans, new technology must coexist with legacy applications for some time; progress works best when it's not a big bang.
- Success requires business and technology to learn new ways to provide, acquire, and build agile solutions.

P&P: Can you talk about solutions you have that will accelerate the modernization journey for customers?

RR: exf Insights helps our clients visualize what's possible with extensive, pre-built, modular solutions for healthcare, financial services, and Industry 4.0. They show the power of MongoDB Atlas and also the power of speed layers using Spark and Confluent Kafka. These solutions are readily adaptable to client requirements and reduce the risk and time required to deliver secure, production-ready applications:

- Source data loading. Analyze and integrate raw structured and unstructured data, including support for reference and transactional data.
- Metadata scan. Match data using AI/NLP, scoring results and providing side-by-side comparisons.
- Source alignment. Use ML to check underlying data and score results for analysts, and leverage that learning to accelerate future changes.
- Codeless transformation. Empower data SMEs to build the logic with a multiple-sources-to-target approach and transform rules that support code value lookups and complex Boolean logic. Includes versioned gold copies of any data type (e.g., reference, transaction, client, product).
- Deployment. Deploy for scheduled or event-driven repeatability and dynamically populate Snowflake or other repositories. Generates code blocks that are usable in your estate or via REST API.

We used the same five-step workflow data scientists use when we enabled business analysts to accelerate the retirement of internal data stores and to build and deploy a COVID-19 self-checking app in three weeks, including active directory integration and downloadable apps. We will be offering a Realm COVID-19 screening app on web, Android, and iOS to the entire MongoDB Atlas community in addition to our own clients. The accelerator integrates key data governance tools, including exf Insights repository management of all sources and targets with versioned lineage; as-built transformation rules for internal and client implementations; and a business glossary integrated into metadata repositories.

P&P: Usually one of the key challenges for businesses is data being locked in silos.

RR: We couldn't agree more. Our data modernization projects routinely integrate with source transactional systems that were never built to work together. We provide scanning tools to understand disparate data as well as ways to ingest, align, and stitch it together. Using healthcare as an example, exf Insights provides a comprehensive analytical capability, able to integrate data from hospitals, claims, pharmaceutical companies, patients, and providers. Some of this data is non-SQL, such as radiological images; for pharma companies we provide capabilities to support clinical research organizations (CROs) via a follow-the-molecule approach.
Of course, we also have to work with and subscribe to Centers for Medicare & Medicaid Services (CMS) guidelines. Our data migration focuses on collecting the IP behind the data and making the source, logic, and any transformation rules available to our clients. In financial services, it's critical to understand sources and targets. No matter how data is accessed (federated or direct store), with Spark and Kafka we can talk to just about any data repository.

P&P: Once we discover the data to be migrated, we need to model it according to MongoDB's data model paradigm. That requires multiple transformations before the data is loaded into MongoDB. Can you explain more about how your accelerators help here?

RR: By understanding data consumption and then looking at existing data structures, we seek to simplify and then apply the capabilities of MongoDB's document model. It's not unlike what a data architect would do in the relational world, but with MongoDB Atlas it's easier. We ourselves use MongoDB for our exf Insights platform to align, transform, and make data ready for consumption in new applications. We're able to provide full rules lineage and an audit trail, and we even support rollback. For the real-time speed layer we use Spark and Kafka as well. This data-driven modernization approach also turns data governance into an active consumer of the rules catalog, so exf Insights works well for regulated industries.

P&P: It's great that we have the data migrated now. Consider a scenario where it's a mainframe application with lots of COBOL code. It has to be moved to a new programming language like Python, with a change in the data access layer to point to MongoDB. Do you have accelerators that can facilitate the application migration? If so, how?

RR: Yes, we have accelerators that understand COBOL syntax and create JSON and ultimately Java, which speeds modernization. We also found we had to reverse engineer stored procedures as part of our client engagements for Exadata migrations.

P&P: Once we migrate the data from legacy databases to MongoDB, validation is the key step. As this is a heterogeneous migration, it can be challenging. How can Exafluence add value here?

RR: We've built custom accelerators that migrate data from the RDBMS world to MongoDB and offer data comparisons as clients go from development to testing to production, documenting all data transformations along the way.
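The comparison tooling itself isn't public. As a minimal sketch of the kind of source-to-target reconciliation described here, with sqlite3 standing in for the legacy RDBMS driver and all names hypothetical:

```python
import sqlite3                       # stand-in for the legacy RDBMS driver
from pymongo import MongoClient

legacy = sqlite3.connect("legacy.db")                      # hypothetical source
mongo = MongoClient("mongodb://localhost:27017")["target_db"]

# Reconcile record counts per migrated entity; a real tool would also
# compare checksums and sampled field values in each environment.
for table, collection in [("CUSTOMERS", "customers"), ("ORDERS", "orders")]:
    src = legacy.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dst = mongo[collection].count_documents({})
    print(f"{table} -> {collection}: {src} vs {dst} "
          f"[{'OK' if src == dst else 'MISMATCH'}]")
```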
P&P: Now that we've talked about the tools that can help in the modernization journey, can you tell us how you've already helped customers achieve this?

RR: Certainly. We've already outlined how we've created solution starters for modernization, with sample solutions as accelerators. But that's not enough; our key tenet for successful modernization projects is pairing SMEs and developers. That's what enables our joint client and Exafluence teams to understand the business, key regulations, and technical standards. Our data-driven focus lets us understand the data regardless of industry vertical. We've successfully used exf Insights in financial services, healthcare, and Industry 4.0. Whether it's understanding the nuances of financial instruments and data sources for reference and transactional data, or medical device IoT sensors in healthcare, or shop-floor IoT and PLC data for predictive analytics and digital twin modeling, a data-driven approach reduces modernization risks and costs.

Below are some of the possibilities this data-driven approach has delivered for our healthcare clients using MongoDB Atlas. By aggregating provider, membership, claims, pharma, and EHR clinical data, we offer robust reporting that:

- Transforms healthcare data from its raw form into actionable insights that improve member care quality, health outcomes, and satisfaction
- Provides FHIR support
- Surfaces trends and patterns in claims, membership, and provider data
- Lets users access, visualize, and analyze data from different sources
- Tracks provider performance and identifies operational inefficiencies

P&P: Thank you, Richard!

Keep an eye out for upcoming conversations in our series with Exafluence, where we'll be talking about agility in infrastructure and data as well as interoperability.

MongoDB and Modernization

To learn more about MongoDB's overall modernization strategy, read here.
4 Steps to Success: From Surviving with Legacy Systems to Thriving with MongoDB
Legacy data migrations imply a change in the status quo. More often than not, when an organization finally undertakes a thorough analysis of its technology landscape, it arrives at the same decision: to do nothing. Upgrading or replacing 20+ year-old applications and their database counterparts is an understandably daunting task. But there are good reasons, beyond the tri-annual hardware upgrade, to propel those legacy monoliths of the 1990s into the 21st century.

Companies that prevailed—and even triumphed—in the volatile spring of 2020 were those that transitioned to a more flexible usage model and were therefore able to adjust their business models more rapidly and reliably. MongoDB's client Sanoma was one of the winners: it was able to scale from 3,000 to 150,000 users within 24 hours, without any service interruption.

Innovation and modernization go hand in hand. While modernization can sadly occur without innovation, the opposite is simply not possible.

A bit of history

The concept of bringing data together through an online data layer (ODL) or operational data store (ODS) isn't new or specific to MongoDB. Accessing legacy systems, bringing data together, and making it all more easily accessible was a common goal even 20 years ago, and it led to the search for the golden source of truth (i.e., the definitive master source for any given entity). This search proved elusive early on because of the hurdles involved in bringing data from diverse, over-structured relational constructs into a sole target, called an operational data store (ODS) or online data layer (ODL).

The industry's first attempts began with object-oriented databases, then the dead end of XML data stores. (In my personal opinion, XQuery and XPath were never meant for real developers.) After both endeavors failed came the wave of Apache efforts I like to call "Hadoop Solves the Planet," in which companies dumped all their structured data onto a big-data treasure trove. Unfortunately, this resulted in a data desert rather than the data lake everybody was hoping for, since organizations then had to scramble to build a concept for secondary indexing, data dictionaries, and more, on top of having to rebuild the sensible structures they lost.

In the 2010s, the document model, in conjunction with JSON notation, emerged as the new de facto standard. MongoDB release 3.x introduced the combination of ACID (atomicity, consistency, isolation, durability) compliance with support for a broad range of data types (in BSON, for those in the know). Soon, the MongoDB team started implementing additional features of relational heritage: secondary indexing, ACID transactions, aggregations and manipulation of data in place, materialized views, joins, unions... the list goes on.

Where we are now

MongoDB documents can be enriched through different means and channels without touching existing content — the consistency of all data and its lineage is implicitly guaranteed. A typical example is the extraction of a delivery address through a supply chain application and a billing address through an enterprise resource planning system. In many cases, those two systems have different requirements. MongoDB documents simply keep both instantiations intact and can even hold multiples of each, attached to one single client profile, without the need for complete loads and transformations, foreign keys, and all the other ingredients of the relational past. MongoDB simply adds and leverages other sources without destroying their context.
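As a minimal sketch of this pattern, assuming pymongo and hypothetical field names, a single client profile can simply accumulate both address instantiations side by side:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
profiles = client["crm"]["client_profiles"]   # hypothetical database/collection

# Each source system appends its own view of the address; nothing is
# transformed, merged, or keyed away as in the relational world.
profiles.update_one(
    {"clientId": 4711},
    {"$push": {"addresses": {
        "source": "supply_chain",             # vs. "erp_billing"
        "type": "delivery",
        "street": "1 Example Way",
        "city": "Springfield",
    }}},
    upsert=True,
)
```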
MongoDB delivers an ODS and ODL experience while streamlining the time-consuming journey of replacing legacy application code. The data platform of true modernization and innovation has arrived!

How your company can get here

The entire journey can be summarized in four simple steps:

1. Analysis: Where do I start my data journey to drive the fastest value?
2. Scaffolding: How do I get my data out of the existing platform and bridge it to the new platform?
3. Coding: How do I enter the world of adjusting and adapting my application landscape?
4. Innovation: Which are the easiest targets for my company to start achieving true innovation?

The following sections answer these four questions and provide you with a starting point for your journey toward a new and improved solution landscape.

Step 1: Analysis of your existing solution landscape

Data provisioning

Data provisioning—the act of bringing data from source system(s) to a target system—is actually the easy part of this step. Opinions may vary as to the very best approach, but most existing models for streaming data in real time make the process elegant and allow for a business-driven decision, from real-time replication on one end to batches of .CSV files on the other.

Application onboarding

More exciting is the application onboarding phase, including the selection and design of the initial data domains. Here, simple mechanisms derived from classic priority concepts can assist—and yes, they existed long before computers. Data domains already exist in the business logic, represented through objects in the various programming languages. But even the most talented application developer deals with constant change, which leads to compromises in those objects and can obfuscate the original clarity of their design, so the objects may hide in plain sight. Unearthing those gems and aligning them to the ODS is the most important step toward true legacy modernization.

The simplest solution is actually the most practical one: load an object with the existing software and persist it into a MongoDB collection. Persisting the object takes just two lines of code, and where those two lines go (the first opens a connection to the database; the second persists the object) does not matter, as long as they run after the object is built out. This is the first time you will see the beauty of MongoDB and MQL at work. You have to do nothing for the object itself—no decomposition or abstraction layer—MongoDB takes care of it for you. When looking at the object in the MongoDB database, e.g. using MongoDB Compass, you will realize that it already looks a lot like the domain object you wanted. The actual task of mapping objects to domains, or subsets of domains, is now mostly driven by the application use case. (A minimal code sketch of this two-line pattern follows the tip below.)

Tip: How to leverage application mapping to accelerate onboarding

In the model below, which was taken from the financial industry but can easily be adopted across industries, we identify the data domains in various applications and map their behavior to the effort it takes to locate them as well as their importance to the app. First, each domain gets a rating for its object complexity, where "complexity" is defined by the implementation team. This is similar to the concept of "poker" in a development sprint. Second, each data domain must be located in the application content. Then, it's tally time.
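Here is that two-line pattern in concrete form: a minimal sketch, assuming pymongo and hypothetical names.

```python
from pymongo import MongoClient

# Existing application code builds the domain object exactly as before.
client_profile = {"clientId": 4711, "name": "Jane Doe", "segment": "retail"}

# The two added lines: open a connection, then persist the object as-is.
db = MongoClient("mongodb://localhost:27017")["ods"]   # 1: connect (hypothetical names)
db["client_profiles"].insert_one(client_profile)       # 2: persist, no mapping layer
```

Viewed in MongoDB Compass, the stored document already looks like the domain object itself; no decomposition or abstraction layer was needed.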
Returning to the mapping model above: the concept of schedules looks quite easy but is superseded by the client profiles, which carry a touch more application context (spoiler: those always come out on top). Based on the combination of complexity and the number of data domains affecting an application, we can now easily arrive at the model below. Agile is your friend: assuming a certain "point capacity," the applications fall into a conversion schedule in a quite neutral fashion. The development team will then start with the low-hanging fruit. As soon as applications 1, 6, and 7 are ported, we're in business in a new, modern landscape. Along the journey, the domains will get cleaned up naturally, as we no longer have the static corset of RDBMS table designs.

Step 2: Scaffolding

Scaffolding is the art of building a bridge that can hold people as they cross it, then immediately dissipate once they step off. But for that critical time, it needs to hold. The same is true for the connectivity between a legacy system and a new data platform. Starting with the first sprint, we have data residing in the MongoDB data platform. If the data is limited to new applications and resides exclusively in MongoDB, nothing needs to be done. However, as shown in the client profiles example above, there may be dependencies to consider. Synchronization between the legacy database and the new MongoDB platform can be easily arranged using microservices and the same concepts used for the initial loading of data. Synchronization can also be achieved "through the gate" if only READ access is needed during the first sprint, or if you are already dealing with WRITEs and the requirement to synchronize those writes back to a legacy system. The main options (a sketch of the service option follows this list):

- Streaming: A streaming-based solution is a great option for uni-directional, read-only operations, in the most simple way.
- Service: A simple, tiny microservice is a good option when data needs to be selectively written. It works with the document model on the MongoDB side but can still push necessary updates back to the legacy system, and vice versa. The great news is that this service potentially exists already, as it requires nothing more than using the old database interface from the legacy application on one side and the new, easy-to-digest JSON document format on the MongoDB side. If both databases are ACID-compliant, any transaction is automatically treated as a normal application interaction on both sides.
- "Y-loader": Another option is a true "Y-loader," where all transactions are written in sync to both databases in parallel, and a transaction is only considered committed when both systems report their commit and completion. Simple two-phase protocols (write to both, wait five seconds, read both to validate and, if in sync, commit to the application) are available as ready-made services through various distributed transaction coordinators, but it is often easier to use the existing data access in the application. In that case, the new data path to MongoDB runs in parallel, and a simple redundant checkpoint (which the application logic would have had for the legacy path anyway) is expanded for this purpose.
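The service option maps naturally onto MongoDB change streams. Below is a minimal sketch of such a sync microservice, assuming pymongo, a replica set (required for change streams), and hypothetical names; the legacy write is left as a stub.

```python
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017")  # must be a replica set
profiles = mongo["ods"]["client_profiles"]        # hypothetical names

def push_to_legacy(doc):
    """Stub: replay the change through the old database interface."""
    print("syncing", doc["_id"])

# Watch writes on the MongoDB side and replay them against the legacy
# system until the scaffolding can be torn down.
pipeline = [{"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}}]
with profiles.watch(pipeline, full_document="updateLookup") as stream:
    for change in stream:
        push_to_legacy(change["fullDocument"])
```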
Step 3: Coding

Coding with the new domain data model, and with MongoDB's flexible document model as the underlying base, immediately impacts business logic and application development. The operative word is immediately. As the data gets unlocked with the initial persistence of the code object to a MongoDB collection, the developer can simultaneously code against business requirements. Developers are no longer hindered by the references and requirements of object mappers. As objects are represented through MongoDB's idiomatic drivers, each programming object resides directly in the data collection; in reverse, any change to a business logic object is naturally represented—code-free—in the MongoDB collection.

A single blog post can't resolve all open questions and edge cases. Each application, client, and data interface is unique. Databases carry historic technical debt and implicit assumptions that become lost across generations of developers. "Do not touch this section—not sure what it does, but last time we tried, all hell broke loose..." is often-heard advice around the organizational water cooler. But the key lesson? There are many different templates available and very simple methods of quickly turning a start into significant success. For example, a German client, stuck with a combination of IBM DB2 (mainframe and distributed) and a significant Hadoop footprint, was amazed to realize they could "lift" their data one microservice at a time. Within a single week of a proof of concept, business requirements shifted from "impossible to do" for some requested queries to "completed in under one second." This is no exception. Cases and changes like these happen daily, reinforcing Mark Twain's sage advice: "The secret of getting ahead is getting started."

Step 4: Innovation

As the migration from the legacy environment continues, innovation becomes the new focus. Unlocking previously siloed data allows the immediate coupling of real-time data with machine learning platforms for various purposes: scoring for financial decision-making, personalization for retail, or optimization of production processes in the IoT context. New applications and solutions can easily be created on top of the unleashed data, even in various programming languages, with direct real-time dashboards created in MongoDB Charts, and with different paradigms (again, MongoDB's idiomatic drivers do magic!).

At this point, the discussion with the product owners in your squads and tribes (trying to be real modern here) begins with the questions "What is the highest-priority component to change?" and "What function is required to enable this change?"

Is it worth waiting much longer? The real question is: why did we all not start sooner? It's time to begin integrating the list of features you always dreamed of having but never dared to pursue. The MongoDB team is here to help you get started. Reach out today and let's discuss the best path forward. To learn more about modernizing to MongoDB, click here.