Building AI with MongoDB: Navigating the Path From Predictive to Generative AI

Mat Keep


It should come as no surprise that the organizations unlocking the largest benefits from generative AI (gen AI) today have already been using predictive AI (a.k.a. classic, traditional, or analytical AI). McKinsey made this same observation back in June 2023 with its “Economic Potential of Generative AI” research [1].

There would seem to be several reasons for this:

  1. An internal culture that is willing to experiment and explore what AI can do

  2. Access to skills — though we must emphasize that gen AI is way more reliant on developers than the data scientists driving predictive AI

  3. Availability of clean and curated data from across the organization that is ready to be fed into gen AI models

This is not to say that only teams with prior experience in predictive AI stand to benefit from gen AI. If you take a look at examples from our Building AI case study series, you’ll see many organizations with different AI maturity levels tapping MongoDB for gen AI innovation today.

In this latest edition of the Building AI series, we feature two companies that, having built predictive AI apps, are now navigating the path to generative AI:

  1. MyGamePlan helps professional football players and coaches improve team performance.

  2. Ferret.ai helps businesses and consumers build trust by running background checks using public domain data.

In both cases, predictive AI is central to data-driven decision-making. And now both are exploring gen AI to extend their services with new products that further deepen user engagement. The common factor for both? Their use of MongoDB Atlas and its flexibility for any AI use case.

Let's dig in.

MyGamePlan: Elevating the performance of professional football players with AI-driven insights

The use of data and analytics to improve the performance of professional athletes isn’t new. Typically, solutions are highly complex, relying on the integration of multiple data providers, resulting in high costs and slow time-to-insight. MyGamePlan is working to change that for professional football clubs and their players. (For the benefit of my U.S. colleagues, where you see “football” read “soccer.”)

MyGamePlan is used by staff and players at successful teams across Europe, including Bayer Leverkusen (current number one in the German Bundesliga), AFC Sunderland in the English Championship, CD Castellón (current number one in the third division of Spain), and Slask Wroclaw (the current number one in the Polish Ekstraklasa).

I met with Dries Deprest, CTO and co-founder at MyGamePlan, who explains, “We redefine football analysis with cutting-edge analytics, AI, and a user-friendly platform that seamlessly integrates data from match events, player tracking, and video sources. Our platform automates workflows, allowing coaches and players to formulate tactics for each game, empower player development, and drive strategic excellence for the team's success.”

At the core of the MyGamePlan platform are custom, Python-based predictive AI models hosted in Amazon SageMaker. The models analyze passages of gameplay to score the performance of individual players and their impact on the game. Performance and contribution can be tracked over time and compared with players on opposing teams to help formulate matchday tactics.

Data is key to making the models and predictions accurate. The company uses MongoDB Atlas as its database, storing:

  1. Metadata for each game, including matches, teams, and players.

  2. Event data from each game such as passes, tackles, fouls, and shots.

  3. Tracking telemetry that captures the position of each player on the field every 100ms.

This data is pulled from MongoDB into Python DataFrames where it is used alongside third-party data streams to train the company’s ML models. Inferences generated from specific sequences of gameplay are stored back in MongoDB Atlas for downstream analysis by coaches and players.
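As a rough illustration of that flow, the sketch below flattens a nested game document into one row per event, the shape a DataFrame constructor accepts. The document layout and field names (`game_id`, `teams`, `events`, `minute`, `type`, `player`) are assumptions made for illustration, not MyGamePlan's actual schema, and a plain dict stands in for a document fetched with PyMongo.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical game document; in practice it would come from a PyMongo query.
# All field names here are illustrative assumptions.
SAMPLE_GAME: Dict[str, Any] = {
    "game_id": "g-001",
    "teams": {"home": "Team A", "away": "Team B"},
    "events": [
        {"minute": 3, "type": "pass", "player": "p-10"},
        {"minute": 7, "type": "tackle", "player": "p-4"},
        {"minute": 12, "type": "shot", "player": "p-10"},
    ],
}


def flatten_events(games: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn nested game documents into one flat row per event."""
    rows: List[Dict[str, Any]] = []
    for game in games:
        for event in game.get("events", []):
            rows.append({
                "game_id": game["game_id"],
                "home": game["teams"]["home"],
                "away": game["teams"]["away"],
                **event,  # copy the event's own fields into the row
            })
    return rows


rows = flatten_events([SAMPLE_GAME])
# With pandas installed, pandas.DataFrame(rows) would yield one row per event.
print(len(rows), rows[0]["type"])
```

Keeping events nested inside the game document and flattening only at training time is one possible design; it mirrors the nested-document modeling described in this article but is not confirmed as the team's exact pipeline.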

Figure 1:  With MyGamePlan’s web and mobile apps, coaching staff and players can instantly assess gameplay and shape tactics.

On selecting MongoDB, Deprest says,

We are continuously enriching data with AI models and using it for insights and analytics. MongoDB is a great fit for this use case.

“We chose MongoDB when we started our development two years ago. Our data has complex multi-way relationships, mapping games to players to events and tracking. The best way to represent this data is with nested elements in rich document data structures. It's way more efficient for my developers to work with and for the app to process. Trying to model these relationships with foreign keys and then joining normalized tables in relational databases would be slow and inefficient.”

In terms of development, Deprest says, “We use the PyMongo driver to integrate MongoDB with our Python ML data pipelines in SageMaker and the MongoDB Node.js driver for our React-based, client-facing web and mobile apps.”

Deprest goes on to say, "There are two key factors that differentiate MongoDB from NoSQL databases we also considered: the incredible level of developer adoption it has, meaning my team was immediately familiar and productive with it. And we can build in-app analytics directly on top of our live data, without the time and expense of having to move it out into some data warehouse or data lake. With MongoDB’s aggregation pipelines, we can process and analyze data with powerful roll-ups, transformations, and window functions to slice and dice data any way our users need it."
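To make the roll-up and window-function idea concrete, here is a minimal sketch of an aggregation pipeline expressed as Python data. It assumes a hypothetical collection of per-event model scores with fields `player_id`, `match_id`, `match_date`, and `score`; none of these names come from MyGamePlan. The `$setWindowFields` stage requires MongoDB 5.0 or later.

```python
from typing import Any, Dict, List


def player_form_pipeline(window_size: int = 5) -> List[Dict[str, Any]]:
    """Build a pipeline: per-match roll-up, then a rolling average per player."""
    return [
        # Roll-up: total model score per player per match.
        {"$group": {
            "_id": {"player": "$player_id", "match": "$match_id"},
            "match_date": {"$first": "$match_date"},
            "match_score": {"$sum": "$score"},
        }},
        # Window function: average over the current match and the
        # (window_size - 1) matches before it, per player, ordered by date.
        {"$setWindowFields": {
            "partitionBy": "$_id.player",
            "sortBy": {"match_date": 1},
            "output": {
                "rolling_score": {
                    "$avg": "$match_score",
                    "window": {"documents": [-(window_size - 1), 0]},
                },
            },
        }},
    ]


pipeline = player_form_pipeline()
# With PyMongo, this would run as: results = list(collection.aggregate(pipeline))
print([next(iter(stage)) for stage in pipeline])
```

Because the pipeline is ordinary Python data, it can be built, inspected, and unit-tested without a database connection, then handed to `collection.aggregate` against live data.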

Moving beyond predictive AI, the MyGamePlan team is now evaluating how gen AI can further improve user experience.

Deprest says, "We have so much rich data and analytics in our platform, and we want to make it even easier for players and coaches to extract insights from it. We are experimenting with natural language processing via chat and question-answering interfaces on top of the data. Gen AI makes it easy for users to visualize and summarize the data. We are currently evaluating OpenAI’s ChatGPT LLM coupled with sophisticated approaches to prompt engineering, orchestration via LangChain, and retrieval-augmented generation (RAG) using LlamaIndex and MongoDB Atlas Vector Search."

As our source data is in the MongoDB Atlas database already, unifying it with vector storage and search is a very productive and elegant solution for my developers.

Dries Deprest, CTO and Co-founder, MyGamePlan
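The retrieval half of a RAG flow on Atlas Vector Search can be sketched roughly as follows. This is not MyGamePlan's implementation and it uses neither LangChain nor LlamaIndex; it only shows the shape of a `$vectorSearch` aggregation stage and a simple prompt assembly. The index name, the `embedding` and `text` fields, the prompt wording, and the stand-in query vector are all assumptions; in a real application the vector would come from an embedding model.

```python
from typing import Any, Dict, List, Sequence


def vector_search_pipeline(query_vector: Sequence[float], limit: int = 3) -> List[Dict[str, Any]]:
    """Build an Atlas Vector Search pipeline returning the closest passages."""
    return [
        {"$vectorSearch": {
            "index": "insights_vector_index",  # hypothetical index name
            "path": "embedding",               # hypothetical field holding vectors
            "queryVector": list(query_vector),
            "numCandidates": limit * 20,       # oversample candidates, then keep `limit`
            "limit": limit,
        }},
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def build_prompt(question: str, passages: Sequence[Dict[str, Any]]) -> str:
    """Combine retrieved passages and the user question into one LLM prompt."""
    context = "\n".join(f"- {p['text']}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


pipeline = vector_search_pipeline([0.1, 0.2, 0.3])
# With PyMongo: passages = list(collection.aggregate(pipeline))
passages = [{"text": "Player p-10 completed 42 passes.", "score": 0.91}]  # stand-in result
prompt = build_prompt("How many passes did p-10 complete?", passages)
print(prompt)
```

Storing the vectors next to the source documents means the same aggregation call retrieves both the embedding match and the original text, which is the unification the quote above describes.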

By building on MongoDB Atlas, MyGamePlan’s team can use the breadth of functionality provided by a developer data platform to support almost any application and AI needs in the future.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Ferret.ai: Building trust with relationship intelligence powered by AI and MongoDB Atlas while cutting costs by 30%

Across the physical and digital world, we are all constantly building relationships with others. Those relationships can be established through peer-to-peer transactions across online marketplaces, between tradespeople and professionals with their prospective clients, between investors and founders, or in creating new personal connections. All of those relationships rely on trust to work, but building it is hard. Ferret.ai was founded to remove the guesswork from building that trust.

Ferret is an AI platform architected from the ground up to empower companies and individuals with real-time, unbiased intelligence to identify risks and embrace opportunities. Leveraging cutting-edge predictive and generative AI, hundreds of thousands of global data sources, and billions of public documents, Ferret.ai provides curated relationship intelligence and monitoring — once only available to the financial industry — making transparency the new norm.

Al Basseri, CTO at Ferret, tells us how it works: "We ingest information about individuals from public sources. This includes social networks, trading records, court documents, news archives, corporate ownership, and registered business interests. This data is streamed through Kafka pipelines into our Anyscale/Ray MLOps platform where we apply natural language processing through our spaCy extraction and machine learning models. All metadata from our data sources (that's close to three billion documents) along with inferences from our models are stored in MongoDB Atlas. The data in Atlas is consumed by our web and mobile customer apps and by our corporate customers through our upcoming APIs."

Figure 2:  Artificial intelligence + real-time data = Relationship Intelligence from Ferret.ai.

Moving beyond predictive AI, the company’s developers are now exploring opportunities to use gen AI in the Ferret platform. “We have a close relationship with the data science team at Nvidia,” says Basseri. “We see the opportunity to summarize the data sources and analysis we provide to help our clients better understand and engage with their contacts. Through our experimentation, the Mistral model with its mixture-of-experts ensemble seems to give us better results with less resource overhead than some of the larger and more generic large language models.”

As well as managing the data from Ferret’s predictive and gen AI models, customer data and contact lists are also stored in MongoDB Atlas. Through Ferret’s continuous monitoring and scoring of public record sources, any change in an individual's status is immediately detected.

As Basseri explains, "MongoDB Atlas Triggers watch for updates to a score and instantly send an alert to consuming apps so our customers get real-time visibility into their relationship networks. It's all fully event-driven and reactive, so my developers just set it and forget it."
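Atlas Triggers are configured in Atlas and run JavaScript functions, so the snippet below is not a trigger itself. It is a Python analogue built on MongoDB change streams, the mechanism that database triggers rely on, showing how score updates might be filtered and turned into alerts. The `score` field, the alert payload, and the sample event are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Only pass through update events that touched the (hypothetical) "score" field.
SCORE_UPDATE_FILTER: List[Dict[str, Any]] = [
    {"$match": {
        "operationType": "update",
        "updateDescription.updatedFields.score": {"$exists": True},
    }},
]


def to_alert(change: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Convert a change event into an alert payload, or None if not relevant."""
    fields = change.get("updateDescription", {}).get("updatedFields", {})
    if change.get("operationType") != "update" or "score" not in fields:
        return None
    return {"contact_id": change["documentKey"]["_id"], "new_score": fields["score"]}


# With PyMongo, events would arrive from a change stream:
#     for change in collection.watch(SCORE_UPDATE_FILTER): ...
# Here a hand-written event with the same shape stands in for a live one.
sample_change = {
    "operationType": "update",
    "documentKey": {"_id": "contact-42"},
    "updateDescription": {"updatedFields": {"score": 71}, "removedFields": []},
}
alert = to_alert(sample_change)
print(alert)
```

Filtering in the `$match` stage keeps irrelevant events on the server, while the check inside `to_alert` guards against events that reach the handler by another route.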

Basseri also described the other advantages MongoDB provides his developers:

  • Through Atlas, it’s available as a fully managed service with best practices baked in. That frees his developers and data scientists from the responsibilities of running a database so they can focus their efforts on app and AI innovation

  • MongoDB Atlas is mature: Basseri has seen it scale in many other high-growth companies

  • The availability of engineers who know MongoDB is important as the team rapidly expands

Beyond the database, Ferret is extending its use of the MongoDB Atlas platform into text search. As the company moves into Google Cloud, it is migrating from its existing Amazon OpenSearch Service to Atlas Search.

Discussing the drivers for the migration, Basseri says, "Unifying both databases and search behind a single API reduces cognitive load for my developers, so they are more productive and build features faster. We eliminate all of the hassle of syncing data between database and search. Again, this frees up engineering cycles. It also means our users get a better experience because previous latency bottlenecks are gone — so as they search across contacts and content on our platform, they get the freshest results, not stale and outdated data."

By migrating from OpenSearch to Atlas Search, we also save money and get more freedom. We will reduce our total cloud costs by 30% per month just by eliminating unnecessary data duplication between the database and the search engine. And with Atlas being multi-cloud, we get the optionality to move across cloud providers as and when we need to.

Al Basseri, CTO at Ferret.ai
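A full-text query in Atlas Search is just another aggregation stage, which is what puts database and search behind a single API. The sketch below builds such a pipeline; the index name `default`, the `name` and `summary` fields, and the search term are assumptions, not details of Ferret's deployment.

```python
from typing import Any, Dict, List


def contact_search_pipeline(term: str, limit: int = 10) -> List[Dict[str, Any]]:
    """Build an Atlas Search pipeline; $search must be the first stage."""
    return [
        {"$search": {
            "index": "default",  # hypothetical index name
            "text": {"query": term, "path": ["name", "summary"]},
        }},
        {"$limit": limit},
        {"$project": {
            "_id": 0,
            "name": 1,
            "summary": 1,
            "score": {"$meta": "searchScore"},  # relevance score from Atlas Search
        }},
    ]


pipeline = contact_search_pipeline("registered business interests", limit=5)
# With PyMongo, the same call used for any aggregation runs the search:
#     results = list(collection.aggregate(pipeline))
print(pipeline[0]["$search"]["text"]["query"])
```

Because the search index is maintained on the collection itself, there is no separate sync job between the database and a search engine to build or monitor.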

Once the migration is complete, Basseri and the team will begin development with Atlas Vector Search as they continue to build out the gen AI side of the Ferret platform.

What's next?

No matter where you are in your AI journey, MongoDB can help. You can get started with your AI-powered apps by registering for MongoDB Atlas and exploring the tutorials available in our AI resources center. Our teams are always ready to come and explore the art of the possible with you.

[1] https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier