Benjamin Flast

Introducing Atlas Vector Search: Build Intelligent Applications with Semantic Search and AI Over Any Type of Data

We’re excited to announce that Atlas Vector Search is now Generally Available. Vector Search now supports production workloads, allowing you to build intelligent applications powered by semantic search and generative AI while optimizing resource consumption and improving performance with Search Nodes. Read the blog below for the full announcement and list of benefits.

The moment has finally come. Artificial Intelligence has shifted left. What was once built, and often trapped, inside enterprise-wide data science and machine learning teams is now readily available to builders everywhere. But to harness the incredible power of these new tools, you need to build on top of a reliable, composable, and elegant data platform. At the same time, as we’ve all seen, these new capabilities are only as good as the data, or “ground truth,” they have access to. That’s why we’re thrilled to be adding yet another capability to the MongoDB Atlas Developer Data Platform to unlock the full potential of your data and power AI applications.

Today, MongoDB is thrilled to announce our new Vector Search capability, designed to meet the demands of data in all forms and allow our partners to harness these incredible new capabilities. Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

What is the capability?

For those of you unfamiliar, Vector Search is a capability that allows you to query your data based on semantics, the meaning of the data, rather than the data itself. This is made possible by representing any form of data numerically as a vector, which can then be compared to other vectors through sophisticated algorithms.
The first step is to take source data, whether text, audio, image, or video, and convert it into “vectors” or “embeddings” using an encoding model. With recent advances in Artificial Intelligence, these vectors are now better able to capture the meaning of data by projecting it into a dense vector space whose dimensions carry context about the data. Once data has been transformed into these numeric representations, you can query for similar values using an Approximate Nearest Neighbors (ANN) algorithm, which allows your queries to very quickly find data with similar vectors. This enables you to satisfy queries like “give me movies with the feeling of sorrow” or “give me images that look like…” and unlocks a whole new class of applications.

How does it relate to our platform?

With this functionality natively built into MongoDB Atlas, you don’t need to copy and transform your data, learn a new stack and syntax, or manage a whole new set of infrastructure. With Atlas Vector Search, none of this is necessary: you can use these powerful new capabilities within a world-class, battle-tested platform to build applications faster than ever before. Many of the challenges inherent in harnessing AI and Vector Search stem from the complexity involved in safely and securely exposing your application data; these tasks add friction to the developer experience and make your applications harder to build, debug, and maintain. MongoDB removes these challenges while bringing the power of Vector Search to a platform that scales vertically and horizontally to support any workload you throw at it. Finally, none of this matters without guarantees around security and availability, and MongoDB’s commitment to secure data management, along with high availability through redundancy and automatic failover, ensures that your application will never miss a beat.
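As a rough illustration of the encode-then-compare idea above, here is a toy nearest-neighbor search over made-up 3-dimensional vectors. Real embeddings have hundreds or thousands of dimensions and are searched with an ANN index rather than a brute-force scan; the movie titles, field names, and vector values below are invented for illustration.

```javascript
// Toy illustration of vector similarity. In practice the vectors come from
// an encoding model, and an ANN index replaces this brute-force scan.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical movie embeddings (titles and values are illustrative only).
const movies = [
  { title: "A Quiet Grief", embedding: [0.9, 0.1, 0.0] },
  { title: "Summer Caper",  embedding: [0.1, 0.9, 0.2] },
  { title: "Lost Letters",  embedding: [0.8, 0.2, 0.1] },
];

// Stand-in embedding for a query like "movies with the feeling of sorrow".
const queryVector = [0.85, 0.15, 0.05];

// Rank every movie by similarity to the query vector, highest first.
const ranked = movies
  .map((m) => ({ title: m.title, score: cosineSimilarity(queryVector, m.embedding) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].title);
```

The semantically closest document ranks first even though the query shares no literal text with it, which is exactly what keyword search cannot do.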
New at MongoDB.local London

As of MongoDB.local London, we’re excited to announce a dedicated Vector Search aggregation stage, invoked via $vectorSearch. This new aggregation stage introduces a few new concepts that add power and make it easier than ever to use Vector Search. With $vectorSearch you can also apply a pre-filter using MQL syntax (e.g. $gte, $eq, etc.), which filters out documents as the index is traversed, leading to consistent results and high performance. Any developer who understands MongoDB will be able to take advantage of this filtering capability with ease! Finally, we’re introducing two ways to tune your results inside the aggregation stage: a “numCandidates” parameter, which controls how many documents are considered as candidates for the approximate nearest neighbor search, and a “limit” parameter, which caps how many results are returned.

How does it interact with the ecosystem?

The amount of innovation happening around Artificial Intelligence is astounding, and it’s amazing to see the advances the open source community is quickly making. There are huge gains being made in open source language models, as well as in the various ways they can be integrated into applications. With the raw power exposed by Artificial Intelligence, it’s never been more important to have a solid abstraction over the capability to give developers the flexibility they need. With this in mind, we’re thrilled to share that we have several capabilities supported in LangChain and LlamaIndex, from Vector Search support all the way to chat logging and document indexing. We’re moving fast here and will continue to release new functionality for the premier providers.

Wrap up

With all of this said, things are just getting started. We’re committed at MongoDB to helping developers power the next generation of AI-enabled applications with the best Developer Data Platform on the market.
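The $vectorSearch stage and its pre-filter, numCandidates, and limit parameters described above can be sketched as a pipeline definition; here it is as a plain JavaScript object of the kind you would pass to collection.aggregate(). The index name, field names, filter, and query vector are hypothetical placeholders, and in a real application the query vector would come from the same encoding model used to embed the documents.

```javascript
// Sketch of a $vectorSearch pipeline (names and values are placeholders).
const queryVector = [0.12, -0.53, 0.07]; // would come from your encoding model

const pipeline = [
  {
    $vectorSearch: {
      index: "movie_vector_index",      // hypothetical Atlas Vector Search index
      path: "plot_embedding",           // field holding the stored vectors
      queryVector: queryVector,
      numCandidates: 200,               // candidates considered by the ANN search
      limit: 10,                        // results passed to the next stage
      filter: { year: { $gte: 1990 } }, // MQL pre-filter applied during index traversal
    },
  },
  // Surface the similarity score alongside the matched documents.
  { $project: { title: 1, plot: 1, score: { $meta: "vectorSearchScore" } } },
];
```

You would then run something like db.movies.aggregate(pipeline) against a cluster where that index is defined; raising numCandidates trades query speed for recall, while limit simply bounds the result set.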
We’re also going to be looking into more frameworks and plugin architectures that we can support. But as always, the most important part of this equation is you, the developer. We’re going to be talking to the community and finding the ways we can serve you best, ensuring we’re meeting your needs every step of the way. Go forth and build!

To learn more about Atlas Vector Search and whether it would be the right solution for you, check out our documentation, whitepaper, and tutorials, or get started today. Head to the MongoDB.local hub to see where we'll be showing up next.

June 22, 2023

Announcing Atlas Data Federation and Atlas Data Lake

Two years ago, we released the first iteration of Atlas Data Lake. Since then, we’ve helped customers combine data from various storage layers to feed downstream systems. But after years spent studying our customers’ experiences, we realized we hadn’t gone far enough. To truly unleash the genius in all our developers, we needed to add an economical cloud object storage solution with a rich MQL query experience to the world of Atlas. Today, we’re thrilled to announce that our new Atlas Data Federation and Atlas Data Lake offerings do just that.

We now offer two complementary services: Atlas Data Federation (our existing query service, formerly known as Atlas Data Lake) and our new and improved Atlas Data Lake (a fully managed, analytics-oriented storage service). Together, these services (both in preview) provide flexible and versatile options for querying and transforming data across storage services, as well as a MongoDB-native analytic storage solution. With these tools, you can query across multiple clusters, move data into self-managed cloud object storage for consumption by downstream services, query a workload-isolated, inexpensive copy of cluster data, compare your cluster data across different points in time, and much, much more.

In hearing from our customers about their experiences with Atlas Data Lake, we learned where they have struggled, as well as the features they’ve been looking for us to provide. With this in mind, we decided to rename our current query federation service to Atlas Data Federation, to better align with how customers see the service and get value from it. We’ve seen many customers benefit from the flexibility of a federated query engine, including querying data across multiple clusters, databases, and collections, as well as exporting data to third-party systems. We also saw where our customers were struggling with data lakes.
We heard them ask for a fully managed storage solution so they could achieve all of their analytic goals within Atlas. Specifically, customers wanted scalable storage that would provide high query performance at a low cost. Our new Data Lake provides a high-performance analytic object storage solution, allowing customers to query historical data with no additional formatting or maintenance work needed on their end.

How it works

Atlas Data Federation encompasses our existing Data Lake functionality with several new affordances. It continues to deliver the same power that it always has, with increased performance and efficiency. The new Atlas Data Lake allows you to create Data Lake pipelines (based on your Atlas cluster backup schedules) and choose fields on which to optimize queries. The service takes the following steps:

1. On the selected schedule, a copy of your collection is extracted from your Atlas backup, with no impact to your cluster.
2. During extraction, we build partition indexes based on the contents of your documents and the fields you’ve selected for optimization. These indexes capture the minimums, maximums, and other statistics of the records in each partition, letting your queries quickly find the relevant data.
3. Finally, the underlying data lands in an analytics-oriented format inside cloud object storage, which minimizes the data scanned when you execute a query.

Once a pipeline has run and a Data Lake dataset has been created, you can select it as a data source in our new Data Federation query experience. You can either set it as the source for a specific virtual collection in a Federated Database Instance, or have your Federated Database Instance generate a collection name for each dataset that your pipeline has created. Amazingly, no part of this process consumes compute resources from your cluster: neither the export nor the querying of datasets.
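To illustrate the partition-index idea above, here is a toy model of how min/max statistics let a query skip partitions that cannot contain matching records. The partition layout, field names, and date values are purely illustrative, not the service’s actual storage format.

```javascript
// Toy model of partition pruning: each partition records min/max stats for an
// optimized field, so a range query can skip partitions that cannot match.
const partitions = [
  { file: "p0", stats: { saleDate: { min: "2020-01-01", max: "2020-06-30" } } },
  { file: "p1", stats: { saleDate: { min: "2020-07-01", max: "2020-12-31" } } },
  { file: "p2", stats: { saleDate: { min: "2021-01-01", max: "2021-06-30" } } },
];

// Return only the partitions whose [min, max] range overlaps [lo, hi].
// (ISO date strings compare correctly as plain strings.)
function partitionsToScan(parts, field, lo, hi) {
  return parts
    .filter((p) => p.stats[field].max >= lo && p.stats[field].min <= hi)
    .map((p) => p.file);
}

// A query for Q4 2020 only needs to scan one of the three partitions.
const scanned = partitionsToScan(partitions, "saleDate", "2020-10-01", "2020-12-31");
console.log(scanned);
```

This is why choosing good optimization fields matters: the more selective the statistics are for your common queries, the fewer partitions each query has to read.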
These datasets provide workload isolation and consistency for long-running analytic queries, as well as a target for ETL jobs using the powerful $out to S3 stage, making it easy to compare the state of your data over time. Advanced though this is, it’s only the beginning of the story. We’re committed to evolving the service: improving performance, adding more sources of data, and building new features, all based on the feedback you, the user, give us. We can’t wait to see how you’ll use this powerful new tool, and we can’t wait to hear what you’d like to see next.

Try Atlas Data Lake Today

June 7, 2022

Turning MongoDB into a Predictive Database

Note: This blog, originally published November 10, 2021, has been updated with new installation and connection instructions to connect your MongoDB instance to MindsDB’s machine learning platform, and with new examples and use cases.

There’s a growing interest in artificial intelligence (AI) and machine learning (ML) in the business world. The predictive capabilities of ML/AI enable rapid insights from patterns detected at rates faster than manual analysis. Additionally, recent advances in generative machine learning from providers such as OpenAI and Hugging Face offer powerful tools for businesses to generate and analyze text data. Businesses realize that this can lead to increased profits, reduced costs, and accelerated innovation. But although businesses both large and small can benefit from the power of AI, implementing a machine learning project can be both complex and time-consuming.

MongoDB, Inc. (NASDAQ: MDB), the leading modern general purpose database platform, and MindsDB, the open-source machine learning platform that brings automated machine learning to the database, established a technology partnership to advance machine learning innovation. This collaboration aims to make it easy for developers to incorporate powerful ML-driven features into their applications to solve real-world business challenges.

What is the best approach?

Once you have identified the initial ML projects you’d like to focus on, such as forecasting or text analysis, choosing the right tools and methodologies can help speed up the time it takes to build, train, optimize, and deploy models. Model selection and feature engineering can be time-consuming and difficult if you aren’t aware of the specific dimensions the ML model is going to train on. Additionally, the pipelines used for data extraction and transformation need to be maintained over time, and a machine learning model also needs to be deployed on the right compute framework.
Existing state-of-the-art AutoML frameworks provide methods to optimize performance, including adjusting hyperparameters (such as the learning rate or batch size). The MindsDB AutoML framework extends beyond most conventional automated hyperparameter tuning and enables novel upstream automation of data cleaning, data pre-processing, and feature engineering. To empower users with transparent development, the framework includes explainability tools, supports complex data types (NLP, time series, language modeling, and anomaly detection), and offers customizability by allowing users to import models of their choice. MindsDB also generates predictions at the data layer (without consuming DB resources), a significant advancement that accelerates development speed. Generating predictions directly in MongoDB Atlas with MindsDB AI Collections lets you consume predictions as regular data, query those predictions, and accelerate development by simplifying deployment workflows.

Getting started with MindsDB

We suggest starting with either MindsDB in AWS or the demo cloud version of MindsDB. For anything beyond small-scale testing (2 models, a few thousand documents), we strongly suggest using MindsDB Pro (easy to set up, with simple, usage-based “pay as you go” pricing). Check out the product page on AWS Marketplace for instructions on setting up MindsDB in your existing AWS account. For all documentation and FAQs, please visit .

Setting up the connection to MindsDB in MongoDB

Currently, the integration works by accessing MongoDB through MindsDB’s MongoDB API as a new data source. More information about connecting to MongoDB can be found here. MindsDB hosts a demo MongoDB database with sample data sets. Use the MongoDB Shell or MongoDB Compass to connect to MindsDB’s MongoDB API. Please note that you must have MongoDB Shell version ≥3.6 to use the MindsDB MongoDB API.
MongoDB Compass connection

To connect to the MindsDB demo database, use the following connection string (as below in the MongoDB Compass UX): mongodb+srv://

If you would prefer to follow along with this tutorial from your own database, feel free to use your own connection string and upload an example dataset, such as house_sales.csv, against which you can run a number of test cases. If you use your own MongoDB instance, you will need to follow two additional steps.

Step 1: Once you have created a MindsDB account, connect your MongoDB instance to MindsDB (cloud or AWS) using your own connection string in the MindsDB editor. (Here is the link for the MindsDB Cloud Editor: )

Run the query below in the MindsDB editor:

db.databases.insertOne({
  name: "mongo_int",
  engine: "mongodb",
  connection_args: {
    "port": 27017,
    "host": "mongodb+srv://admin:@localhost",
    "database": "test_data"
  }
});

On execution, we get:

{
  "acknowledged" : true,
  "insertedId" : ObjectId("62dff63c6cc2fa93e1d7f12c")
}

Step 2: Connect MongoDB Compass or the Shell to your MongoDB instance, create a new collection, and add the .csv file, as below: Create collection > Add data > Select data types. Data types: [Date, Number, String, Number]

Now we have successfully integrated with the MongoDB database. The next step is to use a MongoDB client to connect to MindsDB’s MongoDB API and train models. MindsDB has a number of prepared demo use cases and data sets, including predicting home rental prices, forecasting quarterly house sales, and predicting customer sentiment through language analysis of product review text using our Hugging Face integration. Many examples for Mongo, with code, can be found in the links below:

For a powerful showcase example, we will demonstrate a recently available feature that uses MindsDB’s integration with OpenAI’s GPT-3 language model: MindsDB can be used to generate JSON documents from unstructured text in the DB.
For example, as below, MindsDB can create JSON documents with relevant information on properties for rent (days on market, number of bathrooms, price, rating) based on natural language descriptions from real-estate listings. Please follow the guide above, or check out our docs, to connect MongoDB Compass and MongoDB Shell to MindsDB. To create this model in MQL, run the command below from MongoDB Compass or MongoDB Shell:

db.models.insertOne({
  name: 'nlp_model',
  predict: 'json',
  training_options: {
    engine: 'openai',
    input_text: 'sentence',
    json_struct: {
      'rental_price': 'rental price',
      'location': 'location',
      'nob': 'number of bathrooms'
    }
  }
})

We pass three parameters here. The engine parameter ensures we use the OpenAI engine. The json_struct parameter stores a predefined JSON structure used for the output. The input_text parameter contains the name of the field that stores the input text. Now we can query the model, passing the input text stored in the sentence field:

db.nlp_model.find({
  'sentence': 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
})

On execution, we get:

{
  json: {
    rental_price: '3000',
    location: 'Manhattan',
    nob: '1'
  },
  sentence: 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
}

This tutorial highlights the steps to create an NLP model that generates JSON output from unstructured text inside MongoDB by leveraging MindsDB’s MongoDB connector and automation capabilities. Using the existing compute configuration, the example above took less than five minutes, without the need for extensive tooling or pipelines in addition to your database. With MindsDB’s machine learning capabilities inside MongoDB, developers can now build machine learning models at reduced cost, gain greater insight into model accuracy, and help users make better data-based decisions.
Modernize with MongoDB and MindsDB

MongoDB provides an intuitive process for data management and exploration by simplifying and enriching data. MindsDB helps turn that data into intelligent insights by simplifying the path into machine learning, AI, and the broader spectrum of data science. Try MindsDB to connect to MongoDB, train models, and run predictions in the cloud! Simply install MindsDB from AWS Marketplace; our team is available on Slack and GitHub for feedback and support. Check it out, and feel free to ask questions and share use case examples!

November 10, 2021

MongoDB Atlas Online Archive for Data Tiering is now GA

We’re thrilled to announce that MongoDB Atlas Online Archive is now Generally Available. Online Archive allows you to seamlessly tier your data across Atlas clusters and fully managed cloud object stores, while retaining the ability to query it through a single endpoint.

- Reduce storage costs: set the perfect price-to-performance ratio on your data.
- Automate data tiering: eliminate the need to manually migrate or delete valuable data.
- Queryable archives: easily federate queries across live and archival data using a unified connection string.

With Online Archive, you can bring to MongoDB Atlas new use cases that were previously cost-prohibitive, such as high-volume time-series workloads, data archival for auditing purposes, historical log keeping, and more. Manage your entire data lifecycle on MongoDB Atlas without replicating or migrating it across multiple systems.

What is Atlas Online Archive?

Online Archive is a fully managed data tiering solution that allows you to tier data across your "hot" database storage layer and "colder" cloud object storage, maintaining queryability while optimizing cost and performance. Online Archive is a good fit for many different use cases, including:

- Insert-heavy workloads, where data is immutable and has lower performance requirements as it ages
- Historical log keeping and time-series datasets
- Storing valuable data that would otherwise have been deleted using TTL indexes

We’ve received amazing feedback from the community over the past few months while the feature was in preview, and we’re now confident in supporting your production workloads. Our users have put the feature through a variety of use cases in production and development workloads, which has enabled us to make a wide range of improvements.

"Online Archive gives me the flexibility to store all of my data without incurring high costs, and feel safe that I won't lose it. It's the perfect solution."
Ran Landau, CTO, Splitit

Autonomous Archival Management

It’s easy to get started with Online Archive, and it requires no ongoing maintenance once it’s been set up. To activate the feature, follow these simple steps:

1. Navigate to the "Online Archive" tab on your cluster card and begin the setup flow.
2. Set an archiving rule by selecting a date field (using dot notation if it’s nested) or creating a custom filter.
3. Choose commonly queried fields that you want your archival queries to be optimized for, with a few things in mind:
   - Your data will always be "partitioned" by the date field in your archive, but it can be partitioned by up to two additional fields as well.
   - The fields that you most commonly query should be towards the top of the list (the date field can be moved to the top or bottom).
   - Query fields should be chosen carefully, as they cannot be changed after the fact and have a large impact on query performance.
   - Avoid choosing a field with unique values, as it will hurt performance for queries that need to scan lots of data.

And you’re done! MongoDB Atlas will automatically move data off of your cluster and into a more cost-effective storage layer that can still be queried, with a single connection string that combines cluster and archive data, powered by Atlas Data Lake.

What's Next?

Along with announcing Online Archive as Generally Available, we’re excited to share a few additional product enhancements which should be available in the coming months:

- Private Link support when querying your archive
- Incremental deletes of data from your archive
- Support for BYO key encryption on your archival data
- Improved performance and stability

Try Atlas Online Archive

Online Archive allows you to right-size your Atlas clusters by storing hot, regularly accessed data in live storage and moving colder data to a cheaper storage tier.
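As a rough sketch of how a date-based archiving rule behaves, the toy function below splits documents into "hot" and "archive" sets by the age of a date field. The field name (createdAt) and the 90-day cutoff are illustrative only; the real rule is configured in the Atlas UI rather than in code, and Atlas moves the data for you.

```javascript
// Toy sketch of a date-based archiving rule: documents whose date field is
// older than the cutoff become candidates for the archive tier.
const DAY_MS = 24 * 60 * 60 * 1000;

function splitByArchiveRule(docs, dateField, maxAgeDays, now = Date.now()) {
  const cutoff = now - maxAgeDays * DAY_MS;
  const hot = [], archive = [];
  for (const doc of docs) {
    (doc[dateField].getTime() < cutoff ? archive : hot).push(doc);
  }
  return { hot, archive };
}

// Fixed "now" so the example is deterministic.
const now = Date.parse("2020-12-01T00:00:00Z");
const docs = [
  { _id: 1, createdAt: new Date("2020-11-20T00:00:00Z") }, // recent: stays hot
  { _id: 2, createdAt: new Date("2020-01-15T00:00:00Z") }, // old: archived
];

const { hot, archive } = splitByArchiveRule(docs, "createdAt", 90, now);
console.log(hot.length, archive.length);
```

From the application's point of view, nothing changes either way: both sets remain reachable through the single federated connection string.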
Billing for this feature includes the cost to store data in our fully managed cloud object storage and usage-based pricing for querying archive data. We can’t wait to see what new workloads you’ll bring onto MongoDB Atlas with the new flexibility provided by Online Archive! To get started, sign up for an Atlas account and deploy any dedicated cluster (M10 or higher). Have questions? Check out the documentation or head over to our community forums to get answers from fellow developers. And if we’re missing a feature you’d like to see, please let us know!

Safe Harbor Statement

The development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB's sole discretion. This information is merely intended to outline our general product direction and should not be relied on in making a purchasing decision, nor is it a commitment, promise, or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

November 30, 2020