Today at MongoDB.live we announced the General Availability of MongoDB Atlas Data Lake, a serverless, scalable query service that allows you to natively query and analyze data across AWS S3 and MongoDB Atlas in place.
MongoDB Atlas Data Lake was released in public beta at MongoDB World 2019 as a new member of the MongoDB Atlas family. It was built to bring MongoDB’s flexible document model, query language, and tools to the domain of Data Lakes. Over the past year, we’ve received amazing feedback, done a ton of testing, and iterated on the platform’s capabilities. Today at MongoDB.live, we’re thrilled to announce that it is Generally Available (GA) with improved performance and a range of new features – we’re confident that it is now ready for your production workloads.
Additionally, we are announcing Atlas Online Archive as a beta feature for Atlas clusters, powered by Atlas Data Lake. Online Archive enables a simple and automated way to tier your data across fully managed databases and cloud object storage and query it through a single endpoint. Learn more about the key features here.
Atlas Data Lake is a multi-tenant on-demand query processing engine that allows you to utilize the MongoDB Query Language (MQL) on cloud object store data in multiple formats including JSON, BSON, CSV, TSV, Avro, ORC, and Parquet. Data is analyzed on-demand with no infrastructure setup and no time-consuming transformations, pre-processing, or metadata management. There's no schema to pre-define, allowing you to work with your data faster.
Under the hood, Atlas Data Lake deploys multiple compute nodes to process queries against the target data. Nodes work in parallel in the region nearest the data for fast processing and to minimize data transfer. The framework for executing queries relies on map-reduce and contains many specific customizations to improve performance and reduce cost.
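As a rough illustration of the map-reduce pattern described above (not Atlas Data Lake's actual engine, whose internals are not covered here), the sketch below counts documents per status across several partitions in parallel and then merges the partial results. The partition layout, the `status` field, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(docs):
    """Map step: count documents per 'status' within a single partition."""
    return Counter(doc["status"] for doc in docs)


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_count(partitions):
    """Run the map step on each partition concurrently, then reduce."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce_counts(partials)


# Hypothetical partitions, standing in for files spread across an object store.
partitions = [
    [{"status": "shipped"}, {"status": "pending"}],
    [{"status": "shipped"}],
    [{"status": "returned"}, {"status": "shipped"}],
]

result = parallel_count(partitions)
```

Each worker only touches its own partition, so only the small partial counts need to be moved and combined, which is the general reason this pattern helps limit data transfer.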
In addition to making the product Generally Available, we’re excited to announce the following new features:
Atlas Data Lake can now support Federated Queries with Atlas clusters as data sources in addition to cloud object store data. This means that you can combine both your live cluster data and historical cloud object store data in virtual databases and collections on Atlas Data Lake. Queries can target collections from specific clusters, collections based on cloud object data, or collections that are a union of data living in both a cluster and an object store. This can be used for analytic workloads or as a component in a data-tiering strategy.
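To make the idea of a union collection concrete, here is a minimal sketch of a storage configuration expressed as a Python dictionary. The field names (`stores`, `databases`, `collections`, `dataSources`, `storeName`) follow the Data Lake storage configuration format as best understood here and should be checked against the current documentation; the bucket, cluster, database, and path values, as well as the `providers_for` helper, are hypothetical.

```python
# Hypothetical storage configuration: one virtual collection, "sales.orders",
# backed by both a live Atlas cluster and archived files in S3.
storage_config = {
    "stores": [
        {
            "name": "archiveStore",
            "provider": "s3",
            "bucket": "my-archive-bucket",  # assumed bucket name
            "region": "us-east-1",
        },
        {
            "name": "liveStore",
            "provider": "atlas",
            "clusterName": "MyCluster",     # assumed cluster name
            "projectId": "<project-id>",    # placeholder, not a real ID
        },
    ],
    "databases": [
        {
            "name": "sales",
            "collections": [
                {
                    "name": "orders",
                    "dataSources": [
                        {"storeName": "liveStore", "database": "sales", "collection": "orders"},
                        {"storeName": "archiveStore", "path": "/orders/*"},
                    ],
                }
            ],
        }
    ],
}


def providers_for(config, database, collection):
    """Return the set of store providers that back a virtual collection."""
    providers = {store["name"]: store["provider"] for store in config["stores"]}
    for db in config["databases"]:
        if db["name"] != database:
            continue
        for coll in db["collections"]:
            if coll["name"] == collection:
                return {providers[src["storeName"]] for src in coll["dataSources"]}
    return set()


backing = providers_for(storage_config, "sales", "orders")
```

In this sketch, a query against `sales.orders` would see documents from both the cluster and the S3 path, because both appear as data sources of the same virtual collection.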
$out to S3 and Atlas clusters gives you the power to persist the results of your aggregations to your preferred storage tier. $out to Atlas behaves just as you would expect $out to behave in a MongoDB Atlas cluster today, but it can use Atlas, S3, or both as the data source. $out to S3, on the other hand, is a new feature unique to Atlas Data Lake. It allows you to write data back to AWS S3 storage, automatically organized around parameters you specify so that future reads through Atlas Data Lake perform better.
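A minimal sketch of what an aggregation ending in $out to S3 might look like, built as plain Python data so it can be passed to a driver. The shape of the `s3` document (`bucket`, `region`, `filename`, `format`) reflects the Data Lake `$out` syntax as best understood here and should be verified against the documentation; the bucket name, field names, and the `out_to_s3` helper are hypothetical.

```python
def out_to_s3(bucket, region, filename, format_name="json"):
    """Build a $out stage that targets S3 (field names are assumptions)."""
    return {
        "$out": {
            "s3": {
                "bucket": bucket,
                "region": region,
                "filename": filename,
                "format": {"name": format_name},
            }
        }
    }


# Total shipped order amounts per customer, written back to S3 as Parquet.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    # $out must be the last stage of an aggregation pipeline.
    out_to_s3("my-results-bucket", "us-east-1", "reports/shipped-totals", "parquet"),
]
```

With a driver such as PyMongo connected to a Data Lake endpoint, a pipeline like this would typically be run with `collection.aggregate(pipeline)`.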
Beta support for SQL through Atlas Data Lake aims to make the entire MongoDB ecosystem easier to use for those who need or prefer SQL. Additionally, we will be providing a JDBC driver so that users can connect their preferred SQL-based data visualization, data science, or machine learning tools to their Atlas Data Lakes.
Atlas Data Lake continues to be a part of the MongoDB Cloud Platform so you can use the MongoDB drivers and tools you’re already familiar with and manage it through the Atlas UI. To get started, sign up for MongoDB Atlas and bring your own S3 bucket or any HTTP URL pointing at a file to run queries. You can see it in action today with an interactive demo walkthrough. I highly encourage you to explore the new features and functionality we’ve released that make it easier than ever to gain insights from your data, wherever it resides. We can’t wait to see what you’ll build with it!