Data Lakes Explained
FAQs
A data lake is a central repository that stores all types of data at scale, supports diverse big data formats, and applies schema-on-read. Modern data lakes support data analytics and machine learning.
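Key characteristics of a data lake are scalable storage, schema-on-read (the schema is applied when data is queried rather than when it is written, which pairs naturally with Extract-Load-Transform pipelines), support for various data formats, and ease of extracting data for analytics.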
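To make schema-on-read concrete, here is a minimal sketch in Python. It assumes raw events are stored as JSON lines with inconsistent fields; the sample records, the `PURCHASE_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names invented for illustration, not part of any particular data lake product.

```python
import json

# Raw records land in the lake exactly as produced; nothing is enforced on write.
RAW_EVENTS = [
    '{"user": "ada", "amount": "19.99", "ts": "2024-01-05"}',
    '{"user": "bob", "amount": 5, "coupon": "WELCOME"}',
    '{"user": "cy"}',
]

# Hypothetical schema chosen by one consumer at read time: field -> (cast, default).
PURCHASE_SCHEMA = {"user": (str, ""), "amount": (float, 0.0)}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project each record onto the given schema."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            # Cast fields that are present; fall back to the default otherwise.
            row[field] = cast(record[field]) if field in record else default
        rows.append(row)
    return rows


purchases = read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA)
total = sum(row["amount"] for row in purchases)
```

A different consumer could read the same raw lines with a different schema (for example, one that keeps `coupon` and `ts`) without rewriting anything already stored.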
The main layers of data lake architecture are data ingestion, storage, processing, analytics, data governance, and security.
Data lakes store raw data with a flexible, schema-on-read approach for potential future use; data warehouses store structured, cleaned, and transformed data for already-defined use cases.
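The contrast can be sketched in a few lines of Python. This is a deliberately simplified illustration: the `WAREHOUSE_COLUMNS` definition and both loader functions are hypothetical, and a real warehouse pipeline would usually clean or transform nonconforming records rather than simply set them aside.

```python
RECORDS = [
    {"order_id": 1, "total": 40.0},
    {"order_id": 2, "total": "n/a", "note": "refund pending"},
]

# Hypothetical predefined warehouse table: column name -> required type.
WAREHOUSE_COLUMNS = {"order_id": int, "total": float}


def load_warehouse(records):
    """Schema-on-write: accept only records that match the predefined columns."""
    table, rejected = [], []
    for rec in records:
        ok = set(rec) == set(WAREHOUSE_COLUMNS) and all(
            isinstance(rec[col], typ) for col, typ in WAREHOUSE_COLUMNS.items()
        )
        (table if ok else rejected).append(rec)
    return table, rejected


def load_lake(records):
    """Schema-on-read: store every record as-is; interpretation happens later."""
    return list(records)


table, rejected = load_warehouse(RECORDS)
lake = load_lake(RECORDS)
```

Here the warehouse table ends up with only the record that fits its defined use case, while the lake retains both records, including the extra `note` field that a future analysis might need.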