Supported Aggregation Pipeline Stages and Operators
This page describes the MongoDB aggregation pipeline stages and operators that Atlas Data Lake supports.
By default, Atlas Data Lake does not return documents in any specific order
for queries on Data Lakes for S3 data stores. Atlas Data Lake reads the
partitions concurrently and the underlying storage response order
determines which documents Atlas Data Lake returns first, unless you define
order using $sort in your query. For example, if you run the same findOne()
query twice, you could see different documents, and if you use $skip,
different documents might be skipped if $sort is not used in the query.
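As a rough illustration of defining order before skipping, the sketch below builds a pipeline as a plain list of stage documents, with $sort placed ahead of $skip and $limit. The helper name paged_pipeline and the field name orderId are hypothetical, and the sketch assumes the sort field is unique enough that ties do not reorder documents between runs.

```python
def paged_pipeline(sort_field, page, page_size):
    """Build a pipeline that sorts before skipping, so each page is stable.

    Hypothetical helper: the function name and parameters are illustrative
    and are not part of any Atlas Data Lake API.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return [
        {"$sort": {sort_field: 1}},    # define an explicit order first
        {"$skip": page * page_size},   # then skip whole pages
        {"$limit": page_size},         # and return a single page
    ]


# Third page (zero-based index 2) of ten documents, ordered by "orderId".
pipeline = paged_pipeline("orderId", 2, 10)
```

The resulting list can be passed to an aggregate call in a driver; without the leading $sort stage, the same $skip value might drop different documents on each run.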
Supported and Unsupported Aggregation Pipeline Stages
Atlas Data Lake supports all the aggregation pipeline stages except the following:
For the following stages, Atlas Data Lake introduces an alternate syntax, includes a caveat, or deviates from MongoDB server behavior. See the Description column for details.
Pipeline Stage | Description |
---|---|
$geoNear | Outputs documents in order of nearest to farthest from a specified point. For details on Atlas Data Lake support, see Querying Data in Your Atlas Cluster. |
$graphLookup | Performs a recursive search on a collection. For details on Atlas Data Lake support, see Querying Data in Your Atlas Cluster. |
$group | Groups input documents by the specified _id expression. Example: The following is not supported: |
$lookup | Performs a left outer join to a collection in the same database. Atlas Data Lake also provides syntax for joining collections from different databases. See $lookup for more information. |
$match | Filters the documents to pass only the documents that match the specified condition(s) to the next pipeline stage. Atlas Data Lake supports $match. Note that the partition attributes for selecting specific files on S3 are optimized only for the following aggregation pipeline operators: $eq, $gt, $lt, $gte, $lte, $ne, $and, $or. |
$merge | Writes the results of the aggregation pipeline to a specified collection. Atlas Data Lake provides alternate syntax for the required into field to allow writes to an Atlas cluster. To learn more, see $merge. |
$out | Takes the documents returned by the aggregation pipeline and writes them to a specified collection. Atlas Data Lake provides alternate syntax for writing to S3 and to an Atlas cluster. To use $out to write to a collection in a different database on the same Atlas cluster, your Atlas cluster must be on MongoDB version 4.4 or later. See |
$sample | Randomly selects the specified number of documents from its input. Atlas Data Lake supports $sample, but does not provide a truly random sample; it returns the first set of documents that it finds. |
$skip | Skips over the specified number of documents that pass into the stage and passes the remaining documents to the next stage in the pipeline. Atlas Data Lake supports $skip, but this does not reduce the amount of data scanned because Data Lake accesses all partitions that correspond to your query. |
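The $match entry in the table lists the operators for which partition attributes are optimized. The sketch below is one possible way to check a filter document against that list before running a query. The function name operators_outside_optimized_set and the sample field names are hypothetical. The check only inspects operator keys; it makes no claim about how Atlas Data Lake plans the query.

```python
# Operators the $match description lists as optimized for partition attributes.
PARTITION_OPTIMIZED = {"$eq", "$gt", "$lt", "$gte", "$lte", "$ne", "$and", "$or"}


def operators_outside_optimized_set(match_filter):
    """Return a sorted list of $-operators in a filter that are not in the list.

    Hypothetical helper for illustration only; it walks nested dicts and
    lists and looks at keys that start with "$".
    """
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if (
                    isinstance(key, str)
                    and key.startswith("$")
                    and key not in PARTITION_OPTIMIZED
                ):
                    found.add(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(match_filter)
    return sorted(found)


# "year" and "region" are made-up partition attributes.
match_filter = {
    "$and": [
        {"year": {"$gte": 2020}},
        {"region": {"$in": ["us", "eu"]}},
    ]
}
flagged = operators_outside_optimized_set(match_filter)
```

In this example the $in condition is the one flagged; rewriting it as an $or of $eq conditions would keep the filter within the listed operators.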
Supported Aggregation Pipeline Operators
Atlas Data Lake supports all the aggregation pipeline operators. However, Atlas Data Lake supports all the geospatial query operators and the following evaluation query operators only in queries on collections that are mapped to an Atlas cluster data store.
Atlas Data Lake doesn't include a server-side JavaScript engine. As a result, it doesn't support operators such as $where, $function, and $accumulator, which require server-side scripting to be enabled.
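A pipeline can be scanned for these operators before it is sent to Atlas Data Lake. The following is a minimal sketch, assuming pipelines are represented as lists of plain dictionaries. The name find_js_operators is hypothetical, and the operator set holds only the three examples named above, so it may not be exhaustive.

```python
# Only the three operators named in the text; the full set may be larger.
JS_OPERATORS = frozenset({"$where", "$function", "$accumulator"})


def find_js_operators(node):
    """Yield each server-side JavaScript operator found in a pipeline or query.

    Hypothetical helper for illustration; it recurses through nested
    dicts and lists and reports matching keys in traversal order.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key in JS_OPERATORS:
                yield key
            yield from find_js_operators(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_js_operators(item)


pipeline = [
    {"$match": {"$where": "this.total > 100"}},
    {"$group": {"_id": "$customerId", "count": {"$sum": 1}}},
]
js_operators = list(find_js_operators(pipeline))
```

An empty result means none of the listed operators appear in the pipeline.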