
Transform Your Data with Aggregation

Overview

In this guide, you can learn how to use PyMongo to perform aggregation operations.

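Aggregation operations process data in your MongoDB collections and return computed results. The MongoDB Aggregation framework, which is part of the Query API, is modeled on the concept of a data processing pipeline. Documents enter a pipeline that contains one or more stages, and the pipeline transforms the documents into an aggregated result.

An aggregation operation is similar to a car factory. A car factory has an assembly line made up of assembly stations, each equipped with specialized tools, such as drills and welders, for doing a specific job. Raw parts enter the factory, and the assembly line transforms and assembles them into a finished product.

The aggregation pipeline is the assembly line, aggregation stages are the assembly stations, and operator expressions are the specialized tools.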
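To make the analogy concrete, the following plain-Python sketch models a two-station assembly line over an in-memory list. It does not use PyMongo or a server: the functions `match_stage` and `group_count_stage` and the sample documents are hypothetical stand-ins that only mimic what the `$match` and `$group` stages do. Each station consumes the output of the previous one, which is the same way pipeline stages pass documents along.

```python
from collections import Counter

# Hypothetical documents standing in for a MongoDB collection.
documents = [
    {"name": "A", "cuisine": "Bakery", "borough": "Queens"},
    {"name": "B", "cuisine": "Bakery", "borough": "Bronx"},
    {"name": "C", "cuisine": "Pizza", "borough": "Queens"},
    {"name": "D", "cuisine": "Bakery", "borough": "Queens"},
]


def match_stage(docs, field, value):
    """Station 1: pass through only documents whose field equals value (like $match)."""
    return (doc for doc in docs if doc.get(field) == value)


def group_count_stage(docs, key):
    """Station 2: count documents per distinct key value (like $group with {"$sum": 1})."""
    counts = Counter(doc.get(key) for doc in docs)
    return [{"_id": k, "count": v} for k, v in counts.items()]


# The "assembly line": the group station receives the match station's output.
result = group_count_stage(match_stage(documents, "cuisine", "Bakery"), "borough")
print(result)  # [{'_id': 'Queens', 'count': 2}, {'_id': 'Bronx', 'count': 1}]
```

The real aggregation framework runs these stages on the server, so only the final results travel to your application.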

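Tip

Complete Aggregation Tutorials

You can find tutorials that explain common aggregation tasks in detail in the Complete Aggregation Pipeline Tutorials section of the Server manual. Select a tutorial, and then choose Python from the Select your language drop-down menu in the upper-right corner of the page.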

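Aggregation Versus Find Operations

You can use find operations to perform the following actions:

Select which documents to return
Select which fields to return
Sort the results

You can use aggregation operations to perform the following actions:

Perform find operations
Rename fields
Calculate fields
Summarize data
Group values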
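The following sketch contrasts the two approaches. Both halves only build arguments as plain Python dicts and lists; the comments show where they would be passed in PyMongo calls. The `numGrades` field is a hypothetical computed field, and the sketch assumes restaurant documents may contain a `grades` array (the `$ifNull` guard treats a missing array as empty).

```python
# Find: select documents, select fields, and sort the results.
find_filter = {"cuisine": "Bakery"}
find_projection = {"_id": 0, "name": 1, "borough": 1}
find_sort = [("name", 1)]
# With PyMongo: collection.find(find_filter, find_projection).sort(find_sort)

# Aggregation: the same selection, plus a renamed field and a computed field.
pipeline = [
    {"$match": {"cuisine": "Bakery"}},
    {"$project": {
        "_id": 0,
        "restaurant": "$name",  # rename "name" to "restaurant"
        "borough": 1,
        # Computed field: number of entries in grades (0 if the field is missing).
        "numGrades": {"$size": {"$ifNull": ["$grades", []]}},
    }},
    {"$sort": {"restaurant": 1}},  # sort on the renamed field
]
# With PyMongo: collection.aggregate(pipeline)
```

A find operation cannot produce `restaurant` or `numGrades`, because it can only return fields that already exist in the stored documents.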

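Limitations

Keep the following limitations in mind when using aggregation operations:

Returned documents must not violate the BSON document size limit of 16 megabytes.
By default, each pipeline stage has a memory limit of 100 megabytes. You can exceed this limit by passing the allowDiskUse keyword argument to the aggregate() method.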
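A minimal sketch of passing allowDiskUse, assuming `collection` is a PyMongo Collection such as `MongoClient(uri)["sample_restaurants"]["restaurants"]`. The wrapper name `aggregate_with_disk_use` is hypothetical; the part that matters is the `allowDiskUse=True` keyword argument on `aggregate()`.

```python
def aggregate_with_disk_use(collection, pipeline):
    """Run an aggregation whose stages may spill to temporary files on disk.

    Hypothetical helper: `collection` is assumed to be a PyMongo Collection.
    """
    # allowDiskUse=True lets stages that exceed the 100 MB memory limit
    # write temporary files instead of failing.
    cursor = collection.aggregate(pipeline, allowDiskUse=True)
    # list() pulls every result into client memory; iterate the cursor
    # directly instead if the result set is large.
    return list(cursor)
```

allowDiskUse only affects memory use on the server. It does not raise the 16-megabyte limit on individual result documents.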

Important

$graphLookup Exception

The $graphLookup stage has a strict memory limit of 100 megabytes and ignores the allowDiskUse parameter.

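Aggregation Example

Note

This example uses the sample_restaurants.restaurants collection from the Atlas sample datasets. To learn how to create a free MongoDB Atlas cluster and load the sample datasets, see Get Started with PyMongo.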

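To perform an aggregation, pass a list of aggregation stages to the collection.aggregate() method.

The following code example counts the number of bakeries in each borough of New York. To do so, it uses an aggregation pipeline that contains the following stages:

A $match stage to filter for documents whose cuisine field contains the value "Bakery".
A $group stage to group the matching documents by the borough field, accumulating a count of documents for each distinct value.

The first of the following code examples is synchronous, and the second is asynchronous: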

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
   { "$match": { "cuisine": "Bakery" } },
   { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the aggregation
aggCursor = collection.aggregate(pipeline)
# Print the aggregated results
for document in aggCursor:
   print(document)

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
   { "$match": { "cuisine": "Bakery" } },
   { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the aggregation (run this inside an async function with an AsyncCollection)
aggCursor = await collection.aggregate(pipeline)
# Print the aggregated results
async for document in aggCursor:
   print(document)

The preceding code examples produce output similar to the following:

{'_id': 'Bronx', 'count': 71}
{'_id': 'Brooklyn', 'count': 173}
{'_id': 'Missing', 'count': 2}
{'_id': 'Manhattan', 'count': 221}
{'_id': 'Queens', 'count': 204}
{'_id': 'Staten Island', 'count': 20}
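The $group stage does not return groups in any particular order, so the boroughs can appear in a different order from run to run. A minimal sketch, assuming you want the boroughs with the most bakeries first: append a $sort stage to the same pipeline, then collect the results into a dict. The helper `counts_by_borough` is hypothetical, and the rows below are copied from the sample output above, arranged in the order the $sort stage would return them.

```python
# Same pipeline as above, plus a $sort stage that orders groups by count, descending.
pipeline = [
    {"$match": {"cuisine": "Bakery"}},
    {"$group": {"_id": "$borough", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def counts_by_borough(documents):
    """Turn results such as {'_id': 'Bronx', 'count': 71} into {borough: count}."""
    return {doc["_id"]: doc["count"] for doc in documents}


# Rows from the sample output, in the order the $sort stage would produce.
rows = [
    {"_id": "Manhattan", "count": 221},
    {"_id": "Queens", "count": 204},
    {"_id": "Brooklyn", "count": 173},
    {"_id": "Bronx", "count": 71},
    {"_id": "Staten Island", "count": 20},
    {"_id": "Missing", "count": 2},
]
counts = counts_by_borough(rows)
print(sum(counts.values()))  # 691
```

Against a live collection, you would call `counts_by_borough(collection.aggregate(pipeline))` instead of using the copied rows.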

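Explain an Aggregation

To view information about how MongoDB executes your operation, you can instruct MongoDB to explain it. When MongoDB explains an operation, it returns execution plans and performance statistics. An execution plan is one potential way that MongoDB can complete an operation. The explanation includes both the plan that MongoDB executed and any execution plans that it rejected.

To explain an aggregation operation, you can use either the PyMongoExplain library or a database command. An example of each method follows.

To use PyMongoExplain, install the pymongoexplain library by using pip, as shown in the following example: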

python3 -m pip install pymongoexplain

The following code example runs the preceding aggregation and prints the explanation that MongoDB returns:

from pymongoexplain import ExplainableCollection

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
   { "$match": { "cuisine": "Bakery" } },
   { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the operation and print the explanation
result = ExplainableCollection(collection).aggregate(pipeline)
print(result)

...
'winningPlan': {'queryPlan': {'stage': 'GROUP',
                                      'planNodeId': 3,
                                      'inputStage': {'stage': 'COLLSCAN',
                                                     'planNodeId': 1,
                                                     'filter': {'cuisine': {'$eq': 'Bakery'}},
                                                     'direction': 'forward'}},
                                                    ...
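Full explain output is long, and the location of `winningPlan` depends on the server version and plan shape (the fragment above elides its parent keys). The following sketch uses two hypothetical helpers, `find_winning_plans` and `stage_names`, that search the whole result for `winningPlan` and list the stage names from the top of the plan down. In the fragment above, GROUP reads from COLLSCAN, which means the $match filter scanned the entire collection instead of using an index.

```python
def find_winning_plans(explain_output):
    """Recursively collect every 'winningPlan' value in an explain result.

    Walks the whole structure instead of assuming a fixed path, because
    the path differs across server versions.
    """
    found = []
    if isinstance(explain_output, dict):
        for key, value in explain_output.items():
            if key == "winningPlan":
                found.append(value)
            found.extend(find_winning_plans(value))
    elif isinstance(explain_output, list):
        for item in explain_output:
            found.extend(find_winning_plans(item))
    return found


def stage_names(plan):
    """List stage names from the top of a plan down its inputStage chain.

    Follows only single-child "inputStage" links; stages with several
    children ("inputStages") are not expanded in this sketch.
    """
    names = []
    node = plan.get("queryPlan", plan)  # some servers nest the tree under queryPlan
    while isinstance(node, dict) and "stage" in node:
        names.append(node["stage"])
        node = node.get("inputStage")
    return names
```

For example, `stage_names(find_winning_plans(result)[0])` would list the stages of the first winning plan found in `result`.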

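To use a database command instead, run the aggregate command with explain=True. The following code example runs the preceding aggregation and prints the explanation that MongoDB returns:

# Define an aggregation pipeline with a match stage and a group stage
pipeline = [
   { "$match": { "cuisine": "Bakery" } },
   { "$group": { "_id": "$borough", "count": { "$sum": 1 } } }
]
# Execute the operation and print the explanation.
# "collection" is the name of the target collection; replace it with
# your collection's name, such as "restaurants".
result = database.command("aggregate", "collection", pipeline=pipeline, explain=True)
print(result)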

...
'command': {'aggregate': 'collection',
  'pipeline': [{'$match': {'cuisine': 'Bakery'}},
               {'$group': {'_id': '$borough',
                           'count': {'$sum': 1}}}],
  'explain': True,
...
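If you already hold a Collection object, you can avoid hard-coding its name. A minimal sketch, assuming `collection` is a PyMongo Collection: the hypothetical helper `explain_aggregation` reads the collection's `database` and `name` attributes and sends the same aggregate command with `explain=True`.

```python
def explain_aggregation(collection, pipeline):
    """Explain an aggregation by sending the aggregate command with explain=True.

    Hypothetical helper: collection.database and collection.name make the
    command target the collection's real name (for example, "restaurants").
    """
    return collection.database.command(
        "aggregate",
        collection.name,
        pipeline=pipeline,
        explain=True,
    )
```

Because the command and the Collection object come from the same source, the explanation always describes the collection that your application actually queries.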

Tip

You can use Python's pprint module to make explanation results easier to read:

import pprint
...
pprint.pp(result)
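pprint.pp() also accepts the formatting options of pprint.pprint(), such as width and depth. The following sketch uses a hypothetical nested dict in place of a real explain result and caps the output at two levels, so deeper containers print as {...}. This is useful for skimming the top-level keys of a large plan.

```python
import pprint

# Hypothetical stand-in for a deeply nested explain result.
result = {"queryPlanner": {"winningPlan": {"queryPlan": {"stage": "GROUP"}}}}

# Print only two levels of nesting; anything deeper is shown as {...}.
pprint.pp(result, depth=2)  # {'queryPlanner': {'winningPlan': {...}}}
```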