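Natural Language to MongoDB Queries

This page provides guidance on how to use large language models (LLMs) to generate MongoDB queries for your data from natural language.

For example, consider the following natural language query, for which an LLM generates a mongosh query against the Atlas sample_mflix database: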

Show me the genres and runtime of
10 movies from 2015 that have
the most comments

This generates the following mongosh code:

db.movies.aggregate([
  {
    $match: {
      year: 2015,
    },
  },
  {
    $sort: {
      num_mflix_comments: -1,
    },
  },
  {
    $limit: 10,
  },
  {
    $project: {
      _id: 0,
      genres: 1,
      runtime: 1,
    },
  },
]);

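In addition to using out-of-the-box LLMs, you can also use tools built by MongoDB to generate MongoDB queries from natural language.

Models that perform well on general tasks typically also perform well at MongoDB query generation. When selecting an LLM to generate MongoDB queries, refer to popular benchmarks such as MMLU-Pro and Chatbot Arena ELO to compare performance between models.

This section outlines effective strategies for prompting an LLM to generate MongoDB queries.

Note

The following prompting strategies are based on benchmarks created by MongoDB. To learn more, see our public benchmark for natural language to mongosh code on Hugging Face.

The base prompt, also called the system prompt, should provide a clear overview of the task, including:

  • The type of queries to generate.

  • Information about the expected output structure, such as the driver language or tool that executes the query.

The following base prompt example demonstrates how to generate MongoDB read operations or aggregations for mongosh: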

You are an expert data analyst experienced at using MongoDB.
Your job is to take information about a MongoDB database plus a natural language query and generate a MongoDB shell (mongosh) query to execute to retrieve the information needed to answer the natural language query.
Format the mongosh query in the following structure:
`db.<collection name>.find({/* query */})` or `db.<collection name>.aggregate({/* query */})`

To improve query quality, add the following guidance to the base prompt. It gives the model common tips for generating effective MongoDB queries:

Some general query-authoring tips:
1. Ensure proper use of MongoDB operators ($eq, $gt, $lt, etc.) and data types (ObjectId, ISODate)
2. For complex queries, use aggregation pipeline with proper stages ($match, $group, $lookup, etc.)
3. Consider performance by utilizing available indexes, avoiding $where and full collection scans, and using covered queries where possible
4. Include sorting (.sort()) and limiting (.limit()), when appropriate, for result set management
5. Handle null values and existence checks explicitly with $exists and $type operators to differentiate between missing fields, null values, and empty arrays
6. Do not include `null` in results objects in aggregation, e.g. do not include _id: null
7. For date operations, NEVER use an empty new date object (e.g. `new Date()`). ALWAYS specify the date, such as `new Date("2024-10-24")`.
8. For Decimal128 operations, prefer range queries over exact equality
9. When querying arrays, use appropriate operators like $elemMatch for complex matching, $all to match multiple elements, or $size for array length checks

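You can prompt the model to "think out loud" before it generates a response to improve response quality. This technique, called chain-of-thought prompting, improves performance but increases generation time and cost.

To encourage the model to think step by step before generating a query, add the following text to the base prompt: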

Think step by step about the code in the answer before providing it. In your thoughts, consider:
1. Which collections are relevant to the query.
2. Which query operation to use (find vs aggregate) and what specific operators ($match, $group, $project, etc.) are needed.
3. What fields are relevant to the query.
4. Which indexes you can use to improve performance.
5. What specific transformations or projections are required.
6. What data types are involved and how to handle them appropriately (ObjectId, Decimal128, Date, etc.).
7. What edge cases to consider (empty results, null values, missing fields).
8. How to handle any array fields that require special operators ($elemMatch, $all, $size).
9. Any other relevant considerations.
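The prompt sections described so far can be combined programmatically. The following is a minimal sketch, assuming the sections are kept as separate strings; `buildSystemPrompt` and the constant names are hypothetical, and the section texts are shortened here for brevity.

```typescript
// Shortened section texts; in practice, paste the full prompt sections.
const BASE_PROMPT =
  "You are an expert data analyst experienced at using MongoDB.";
const GENERAL_TIPS =
  "Some general query-authoring tips:\n" +
  "1. Ensure proper use of MongoDB operators ($eq, $gt, $lt, etc.) and data types (ObjectId, ISODate)";
const CHAIN_OF_THOUGHT =
  "Think step by step about the code in the answer before providing it.";

// Hypothetical helper: joins prompt sections into one system prompt and
// appends the chain-of-thought text only when requested.
function buildSystemPrompt(sections: string[], useChainOfThought: boolean): string {
  const parts = useChainOfThought ? [...sections, CHAIN_OF_THOUGHT] : sections;
  // Drop empty sections and separate the rest with a blank line.
  return parts
    .map((section) => section.trim())
    .filter((section) => section.length > 0)
    .join("\n\n");
}

const systemPrompt = buildSystemPrompt([BASE_PROMPT, GENERAL_TIPS], true);
console.log(systemPrompt);
```

Keeping the chain-of-thought text behind a flag makes it easy to compare quality against generation time and cost for your own workload.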

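To significantly improve query quality, include a few representative sample documents from the collection. Two to three representative documents typically give the model sufficient context about the data structure.

When providing sample documents, follow these guidelines:

  • Use the BSON.EJSON.serialize() function to convert BSON documents to EJSON for the prompt.

  • Truncate long fields or deeply nested objects.

  • Exclude long string values.

  • For large arrays, such as vector embeddings, include only a few elements.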
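The truncation guidelines above can be sketched as a small recursive helper. This is an illustrative example only: it assumes the document has already been converted to a plain JSON-compatible object (for example, with EJSON serialization), and the `truncateForPrompt` name, the option names, and the thresholds are all hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface TruncateOptions {
  maxStringLength: number; // longer strings are cut and marked with "..."
  maxArrayItems: number;   // arrays keep only this many leading elements
  maxDepth: number;        // objects at this nesting depth or deeper are replaced
}

function truncateForPrompt(value: Json, opts: TruncateOptions, depth = 0): Json {
  if (typeof value === "string") {
    return value.length > opts.maxStringLength
      ? value.slice(0, opts.maxStringLength) + "..."
      : value;
  }
  if (Array.isArray(value)) {
    return value
      .slice(0, opts.maxArrayItems)
      .map((item) => truncateForPrompt(item, opts, depth + 1));
  }
  if (value !== null && typeof value === "object") {
    if (depth >= opts.maxDepth) return "[truncated object]";
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = truncateForPrompt(child, opts, depth + 1);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

// Made-up sample document, not taken from a real collection.
const sampleDocument: Json = {
  title: "Example",
  plot: "x".repeat(500),
  embedding: [0.1, 0.2, 0.3, 0.4, 0.5],
  awards: { wins: 1, detail: { source: "a" } },
};
const trimmedDocument = truncateForPrompt(sampleDocument, {
  maxStringLength: 20,
  maxArrayItems: 3,
  maxDepth: 2,
});
console.log(JSON.stringify(trimmedDocument, null, 2));
```

Marking cut strings and replaced objects explicitly tells the model that data was omitted, rather than suggesting the field is actually short or empty.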

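When generating MongoDB queries from natural language, apply the following prompting best practices for specific use cases.

Include collection indexes in the prompt to encourage the LLM to generate more performant queries. The MongoDB drivers and mongosh provide methods to get index information. For example, the Node.js driver provides the listIndexes() method to get the indexes for the prompt.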
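Once you have the index information, you need to turn it into text for the prompt. The following is a minimal sketch of one possible format. The `IndexDescription` interface is an assumption that covers only the `name`, `key`, and `unique` fields of an index description, the `describeIndexes` helper is hypothetical, and the sample indexes are made up rather than read from a real collection.

```typescript
// Assumed shape of an index description; other fields are ignored.
interface IndexDescription {
  name: string;
  key: { [field: string]: number | string };
  unique?: boolean;
}

// Hypothetical helper: renders one line per index for the prompt.
function describeIndexes(indexes: IndexDescription[]): string {
  return indexes
    .map((index) => {
      const keys = Object.entries(index.key)
        .map(([field, direction]) => `${field}: ${direction}`)
        .join(", ");
      const flags = index.unique ? " (unique)" : "";
      return `${index.name}: { ${keys} }${flags}`;
    })
    .join("\n");
}

// Sample data standing in for the result of listing a collection's indexes.
const exampleIndexes: IndexDescription[] = [
  { name: "_id_", key: { _id: 1 } },
  { name: "year_1_title_1", key: { year: 1, title: 1 } },
];
console.log(describeIndexes(exampleIndexes));
```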

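Most LLM tools include the date in their system prompt. However, if you use an out-of-the-box LLM, the model doesn't know the current date or time. Therefore, when using a base model or building your own natural language to MongoDB tool, include the latest date in the prompt. Use the method for your programming language to get the current date as a string, such as JavaScript's new Date().toString() or Python's str(datetime.now()).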
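As a small illustration, the sketch below formats the date for the prompt. It uses an ISO 8601 date (YYYY-MM-DD, in UTC) instead of the new Date().toString() output mentioned above, on the assumption that a short, unambiguous format is easier for the model to reuse in queries. The `latestDateForPrompt` name is hypothetical.

```typescript
// Hypothetical helper: formats a date as YYYY-MM-DD in UTC for the prompt.
function latestDateForPrompt(now: Date = new Date()): string {
  return now.toISOString().slice(0, 10);
}

console.log(`Latest Date: ${latestDateForPrompt()}`);
```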

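Include an annotated schema of the relevant database collections in the prompt. While no single representation works best for all LLMs, some approaches are more effective than others.

We recommend representing collections with programming language native types that describe the shape of the data, such as TypeScript types, Python Pydantic models, or Go structs. If you use MongoDB from one of these languages, you probably already have the data shape defined. To guide the LLM and reduce ambiguity, add comments that describe each field.

The following example shows a TypeScript type for the sample_mflix.movies collection: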
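A minimal sketch of such an annotated type follows. It is abridged and illustrative: it covers only the fields used in the query at the top of this page plus an assumed `title` field, and the field comments (including the runtime unit) are assumptions rather than an authoritative description of the dataset.

```typescript
// Abridged, illustrative type for documents in sample_mflix.movies.
interface Movie {
  /** Title of the movie (assumed field). */
  title: string;
  /** Year the movie was released. */
  year: number;
  /** Runtime of the movie, assumed to be in minutes. */
  runtime?: number;
  /** Genres the movie belongs to, for example "Drama". */
  genres?: string[];
  /** Number of comments on the movie. */
  num_mflix_comments?: number;
}

// Made-up document that conforms to the type above.
const exampleMovie: Movie = {
  title: "Example Movie",
  year: 2015,
  runtime: 120,
  genres: ["Drama"],
  num_mflix_comments: 3,
};
console.log(exampleMovie.title);
```

When you place a type like this in a prompt, include the doc comments as well, since they are what carries the field descriptions to the model.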

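The following example demonstrates a complete prompt that uses the strategies described on this page to generate mongosh code from natural language.

Use the following system prompt example as a template for MongoDB query generation tasks. The example prompt includes the following components:

  • Task overview and expected output format

  • General guidance for authoring MongoDB queries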

You are an expert data analyst experienced at using MongoDB.
Your job is to take information about a MongoDB database plus a natural language query and generate a MongoDB shell (mongosh) query to execute to retrieve the information needed to answer the natural language query.
Format the mongosh query in the following structure:
`db.<collection name>.find({/* query */})` or `db.<collection name>.aggregate({/* query */})`
Some general query-authoring tips:
1. Ensure proper use of MongoDB operators ($eq, $gt, $lt, etc.) and data types (ObjectId, ISODate).
2. For complex queries, use aggregation pipeline with proper stages ($match, $group, $lookup, etc.).
3. Consider performance by utilizing available indexes, avoiding $where and full collection scans, and using covered queries where possible.
4. Include sorting (.sort()) and limiting (.limit()) when appropriate for result set management.
5. Handle null values and existence checks explicitly with $exists and $type operators to differentiate between missing fields, null values, and empty arrays.
6. Do not include `null` in results objects in aggregation, e.g. do not include _id: null.
7. For date operations, NEVER use an empty new date object (e.g. `new Date()`). ALWAYS specify the date, such as `new Date("2024-10-24")`. Use the provided 'Latest Date' field to inform dates in queries.
8. For Decimal128 operations, prefer range queries over exact equality.
9. When querying arrays, use appropriate operators like $elemMatch for complex matching, $all to match multiple elements, or $size for array length checks.

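Note

You can also add chain-of-thought prompting to encourage step-by-step thinking before code generation.

Then, use the following user message template to provide the model with the necessary context about the database and the desired query: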

Generate MongoDB Shell (mongosh) queries for the following database and natural language query:
## Database Information
Name: {{Database name}}
Description: {{database description}}
Latest Date: {{latest date}} (use this to inform dates in queries)
### Collections
#### Collection `{{collection name. Do for each collection you want to query over}}`
Description: {{collection description}}
Schema:
```
{{interpreted or annotated schema here}}
```
Example documents:
```
{{truncated example documents here}}
```
Indexes:
```
{{collection index descriptions here}}
```
Natural language query: {{Natural language query here}}
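The template above can be filled in with a small rendering function. The following is a minimal sketch, not a definitive implementation: the `PromptContext` and `CollectionContext` interfaces, the `renderUserMessage` name, and all sample values are hypothetical.

```typescript
interface CollectionContext {
  name: string;
  description: string;
  schema: string;           // interpreted or annotated schema text
  exampleDocuments: string; // truncated example documents as text
  indexes: string;          // index descriptions as text
}

interface PromptContext {
  databaseName: string;
  databaseDescription: string;
  latestDate: string;
  collections: CollectionContext[];
  naturalLanguageQuery: string;
}

function renderUserMessage(ctx: PromptContext): string {
  const fence = "`".repeat(3); // code fence used inside the template
  const block = (text: string): string => `${fence}\n${text}\n${fence}`;
  // Render one "#### Collection" section per collection.
  const collections = ctx.collections.map((c) =>
    [
      `#### Collection \`${c.name}\``,
      `Description: ${c.description}`,
      "Schema:",
      block(c.schema),
      "Example documents:",
      block(c.exampleDocuments),
      "Indexes:",
      block(c.indexes),
    ].join("\n")
  );
  return [
    "Generate MongoDB Shell (mongosh) queries for the following database and natural language query:",
    "## Database Information",
    `Name: ${ctx.databaseName}`,
    `Description: ${ctx.databaseDescription}`,
    `Latest Date: ${ctx.latestDate} (use this to inform dates in queries)`,
    "### Collections",
    ...collections,
    `Natural language query: ${ctx.naturalLanguageQuery}`,
  ].join("\n");
}

// Made-up values for illustration only.
const userMessage = renderUserMessage({
  databaseName: "sample_mflix",
  databaseDescription: "Movie data",
  latestDate: "2024-10-24",
  collections: [
    {
      name: "movies",
      description: "One document per movie",
      schema: "interface Movie { year: number }",
      exampleDocuments: '{ "year": 2015 }',
      indexes: "_id_: { _id: 1 }",
    },
  ],
  naturalLanguageQuery: "Show me 10 movies from 2015",
});
console.log(userMessage);
```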
