Get Started with the Spring AI Integration

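You can integrate MongoDB Vector Search with Spring AI to build generative AI applications by using the MongoDB Java Sync Driver. This tutorial demonstrates how to start using MongoDB Vector Search as the vector store for Spring AI, and then how to perform a semantic search on your data.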

Specifically, you perform the following actions:

Set up the environment.
Create a MongoDB Vector Search index.
Store vector embedding data in MongoDB.
Run semantic search queries on your data.

Tip

Completed Sample Application

To download the complete version of the application that this tutorial builds, see the Next Steps section.

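Background

Spring AI is an application framework from Spring that lets you combine various AI services and plugins with your applications. You can use Spring AI for a variety of text-based AI use cases.

You can use MongoDB as a vector database and implement retrieval-augmented generation (RAG) by using MongoDB Vector Search to retrieve semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with MongoDB.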

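Prerequisites

To complete this tutorial, you must have the following:

One of the following MongoDB cluster types:
- An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.
- A local Atlas deployment created by using the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
- A MongoDB Community or Enterprise cluster with Search and Vector Search installed.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.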

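Java Development Kit (JDK) version 17 or later. Spring Boot 3 requires Java 17 at minimum, and the project in this tutorial is configured for Java 21.
An environment to set up and run a Java application. We recommend that you use an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse IDE to configure Maven or Gradle to build and run your project.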

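Set Up the Environment

You must first set up the environment for this tutorial, which includes adding the necessary dependencies and setting configuration properties.

Create a Spring Java application.

Navigate to Spring Initializr and configure the project with the following settings:
- Project: Maven
- Language: Java
- Spring Boot: You can use the default version that is selected.
- Project Metadata:
  - Java: 21
  - You can use the default values for all other fields.
On the right side of Spring Initializr, click ADD DEPENDENCIES, then search for and add the following dependencies:
- MongoDB Atlas Vector Database
- Spring Data MongoDB
Click GENERATE to download a compressed version of your Spring project. Unzip the file and open it in your IDE.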

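Add dependencies.

Spring AI provides Spring Boot auto-configuration for MongoDB Vector Search.

Add the following dependencies to the <dependencies> element in your project's pom.xml file. These dependencies add the Spring AI OpenAI starter, the Spring AI auto-configuration library, and Spring Web (for the REST endpoints in this tutorial) to your application: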

pom.xml

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-spring-boot-autoconfigure</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

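Next, ensure that your pom.xml file contains a dependencyManagement entry for the Spring AI Bill of Materials (BOM).
Important
Set the spring-ai.version property used by the Spring AI BOM to 1.0.0-SNAPSHOT to use the latest Spring AI features in your application.
To learn more about the Spring AI BOM, see the Dependency Management section of the Spring AI documentation.
Finally, add the Spring AI snapshot repository to the <repositories> element in your pom.xml file: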
pom.xml
```
<repository>
  <id>spring-snapshots</id>
  <name>Spring Snapshots</name>
  <url>https://repo.spring.io/snapshot</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>
```
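To learn more about these repositories, see the Add Milestone and Snapshot Repositories section of the Spring AI documentation.
After you finish editing the pom.xml file, reload your project to ensure that the dependencies are installed.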

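Define application properties.

Locate the src/main/resources/application.properties file and replace its contents with the following properties. Replace the placeholders with your OpenAI API key and your Atlas connection string: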

src/main/resources/application.properties

spring.application.name=springai-mongodb
spring.ai.openai.api-key=<OpenAI API Key>
spring.ai.openai.embedding.options.model=text-embedding-ada-002
spring.data.mongodb.uri=<connection string>
spring.data.mongodb.database=springai_test
spring.ai.vectorstore.mongodb.indexName=vector_index
spring.ai.vectorstore.mongodb.collection-name=vector_store
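As an optional check before you start the application, the following JDK-only sketch loads application.properties with java.util.Properties and lists any keys whose values still look like the <...> placeholders above. The class name PropertiesCheck and the rule that a value wrapped in angle brackets is an unreplaced placeholder are assumptions made for this example; they are not part of Spring AI or Spring Boot.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: flags properties whose values still look like "<...>" placeholders.
class PropertiesCheck {

    // Returns the keys, sorted, whose values are still wrapped in angle brackets.
    static List<String> unresolvedPlaceholders(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        List<String> unresolved = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            String value = props.getProperty(key).trim();
            if (value.startsWith("<") && value.endsWith(">")) {
                unresolved.add(key);
            }
        }
        unresolved.sort(null); // natural (alphabetical) order
        return unresolved;
    }

    public static void main(String[] args) throws IOException {
        // Path relative to the project root.
        Path file = Path.of("src/main/resources/application.properties");
        try (Reader reader = Files.newBufferedReader(file)) {
            List<String> missing = unresolvedPlaceholders(reader);
            if (missing.isEmpty()) {
                System.out.println("All placeholders replaced.");
            } else {
                System.out.println("Replace the placeholder values for: " + missing);
            }
        }
    }
}
```

Run from the project root, this would report spring.ai.openai.api-key and spring.data.mongodb.uri until you substitute real values.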

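Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net/?<settings>

To learn more about retrieving your connection string, see the Connection Strings guide.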
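If your database username or password contains any of the characters : / ? # [ ] @ (or %), those characters must be percent-encoded before you place them in the connection string. The following JDK-only sketch assembles a mongodb+srv connection string with encoded credentials. The class and method names are hypothetical, and the host and settings values in main are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a mongodb+srv URI with percent-encoded credentials.
class ConnectionStrings {

    // URLEncoder produces form encoding, which writes a space as '+'.
    // Replace '+' with %20 so the result is plain percent-encoding
    // (a literal '+' in the input is already encoded as %2B).
    static String encode(String raw) {
        return URLEncoder.encode(raw, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String srvUri(String user, String password, String host, String settings) {
        String uri = "mongodb+srv://" + encode(user) + ":" + encode(password) + "@" + host + "/";
        return settings.isEmpty() ? uri : uri + "?" + settings;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        System.out.println(srvUri("app_user", "p@ss:word",
                "cluster0.example.mongodb.net", "retryWrites=true&w=majority"));
        // mongodb+srv://app_user:p%40ss%3Aword@cluster0.example.mongodb.net/?retryWrites=true&w=majority
    }
}
```

You would paste the resulting string as the value of spring.data.mongodb.uri.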

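Create the MongoDB Vector Search Index

To enable vector search queries on your vector store, you must create a MongoDB Vector Search index on the springai_test.vector_store collection.

Note

Required Access

To create a MongoDB Vector Search index, you must have Project Data Access Admin or higher access to the MongoDB project.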

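Enable schema initialization.

When you configure Atlas as the vector store in your application, Spring AI can initialize the backend schema automatically. This initialization includes creating a MongoDB Vector Search index on the collection that contains your vector embeddings.

To enable schema initialization, add the following setting to your application.properties file: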

src/main/resources/application.properties

spring.ai.vectorstore.mongodb.initialize-schema=true

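Setting initialize-schema=true causes Spring AI to create the MongoDB Vector Search index on your cluster programmatically. To learn more, see Create a MongoDB Vector Search Index.

Note

Known Issue: Existing Index

If a MongoDB Vector Search index named vector_index already exists on the springai_test.vector_store collection, Spring AI doesn't create an additional index. As a result, if the existing index was configured with incompatible settings, such as a different number of dimensions, you might encounter errors later in this tutorial.

Ensure that your index has the following configuration: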

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
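If you suspect that an existing index was created with different settings, you can compare its field definition with the configuration above. The value 1536 matches the output length of the text-embedding-ada-002 model set in application.properties; a model with a different output length, such as the 3072-dimensional text-embedding-3-large, would need a different numDimensions value. The following JDK-only sketch represents one entry of the index's fields array as a Map (for example, transcribed from the index JSON shown in the Atlas UI) and reports each setting that differs. The class name is hypothetical, and the sketch does not connect to MongoDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares one "fields" entry of an existing index
// definition with the configuration that this tutorial expects.
class IndexConfigCheck {

    static final Map<String, Object> EXPECTED = Map.of(
            "numDimensions", 1536,   // output length of text-embedding-ada-002
            "path", "embedding",
            "similarity", "cosine",
            "type", "vector");

    // Returns one message per setting that differs from EXPECTED, sorted.
    static List<String> differences(Map<String, Object> actual) {
        List<String> diffs = new ArrayList<>();
        EXPECTED.forEach((key, want) -> {
            Object got = actual.get(key);
            if (!want.equals(got)) {
                diffs.add(key + ": expected " + want + ", found " + got);
            }
        });
        diffs.sort(null); // Map.of iteration order is unspecified
        return diffs;
    }

    public static void main(String[] args) {
        // An index created earlier for a 3072-dimensional model would be flagged.
        Map<String, Object> existing = Map.of(
                "numDimensions", 3072, "path", "embedding",
                "similarity", "cosine", "type", "vector");
        differences(existing).forEach(System.out::println);
        // numDimensions: expected 1536, found 3072
    }
}
```

If any difference is reported, you could delete the existing index and let Spring AI recreate it, or edit the index to match.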

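Use MongoDB as a Vector Store

This section demonstrates how to configure MongoDB as a vector database, also known as a vector store, so that you can store vector embeddings of your custom data.

Locate the src/main/java/com/example/demo/DemoApplication.java file in your project. At the same level as this file, create a directory named config, and then create a file named Config.java in this directory to set up your Spring application configuration.

The following steps demonstrate how to create the Spring beans required to prepare the vector store.

Add the import statements.

Paste the following code into your Config.java file to import the required classes: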

/config/Config.java

// Package of the config directory next to DemoApplication.java
package com.example.demo.config;

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.MongoDBAtlasVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringBootConfiguration;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

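Reference the application properties.

Paste the following code into your Config.java file to reference the values that you set in your application properties file: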

/config/Config.java

@Configuration
@SpringBootConfiguration
@EnableAutoConfiguration
public class Config {
    @Value("${spring.ai.openai.api-key}")
    private String openAiKey;
    @Value("${spring.data.mongodb.database}")
    private String databaseName;
    @Value("${spring.ai.vectorstore.mongodb.collection-name:vector_store}")
    private String collectionName;
    @Value("${spring.ai.vectorstore.mongodb.indexName:vector_index}")
    private String indexName;
    @Value("${spring.data.mongodb.uri}")
    private String mongoUri;
    @Value("${spring.ai.vectorstore.mongodb.initialize-schema}")
    private Boolean initSchema;
    // Add beans here...
}

Create the `EmbeddingModel` Spring bean.

Next, paste the following code to create an OpenAiEmbeddingModel instance that uses the OpenAI API to generate vector embeddings:

/config/Config.java

@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(openAiKey));
}

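Create the `VectorStore` Spring bean.

Finally, paste the following code to create a bean that returns a VectorStore instance. The VectorStore instance uses the MongoTemplate that Spring Boot configures for your deployment and the OpenAiEmbeddingModel that you created in the previous step.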

/config/Config.java

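@Bean
public VectorStore mongodbVectorStore(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
    // Use the collection and index names read from application.properties
    // instead of the builder defaults; initSchema controls index creation.
    return new MongoDBAtlasVectorStore(mongoTemplate, embeddingModel,
            MongoDBAtlasVectorStore.MongoDBVectorStoreConfig.builder()
                    .withCollectionName(collectionName)
                    .withVectorIndexName(indexName)
                    .build(),
            initSchema);
}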

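Store Custom Data and Run a Semantic Search Query

In this section, you learn how to create endpoints in your Java application that store vector embeddings of custom data in MongoDB and then run a semantic search query on that data.

Create Endpoints

At the same level as the config folder, create a controller folder, and then create a Controller.java file in it to set up your API endpoints. The following steps demonstrate how to create GET endpoints that add data to your vector store and that run a semantic search query by using the similaritySearch() method.

Add the import statements.

Paste the following code into your Controller.java file to import the required classes: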

/controller/Controller.java

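// Package of the controller directory next to the config directory
package com.example.demo.controller;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.SearchRequest;
// Needed only for the optional metadata-filtering step
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;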

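Create the controller.

Paste the following code to perform the following tasks:

Annotate the Controller class to mark it as a REST controller for your application.
Map requests under the /tutorial path to this controller.
Autowire the VectorStore bean.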

/controller/Controller.java

@RestController
@RequestMapping("/tutorial")
public class Controller {
    @Autowired
    private VectorStore vectorStore;
    // Add endpoints here...
}

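Create an endpoint to add documents to the vector store.

Paste the following code into your controller to create a GET endpoint that creates sample documents and saves them to your vector store as vector embeddings: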

/controller/Controller.java

@GetMapping("/add")
public String addDocuments() {
    List<Document> docs = List.of(
            new Document("Proper tuber planting involves site selection, proper timing, and exceptional care. Choose spots with well-drained soil and adequate sun exposure. Tubers are generally planted in spring, but depending on the plant, timing varies. Always plant with the eyes facing upward at a depth two to three times the tuber's height. Ensure 4 inch spacing between small tubers, expand to 12 inches for large ones. Adequate moisture is needed, yet do not overwater. Mulching can help preserve moisture and prevent weed growth.", Map.of("author", "A", "type","post")),
            new Document("Successful oil painting necessitates patience, proper equipment, and technique. Begin with a carefully prepared, primed canvas. Sketch your composition lightly before applying paint. Use high-quality brushes and oils to create vibrant, long-lasting artworks. Remember to paint 'fat over lean,' meaning each subsequent layer should contain more oil to prevent cracking. Allow each layer to dry before applying another. Clean your brushes often and avoid solvents that might damage them. Finally, always work in a well-ventilated space.", Map.of("author", "A")),
            new Document("For a natural lawn, selection of the right grass type suitable for your climate is crucial. Balanced watering, generally 1 to 1.5 inches per week, is important; overwatering invites disease. Opt for organic fertilizers over synthetic versions to provide necessary nutrients and improve soil structure. Regular lawn aeration helps root growth and prevents soil compaction. Practice natural pest control and consider overseeding to maintain a dense sward, which naturally combats weeds and pest.", Map.of("author", "B", "type","post"))
    );
    vectorStore.add(docs);
    return "Documents added successfully!\n";
}

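Create an endpoint to perform a semantic search.

Paste the following code into your controller to create a GET endpoint that runs a semantic search query for the phrase "learn how to grow things" and returns the two most relevant results: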

/controller/Controller.java

1 @GetMapping("/search")
2 public List<Map<String, Object>> searchDocuments() {
3 
4     List<Document> results = vectorStore.similaritySearch(
5             SearchRequest
6                     .query("learn how to grow things")
7                     .withTopK(2)
8     );
9 
10     return results.stream().map(doc -> Map.of(
11             "content", doc.getContent(),
12             "metadata", doc.getMetadata()
13     )).collect(Collectors.toList());
14 }
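Conceptually, withTopK(2) ranks the stored documents by how similar their embeddings are to the embedding of the query phrase and keeps the best two. MongoDB Vector Search does this work on the server by using the index; the following JDK-only sketch only illustrates the idea with exact cosine ranking over an in-memory list. The three-dimensional vectors are invented toy values rather than real 1536-dimensional embeddings, and the class and record names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Toy illustration of top-K retrieval by cosine similarity.
// Vectors are invented 3-dimensional values, not real embeddings.
class TopKSketch {

    record Doc(String id, double[] embedding) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|) for equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of the k documents most similar to the query vector.
    static List<String> topK(List<Doc> docs, double[] query, int k) {
        return docs.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(d.embedding(), query)).reversed())
                .limit(k)
                .map(Doc::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("tuber-planting", new double[] {0.9, 0.1, 0.0}),
                new Doc("oil-painting", new double[] {0.0, 0.2, 0.9}),
                new Doc("lawn-care", new double[] {0.8, 0.3, 0.1}));
        double[] query = {1.0, 0.2, 0.0}; // toy stand-in for the query embedding
        System.out.println(topK(docs, query, 2)); // [tuber-planting, lawn-care]
    }
}
```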

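(Optional) Perform a semantic search with metadata filtering.

To perform a search with metadata filtering, you can use the FilterExpressionBuilder class from Spring AI, which builds a Filter.Expression object.

Spring AI translates the filter expression into an MQL match expression that prefilters documents. This example filters for documents whose author field value is "A", and then runs a semantic search query for the phrase "learn how to grow things".

In the body of the searchDocuments() method that you defined in the previous step, replace the code that calls the similaritySearch() method (lines 4 to 8 in the previous block) with the following code. FilterExpressionBuilder belongs to the org.springframework.ai.vectorstore.filter package, so make sure that Controller.java imports it: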

/controller/Controller.java

FilterExpressionBuilder b = new FilterExpressionBuilder();
List<Document> results = vectorStore.similaritySearch(
        SearchRequest.defaults()
                .withQuery("learn how to grow things")
                .withTopK(2)
                .withSimilarityThreshold(0.5)
                .withFilterExpression(b.eq("author", "A").build())
);
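The filtered request combines three steps: keep only documents whose metadata matches the filter (author equals "A"), drop matches whose similarity falls below the threshold (0.5), and return at most two results. The following JDK-only sketch mimics that order of operations in memory. It is an illustration under stated assumptions, not Spring AI's implementation: MongoDB Vector Search applies the prefilter and computes scores on the server, and its score scale is not necessarily the raw cosine value used here. The names and toy vectors are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// In-memory illustration of: metadata prefilter -> similarity threshold -> top K.
// Not how Spring AI or MongoDB implement it; vectors are toy values.
class FilteredSearchSketch {

    record Doc(String id, Map<String, String> metadata, double[] embedding) {}

    // Cosine similarity of two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<String> search(List<Doc> docs, double[] query, String field,
                               String value, double threshold, int topK) {
        return docs.stream()
                // 1. Prefilter on metadata, analogous to b.eq("author", "A").
                .filter(d -> value.equals(d.metadata().get(field)))
                // 2. Drop documents below the similarity threshold.
                .filter(d -> cosine(d.embedding(), query) >= threshold)
                // 3. Rank the remaining documents and keep at most topK.
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(d.embedding(), query)).reversed())
                .limit(topK)
                .map(Doc::id)
                .toList();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("tuber-planting", Map.of("author", "A", "type", "post"), new double[] {0.9, 0.1, 0.0}),
                new Doc("oil-painting", Map.of("author", "A"), new double[] {0.0, 0.2, 0.9}),
                new Doc("lawn-care", Map.of("author", "B", "type", "post"), new double[] {0.8, 0.3, 0.1}));
        double[] query = {1.0, 0.2, 0.0}; // toy stand-in for the query embedding
        System.out.println(search(docs, query, "author", "A", 0.5, 2)); // [tuber-planting]
    }
}
```

Here the oil-painting document passes the author filter but is dropped by the threshold, so fewer than topK results come back. The real search can also return fewer results than withTopK requests.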

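Note

You must add the path of the metadata field to your MongoDB Vector Search index. Spring AI stores document metadata in the metadata field, so the path for this example's author filter is metadata.author. To learn more, see the About the filter Type section of the How to Index Fields for Vector Search tutorial.

To learn more about metadata prefiltering, see MongoDB Vector Search Pre-Filter.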

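Access the Endpoints

After you run the application, you can access the endpoints to first add documents to the vector store and then perform a semantic search query.

Run the application.

Use your IDE tools to build and run the application. If you use the default settings, your application runs locally on port 8080.

Access the endpoints from your terminal.

After you confirm that your application is running, run the following command in your terminal to access the add endpoint, which converts the sample data to vector embeddings and inserts the embeddings into your MongoDB deployment: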

curl -X GET http://localhost:8080/tutorial/add

Documents added successfully!

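Tip

After you access the endpoint, if you're using Atlas, you can navigate to the springai_test.vector_store namespace in the Atlas UI to verify the vector embeddings.

Then, run the following command in your terminal to access the search endpoint and perform the semantic search: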

curl -X GET http://localhost:8080/tutorial/search

[{"content":"For a natural lawn, selection of the right grass type
suitable for your climate is crucial. Balanced watering, generally 1 to
1.5 inches per week, is important; overwatering invites disease. Opt for
organic fertilizers over synthetic versions to provide necessary
nutrients and improve soil structure. Regular lawn aeration helps root
growth and prevents soil compaction. Practice natural pest control and
consider overseeding to maintain a dense sward, which naturally combats
weeds and
pest.","metadata":{"type":"post","author":"B"}},{"content":"Proper tuber
planting involves site selection, proper timing, and exceptional care.
Choose spots with well-drained soil and adequate sun exposure. Tubers
are generally planted in spring, but depending on the plant, timing
varies. Always plant with the eyes facing upward at a depth two to three
times the tuber's height. Ensure 4 inch spacing between small tubers,
expand to 12 inches for large ones. Adequate moisture is needed, yet do
not overwater. Mulching can help preserve moisture and prevent weed
growth.","metadata":{"type":"post","author":"A"}}]
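If you prefer to call the endpoints from Java instead of curl, the following sketch uses the JDK's java.net.http.HttpClient (Java 11 or later) to send the same two GET requests. It assumes that the application is running locally on the default port 8080; the class name and the BASE_URL constant are illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Calls the tutorial endpoints; assumes the app runs on localhost:8080.
class TutorialClient {

    static final String BASE_URL = "http://localhost:8080/tutorial";

    // Builds a GET request for an endpoint path such as "add" or "search".
    static HttpRequest get(String endpoint) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + endpoint)).GET().build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (String endpoint : new String[] {"add", "search"}) {
            HttpResponse<String> response =
                    client.send(get(endpoint), HttpResponse.BodyHandlers.ofString());
            System.out.println(endpoint + " -> HTTP " + response.statusCode());
            System.out.println(response.body());
        }
    }
}
```

Each call to the add endpoint creates new Document objects and inserts them again, so running this repeatedly adds duplicate documents to the collection.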

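Next Steps

You can view and download the complete version of this application from GitHub. You can use the complete application to troubleshoot your own application or to quickly test its functionality.

MongoDB also provides the following developer resources:

Tip

Spring AI Documentation

Spring AI MongoDB Atlas Reference Documentation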
