
Get Started with the LangChain JS/TS Integration

Note

This tutorial uses LangChain's JavaScript library. For a tutorial that uses the Python library, see LangChain Python.

You can integrate Atlas Vector Search with LangChain to build LLM applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChain to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

  1. Set up the environment.

  2. Store custom data on Atlas.

  3. Create an Atlas Vector Search index on your data.

  4. Run the following vector search queries:

    • Semantic search.

    • Semantic search with metadata pre-filtering.

    • Maximal Marginal Relevance (MMR) search.

  5. Implement RAG by using Atlas Vector Search to answer questions on your data.

LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.

By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.

To complete this tutorial, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

  • A Voyage AI API Key. To create an account and API Key, see the Voyage AI website.

  • An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • A terminal and code editor to run your Node.js project.

  • npm and Node.js installed.

To set up the environment for this tutorial, complete the following steps.

1

Run the following commands in your terminal to create a new directory named langchain-mongodb and initialize your project:

mkdir langchain-mongodb
cd langchain-mongodb
npm init -y
2

Run the following command to install the packages that this tutorial uses:

npm install langchain @langchain/community @langchain/mongodb @langchain/openai pdf-parse
3

Configure your project to use ES modules by adding "type": "module" to your package.json file and then saving it.

{
"type": "module",
// other fields...
}
4

In your project, create a file named get-started.js, and then copy and paste the following code into the file. You will add code to this file throughout the tutorial.

This initial code snippet imports required packages for this tutorial, defines environment variables, and establishes a connection to your Atlas cluster.

import { formatDocumentsAsString } from "langchain/util/document";
import { MongoClient } from "mongodb";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { ChatOpenAI } from "@langchain/openai";
import { VoyageEmbeddings } from "@langchain/community/embeddings/voyage";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { PromptTemplate } from "@langchain/core/prompts";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { RunnableSequence, RunnablePassthrough } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import * as fs from 'fs';
process.env.VOYAGEAI_API_KEY = "<api-key>";
process.env.OPENAI_API_KEY = "<api-key>";
process.env.ATLAS_CONNECTION_STRING = "<connection-string>";
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
5

To finish setting up the environment, replace the <api-key> placeholders in get-started.js with your Voyage AI API Key and your OpenAI API Key, and replace the <connection-string> placeholder with the SRV connection string for your Atlas cluster. Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
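
Before connecting, it can help to fail fast if a placeholder was left unreplaced. The following is a minimal sketch of such a check. The assertConfigured helper is hypothetical and is not part of the tutorial code or any library, and the values in exampleEnv are made up for illustration.

```javascript
// Hypothetical helper (not part of the tutorial or any library):
// throws if a required variable is unset or still holds a "<...>" placeholder.
function assertConfigured(env, names) {
  const missing = names.filter((name) => {
    const value = env[name];
    return !value || /^<.*>$/.test(value.trim());
  });
  if (missing.length > 0) {
    throw new Error(`Set real values for: ${missing.join(", ")}`);
  }
}

// Illustrative values only. In get-started.js you would pass process.env.
const exampleEnv = {
  VOYAGEAI_API_KEY: "<api-key>",
  OPENAI_API_KEY: "sk-example",
  ATLAS_CONNECTION_STRING: "mongodb+srv://user:pass@cluster0.example.mongodb.net",
};

try {
  assertConfigured(exampleEnv, [
    "VOYAGEAI_API_KEY",
    "OPENAI_API_KEY",
    "ATLAS_CONNECTION_STRING",
  ]);
} catch (err) {
  console.log(err.message); // prints "Set real values for: VOYAGEAI_API_KEY"
}
```

In get-started.js, a call such as assertConfigured(process.env, [...]) would need to run before the MongoClient is created.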

In this section, you define an asynchronous function to load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Add the following code into your get-started.js file.

Note

For this tutorial, you use a publicly accessible PDF document titled MongoDB Atlas Best Practices as the data source for your vector store. This document describes various recommendations and core concepts for managing your Atlas deployments.

This code performs the following actions:

  • Configures your Atlas collection by specifying the following parameters:

    • langchain_db.test as the Atlas collection to store the documents.

    • vector_index as the index to use for querying the vector store.

    • text as the name of the field containing the raw text content.

    • embedding as the name of the field containing the vector embeddings.

  • Prepares your custom data by doing the following:

    • Retrieves raw data from the specified URL and saves it as PDF.

    • Uses a text splitter to split the data into smaller documents.

    • Specifies chunk parameters, which determine the number of characters in each document and the number of characters that should overlap between two consecutive documents.

  • Creates a vector store from the sample documents by calling the MongoDBAtlasVectorSearch.fromDocuments method. This method specifies the following parameters:

    • The sample documents to store in the vector database.

    • Voyage AI's embedding model as the model used to convert text into vector embeddings for the embedding field.

    • Your Atlas configuration.

async function run() {
try {
// Configure your Atlas collection
const database = client.db("langchain_db");
const collection = database.collection("test");
const dbConfig = {
collection: collection,
indexName: "vector_index", // The name of the Atlas search index to use.
textKey: "text", // Field name for the raw text content. Defaults to "text".
embeddingKey: "embedding", // Field name for the vector embeddings. Defaults to "embedding".
};
// Ensure that the collection is empty
const count = await collection.countDocuments();
if (count > 0) {
await collection.deleteMany({});
}
// Save online PDF as a file
const rawData = await fetch("https://webassets.mongodb.com/MongoDB_Best_Practices_Guide.pdf");
const pdfBuffer = await rawData.arrayBuffer();
const pdfData = Buffer.from(pdfBuffer);
fs.writeFileSync("atlas_best_practices.pdf", pdfData);
// Load and split the sample data
const loader = new PDFLoader(`atlas_best_practices.pdf`);
const data = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 200,
chunkOverlap: 20,
});
const docs = await textSplitter.splitDocuments(data);
// Instantiate Atlas as a vector store
const embeddingModel = new VoyageEmbeddings({ model: "voyage-3-large" });
const vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, embeddingModel, dbConfig);
} finally {
// Ensure that the client will close when you finish/error
await client.close();
}
}
run().catch(console.dir);

Save the file, then run the following command to load your data into Atlas.

node get-started.js

Tip

After running get-started.js, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.
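
To illustrate how the chunkSize and chunkOverlap parameters interact, the following sketch splits a string into fixed-size windows that share characters with their neighbors. It is a simplified illustration only: chunkWithOverlap is a hypothetical function, and RecursiveCharacterTextSplitter does more than this, since it also tries to break on separators such as paragraph and line boundaries.

```javascript
// Simplified illustration of chunkSize and chunkOverlap.
// This is NOT the algorithm that RecursiveCharacterTextSplitter uses.
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks = [];
  // Each new chunk starts (chunkSize - chunkOverlap) characters after the previous one.
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const sampleText = "abcdefghijklmnopqrstuvwxyz";
const sampleChunks = chunkWithOverlap(sampleText, 10, 2);
console.log(sampleChunks); // [ 'abcdefghij', 'ijklmnopqr', 'qrstuvwxyz' ]
```

Each chunk repeats the last two characters of the previous chunk, which is the role that chunkOverlap: 20 plays for the 200-character chunks in this tutorial.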

Note

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchain_db.test collection.

Add the following code to the asynchronous function that you defined in your get-started.js file. This code creates an index of the vectorSearch type that indexes the following fields:

  • embedding field as the vector type. The embedding field contains the embeddings created using Voyage AI's voyage-3-large embedding model. The index definition specifies 1024 vector dimensions and measures similarity using cosine.

  • loc.pageNumber field as the filter type for pre-filtering data by the page number in the PDF.

This code also pauses for 10 seconds to give your search index time to sync with your data before you query it.

// Ensure index does not already exist, then create your Atlas Vector Search index
const indexes = await collection.listSearchIndexes("vector_index").toArray();
if (indexes.length === 0) {

  // Define your Atlas Vector Search Index
  const index = {
    name: "vector_index",
    type: "vectorSearch",
    definition: {
      "fields": [
        {
          "type": "vector",
          "numDimensions": 1024,
          "path": "embedding",
          "similarity": "cosine"
        },
        {
          "type": "filter",
          "path": "loc.pageNumber"
        }
      ]
    }
  }

  // Run the helper method
  const result = await collection.createSearchIndex(index);
  console.log(result);

  // Wait for Atlas to sync index
  console.log("Waiting for initial sync...");
  await new Promise(resolve => setTimeout(() => {
    resolve();
  }, 10000));
}

Save the file, then run the following command to create your Atlas Vector Search index.

node get-started.js
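
The index definition measures similarity using cosine. As a rough illustration of what that metric computes, the following sketch calculates the cosine of the angle between two small vectors. The cosineSimilarity function and the sample vectors are illustrative assumptions. Atlas computes similarity internally on the 1024-dimensional embeddings, so you do not need this code in get-started.js.

```javascript
// Illustrative only: cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // same direction: 1
console.log(cosineSimilarity([1, 0], [0, 1]));  // orthogonal: 0
console.log(cosineSimilarity([1, 0], [-1, 0])); // opposite direction: -1
```

Because cosine similarity depends only on direction, two embeddings of different magnitudes that point the same way are treated as equally similar.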

This section demonstrates various queries that you can run on your vectorized data. Now that you've created the index, add the following code to your asynchronous function to run vector search queries against your data.

Note

If you experience inaccurate results when querying your data, your index might be taking longer than expected to sync. Increase the number in the setTimeout function to allow more time for the initial sync.
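
As an alternative to increasing a fixed delay, you could poll the index until it reports that it is ready. The following is a sketch, not the tutorial's method. It assumes that each document returned by listSearchIndexes has a boolean queryable field, and the waitForQueryableIndex helper with its intervalMs and timeoutMs options is hypothetical.

```javascript
// Sketch: poll until the named search index reports that it is queryable.
// Assumes the documents returned by listSearchIndexes() include a boolean
// `queryable` field. The helper name and its options are hypothetical.
async function waitForQueryableIndex(
  collection,
  indexName,
  { intervalMs = 5000, timeoutMs = 120000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const [index] = await collection.listSearchIndexes(indexName).toArray();
    if (index && index.queryable === true) {
      return index;
    }
    // Not ready yet: wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Index "${indexName}" was not queryable after ${timeoutMs} ms`);
}

// Possible use inside run(), in place of the fixed setTimeout delay:
//   await waitForQueryableIndex(collection, "vector_index");
```

Polling returns as soon as the index is ready and fails with a clear error if it never becomes ready, whereas a fixed delay can be either too short or longer than necessary.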

1

The following code uses the similaritySearch method to perform a basic semantic search for the string MongoDB Atlas security. It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.

// Basic semantic search
const basicOutput = await vectorStore.similaritySearch("MongoDB Atlas security");
const basicResults = basicOutput.map((results => ({
pageContent: results.pageContent,
pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Semantic Search Results:")
console.log(basicResults)
2
node get-started.js
Semantic Search Results:
[
{
pageContent: 'Atlas free tier, or download MongoDB for local \n' +
'development.\n' +
'Review the MongoDB manuals and tutorials in our \n' +
'documentation. \n' +
'More Resources\n' +
'For more on getting started in MongoDB:',
pageNumber: 30
},
{
pageContent: 'read isolation. \n' +
'With MongoDB Atlas, you can achieve workload isolation with dedicated analytics nodes. Visualization \n' +
'tools like Atlas Charts can be configured to read from analytics nodes only.',
pageNumber: 21
},
{
pageContent: '• Zoned Sharding — You can define specific rules governing data placement in a sharded cluster.\n' +
'Global Clusters in MongoDB Atlas allows you to quickly implement zoned sharding using a visual UI or',
pageNumber: 27
},
{
pageContent: 'are updated, associated indexes must be maintained, incurring additional CPU and disk I/O overhead. \n' +
'If you\'re running fully managed databases on MongoDB Atlas, the built-in Performance Advisor',
pageNumber: 20
}
]

You can pre-filter your data by using an MQL match expression that compares the indexed field with another value in your collection. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.

Note

You specified the loc.pageNumber field as a filter when you created the index for this tutorial.
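
Match expressions are not limited to equality. As a sketch, the following hypothetical pageRangeFilter helper builds a filter object that keeps only documents within a range of pages by combining $gte and $lte under $and. The helper name and the page numbers are assumptions for illustration.

```javascript
// Hypothetical helper: builds a filter object that keeps only chunks
// whose page number falls within [firstPage, lastPage].
function pageRangeFilter(firstPage, lastPage) {
  return {
    preFilter: {
      $and: [
        { "loc.pageNumber": { $gte: firstPage } },
        { "loc.pageNumber": { $lte: lastPage } },
      ],
    },
  };
}

const rangeFilter = pageRangeFilter(20, 22);
console.log(JSON.stringify(rangeFilter, null, 2));

// Possible use inside run(), assuming vectorStore is defined as in this tutorial:
//   await vectorStore.similaritySearch("MongoDB Atlas Search", 3, rangeFilter);
```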

1

The following code uses the similaritySearch method to perform a semantic search for the string MongoDB Atlas Search. It specifies the following parameters:

  • The number of documents to return as 3.

  • A pre-filter on the loc.pageNumber field that uses the $eq operator to match documents appearing on page 22 only.

It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.

// Semantic search with metadata filter
const filteredOutput = await vectorStore.similaritySearch("MongoDB Atlas Search", 3, {
preFilter: {
"loc.pageNumber": {"$eq": 22 },
}
});
const filteredResults = filteredOutput.map((results => ({
pageContent: results.pageContent,
pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Semantic Search with Filtering Results:")
console.log(filteredResults)
2
node get-started.js
Semantic Search with Filtering Results:
[
{
pageContent: 'Atlas Search is built for the MongoDB document data model and provides higher performance and',
pageNumber: 22
},
{
pageContent: 'Figure 9: Atlas Search queries are expressed through the MongoDB Query API and backed by the leading search engine library, \n' +
'Apache Lucene.',
pageNumber: 22
},
{
pageContent: 'consider using Atlas Search. The service is built on fully managed Apache Lucene but exposed to users \n' +
'through the MongoDB Aggregation Framework.',
pageNumber: 22
}
]

You can also perform semantic search based on Max Marginal Relevance (MMR), a measure of semantic relevance optimized for diversity.
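
To make the trade-off between relevance and diversity concrete, the following sketch implements a common greedy formulation of MMR over small in-memory vectors. It is an illustration under stated assumptions, not the code that LangChain or Atlas runs: the function names, the sample vectors, and the scoring formula (lambda times relevance minus one minus lambda times redundancy) are assumptions chosen to match the usual definition.

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSim(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy MMR sketch: repeatedly pick the candidate with the best trade-off
// between similarity to the query and similarity to already selected items.
// lambda = 1 ranks by relevance only; lambda near 0 favors diversity.
function mmrSelect(queryVec, candidateVecs, k, lambda = 0.5) {
  const selected = [];
  const remaining = candidateVecs.map((_, i) => i);
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = -1;
    let bestScore = -Infinity;
    for (const i of remaining) {
      const relevance = cosineSim(queryVec, candidateVecs[i]);
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((j) => cosineSim(candidateVecs[i], candidateVecs[j])));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(bestIdx);
    remaining.splice(remaining.indexOf(bestIdx), 1);
  }
  return selected; // indexes into candidateVecs, in selection order
}

// Candidate 1 is a near duplicate of candidate 0; candidate 2 is less
// relevant to the query but points in a different direction.
const mmrQuery = [1, 0];
const mmrCandidates = [[1, 0], [0.99, 0.14], [0.6, 0.8]];
console.log(mmrSelect(mmrQuery, mmrCandidates, 2, 1));   // [ 0, 1 ]
console.log(mmrSelect(mmrQuery, mmrCandidates, 2, 0.1)); // [ 0, 2 ]
```

With lambda set to 1 the near duplicate is returned because only relevance counts. With lambda set to 0.1 the more distinct candidate replaces it.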

1

The following code uses the maxMarginalRelevanceSearch method to search for the string MongoDB Atlas security. It also specifies an object that defines the following optional parameters:

  • k to limit the number of returned documents to 3.

  • fetchK to fetch only 10 documents before passing the documents to the MMR algorithm.

It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.

// Max Marginal Relevance search
const mmrOutput = await vectorStore.maxMarginalRelevanceSearch("MongoDB Atlas security", {
k: 3,
fetchK: 10,
});
const mmrResults = mmrOutput.map((results => ({
pageContent: results.pageContent,
pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Max Marginal Relevance Search Results:")
console.log(mmrResults)
2
node get-started.js
Max Marginal Relevance Search Results:
[
{
pageContent: 'Atlas Search is built for the MongoDB document data model and provides higher performance and',
pageNumber: 22
},
{
pageContent: '• Zoned Sharding — You can define specific rules governing data placement in a sharded cluster.\n' +
'Global Clusters in MongoDB Atlas allows you to quickly implement zoned sharding using a visual UI or',
pageNumber: 27
},
{
pageContent: 'read isolation. \n' +
'With MongoDB Atlas, you can achieve workload isolation with dedicated analytics nodes. Visualization \n' +
'tools like Atlas Charts can be configured to read from analytics nodes only.',
pageNumber: 21
}
]

Tip

For more information, refer to the API reference.

This section demonstrates two different RAG implementations using Atlas Vector Search and LangChain. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code examples to prompt the LLM to answer questions against the documents returned by Atlas Vector Search.

1

This code does the following:

  • Instantiates Atlas Vector Search as a retriever to query for semantically similar documents.

  • Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.

  • Constructs a chain that uses OpenAI's chat model to generate context-aware responses based on your prompt.

  • Prompts the chain with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context.

// Implement RAG to answer questions on your data
const retriever = vectorStore.asRetriever();
const prompt =
PromptTemplate.fromTemplate(`Answer the question based on the following context:
{context}
Question: {question}`);
const model = new ChatOpenAI({});
const chain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
model,
new StringOutputParser(),
]);
// Prompt the LLM
const question = "How can I secure my MongoDB Atlas cluster?";
const answer = await chain.invoke(question);
console.log("Question: " + question);
console.log("Answer: " + answer);
// Return source documents
const retrievedResults = await retriever.getRelevantDocuments(question)
const documents = retrievedResults.map((documents => ({
pageContent: documents.pageContent,
pageNumber: documents.metadata.loc.pageNumber,
})))
console.log("\nSource documents:\n" + JSON.stringify(documents, null, 2));
2

After you save the file, run the following command. The generated response might vary.

node get-started.js
Question: How can I secure my MongoDB Atlas cluster?
Answer: The given context does not explicitly provide detailed steps to secure a MongoDB Atlas cluster. However, based on general best practices, here are some common steps to secure your MongoDB Atlas cluster:
1. **Enable Network Access Controls**: Configure IP whitelists to only allow connections from trusted IP addresses.
2. **Use Strong Authentication and Authorization**: Enable SCRAM (Salted Challenge Response Authentication Mechanism) for authenticating users and define roles with specific permissions.
3. **Encrypt Data**: Ensure data is encrypted both at rest and in transit by default in MongoDB Atlas.
4. **Enable VPC Peering (if applicable)**: Use Virtual Private Cloud (VPC) peering for secure and private connections.
5. **Monitor Activity**: Use MongoDB Atlas's built-in monitoring to track cluster activity and detect unauthorized attempts or anomalies.
6. **Implement Automated Backups**: Secure backups and ensure they are protected from unauthorized access.
7. **Educate Yourself**: Continuously refer to the MongoDB documentation and follow security best practices.
It is recommended to visit the MongoDB documentation and security guides for the most accurate and detailed steps tailored to your specific use case.
Source documents:
[
{
"pageContent": "Atlas free tier, or download MongoDB for local \ndevelopment.\nReview the MongoDB manuals and tutorials in our \ndocumentation. \nMore Resources\nFor more on getting started in MongoDB:",
"pageNumber": 30
},
{
"pageContent": "read isolation. \nWith MongoDB Atlas, you can achieve workload isolation with dedicated analytics nodes. Visualization \ntools like Atlas Charts can be configured to read from analytics nodes only.",
"pageNumber": 21
},
{
"pageContent": "• Zoned Sharding — You can define specific rules governing data placement in a sharded cluster.\nGlobal Clusters in MongoDB Atlas allows you to quickly implement zoned sharding using a visual UI or",
"pageNumber": 27
},
{
"pageContent": "22\nWorkload Type: Search\nIf your application requires rich full-text search functionality and you are running MongoDB on Atlas,",
"pageNumber": 22
}
]
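
To show roughly what the chain sends to the LLM, the following sketch assembles the prompt by hand without calling any LangChain or OpenAI API. The joinPageContent and fillTemplate functions are hypothetical stand-ins: the first assumes that formatDocumentsAsString joins each document's pageContent with blank lines, and the second only imitates the placeholder substitution that PromptTemplate performs. The sample documents are made up.

```javascript
// Hypothetical stand-in for formatDocumentsAsString. Assumes documents are
// joined by their pageContent with a blank line between them.
function joinPageContent(docs) {
  return docs.map((doc) => doc.pageContent).join("\n\n");
}

// Hypothetical stand-in for PromptTemplate: replaces {name} placeholders
// with values, and leaves unknown placeholders untouched.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in values ? values[name] : match
  );
}

const ragTemplate = `Answer the question based on the following context:
{context}
Question: {question}`;

// Made-up documents standing in for what the retriever returns.
const retrievedDocs = [
  { pageContent: "First retrieved chunk." },
  { pageContent: "Second retrieved chunk." },
];

const filledPrompt = fillTemplate(ragTemplate, {
  context: joinPageContent(retrievedDocs),
  question: "How can I secure my MongoDB Atlas cluster?",
});
console.log(filledPrompt);
```

Seeing the assembled prompt makes it clear that the model can only ground its answer in the retrieved chunks, which is why the quality of the retrieved documents shapes the response.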
1

This code does the following:

  • Instantiates Atlas Vector Search as a retriever to query for semantically similar documents. It also specifies the following optional parameters:

    • searchType as mmr, which specifies that Atlas Vector Search retrieves documents based on Max Marginal Relevance (MMR).

    • filter to add a pre-filter on the loc.pageNumber field to include documents that appear on page 17 only.

    • The following MMR-specific parameters:

      • fetchK to fetch only 20 documents before passing the documents to the MMR algorithm.

      • lambda, a value between 0 and 1 to determine the degree of diversity among the results, with 0 representing maximum diversity and 1 representing minimum diversity.

  • Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.

  • Constructs a chain that uses OpenAI's chat model to generate context-aware responses based on your prompt.

  • Prompts the chain with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context.

// Implement RAG to answer questions on your data
const retriever = vectorStore.asRetriever({
searchType: "mmr", // Defaults to "similarity"
filter: { preFilter: { "loc.pageNumber": { "$eq": 17 } } },
searchKwargs: {
fetchK: 20,
lambda: 0.1,
},
});
const prompt =
PromptTemplate.fromTemplate(`Answer the question based on the following context:
{context}
Question: {question}`);
const model = new ChatOpenAI({});
const chain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
model,
new StringOutputParser(),
]);
// Prompt the LLM
const question = "How can I secure my MongoDB Atlas cluster?";
const answer = await chain.invoke(question);
console.log("Question: " + question);
console.log("Answer: " + answer);
// Return source documents
const retrievedResults = await retriever.getRelevantDocuments(question)
const documents = retrievedResults.map((documents => ({
pageContent: documents.pageContent,
pageNumber: documents.metadata.loc.pageNumber,
})))
console.log("\nSource documents:\n" + JSON.stringify(documents, null, 2));
2

After you save the file, run the following command. The generated response might vary.

node get-started.js
Question: How can I secure my MongoDB Atlas cluster?
Answer: To secure your MongoDB Atlas cluster, you can implement the following best practices:
1. **Enable Authentication and Authorization**
Ensure that authentication is enabled, which is the default for MongoDB Atlas. Use role-based access control (RBAC) to grant users only the permissions they need.
2. **Use Strong Passwords or Authentication Mechanisms**
Avoid simple passwords. Use strong, complex passwords for all database users. Alternatively, use certificate-based authentication or federated authentication with your identity provider.
3. **Whitelist IP Addresses**
Configure your Access List (IP Whitelist) to restrict access to trusted IP addresses. This ensures that only specified IP addresses can connect to your cluster.
4. **Enable Network Encryption (TLS/SSL)**
MongoDB Atlas supports TLS/SSL by default for securing data in transit. Ensure applications are configured to connect with SSL/TLS-enabled settings.
5. **Use End-to-End Encryption (Client-Side Field-Level Encryption)**
Implement client-side field-level encryption to ensure sensitive fields are encrypted end-to-end.
6. **Regularly Rotate Authentication Credentials**
Periodically rotate users' passwords or access keys to mitigate the risks of credential exposure.
7. **Use Private Networking**
If supported, use Virtual Private Cloud (VPC) peering or private endpoints, such as AWS PrivateLink, to connect securely to your MongoDB Atlas cluster without using the public internet.
8. **Enable Database Auditing**
Enable auditing to track database activity and detect potential anomalies or unauthorized access.
9. **Enable Backup and Data Recovery**
Regularly back up your data using MongoDB Atlas' automated backup systems to ensure business continuity in case of accidental deletions or data loss.
10. **Keep the MongoDB Drivers Updated**
Use the latest version of MongoDB drivers in your application to benefit from security updates and enhancements.
11. **Monitor and Set Alerts**
Use MongoDB Atlas' monitoring tools to track metrics and set up alerts for suspicious activities or unusual resource consumption.
12. **Implement Application-Level Security**
Ensure your application properly handles user authentication, session management, and input sanitization to prevent unauthorized access or injection attacks.
13. **Watch for Security Best Practices Updates**
Regularly review MongoDB Atlas documentation and security advisories to stay aware of new features and recommendations.
By following these practices, you can greatly enhance the security posture of your MongoDB Atlas cluster.
Source documents:
[
{
"pageContent": "Optimizing Data \nAccess Patterns\nNative tools in MongoDB for improving query \nperformance and reducing overhead.",
"pageNumber": 17
}
]

To learn how to integrate Atlas Vector Search with LangGraph, see Integrate MongoDB with LangGraph.js.

