Get Started with the LangChain JS/TS Integration
Note
This tutorial uses LangChain's JavaScript library. For a tutorial that uses the Python library, see Get Started with the LangChain Integration.
You can integrate Atlas Vector Search with LangChain to build LLM applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChain to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:
Set up the environment.
Store custom data on Atlas.
Create an Atlas Vector Search index on your data.
Run the following vector search queries:
Semantic search.
Semantic search with metadata pre-filtering.
Maximal Marginal Relevance (MMR) search.
Implement RAG by using Atlas Vector Search to answer questions on your data.
Background
LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.
By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
Prerequisites
To complete this tutorial, you must have the following:
An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs).
An OpenAI API Key. You must have a paid OpenAI account with credits available for API requests.
A terminal and code editor to run your Node.js project.
npm and Node.js installed.
Set Up the Environment
You must first set up the environment for this tutorial. To do so, complete the following steps.
Update your package.json file.
Configure your project to use ES modules by adding "type": "module" to your package.json file, and then save it.
{ "type": "module", // other fields... }
Create a file named get-started.js and paste the following code.
In your project, create a file named get-started.js, and then copy and paste the following code into the file. You will add code to this file throughout the tutorial.
This initial code snippet imports required packages for this tutorial, defines environmental variables, and establishes a connection to your Atlas cluster.
import { formatDocumentsAsString } from "langchain/util/document";
import { MongoClient } from "mongodb";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { PromptTemplate } from "@langchain/core/prompts";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { RunnableSequence, RunnablePassthrough } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import * as fs from 'fs';

process.env.OPENAI_API_KEY = "<api-key>";
process.env.ATLAS_CONNECTION_STRING = "<connection-string>";

const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
Replace the placeholder values.
To finish setting up the environment, replace the <api-key> and <connection-string> placeholder values in get-started.js with your OpenAI API key and the SRV connection string for your Atlas cluster. Your connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
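If you prefer not to hardcode credentials in the script, one common alternative (not part of the original tutorial) is to keep them in a .env file and load them with the dotenv package, for example:

// Hypothetical alternative: requires `npm install dotenv` and a .env file
// that defines OPENAI_API_KEY and ATLAS_CONNECTION_STRING.
import 'dotenv/config';
import { MongoClient } from "mongodb";

// process.env is now populated from .env, so no hardcoded values are needed.
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);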
Use Atlas as a Vector Store
In this section, you define an asynchronous function to load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store.
Add the following code to your get-started.js file.
Note
For this tutorial, you use a publicly accessible PDF document titled MongoDB Atlas Best Practices as the data source for your vector store. This document describes various recommendations and core concepts for managing your Atlas deployments.
This code performs the following actions:
Configures your Atlas collection by specifying the following parameters:
langchain_db.test as the Atlas collection to store the documents.
vector_index as the index to use for querying the vector store.
text as the name of the field containing the raw text content.
embedding as the name of the field containing the vector embeddings.
Prepares your custom data by doing the following:
Retrieves raw data from the specified URL and saves it as a PDF.
Uses a text splitter to split the data into smaller documents.
Specifies chunk parameters, which determine the number of characters in each document and the number of characters that should overlap between two consecutive documents.
Creates a vector store from the sample documents by calling the MongoDBAtlasVectorSearch.fromDocuments method. This method specifies the following parameters:
The sample documents to store in the vector database.
OpenAI's embedding model as the model used to convert text into vector embeddings for the embedding field.
Your Atlas configuration.
async function run() {
  try {
    // Configure your Atlas collection
    const database = client.db("langchain_db");
    const collection = database.collection("test");
    const dbConfig = {
      collection: collection,
      indexName: "vector_index", // The name of the Atlas search index to use.
      textKey: "text", // Field name for the raw text content. Defaults to "text".
      embeddingKey: "embedding", // Field name for the vector embeddings. Defaults to "embedding".
    };

    // Ensure that the collection is empty
    const count = await collection.countDocuments();
    if (count > 0) {
      await collection.deleteMany({});
    }

    // Save online PDF as a file
    const rawData = await fetch("https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP");
    const pdfBuffer = await rawData.arrayBuffer();
    const pdfData = Buffer.from(pdfBuffer);
    fs.writeFileSync("atlas_best_practices.pdf", pdfData);

    // Load and split the sample data
    const loader = new PDFLoader(`atlas_best_practices.pdf`);
    const data = await loader.load();
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 200,
      chunkOverlap: 20,
    });
    const docs = await textSplitter.splitDocuments(data);

    // Instantiate Atlas as a vector store
    const vectorStore = await MongoDBAtlasVectorSearch.fromDocuments(docs, new OpenAIEmbeddings(), dbConfig);
  } finally {
    // Ensure that the client will close when you finish/error
    await client.close();
  }
}
run().catch(console.dir);
Save the file, then run the following command to load your data into Atlas.
node get-started.js
Tip
After running get-started.js, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.
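If you'd rather verify the data from code instead of the Atlas UI, a quick check like the following (an optional sketch, not part of the tutorial) can be added inside the run() function before the client closes:

// Optional check: fetch one stored chunk and confirm it has an embedding.
const sampleDoc = await collection.findOne();
console.log(sampleDoc.text);
console.log("Embedding length:", sampleDoc.embedding.length); // expect 1536 for text-embedding-ada-002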
Create the Atlas Vector Search Index
Note
To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.
To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchain_db.test collection.
Add the following code to the asynchronous function that you defined in your get-started.js file.
This code creates an index of the vectorSearch type that specifies indexing the following fields:
The embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using cosine.
The loc.pageNumber field as the filter type for pre-filtering data by the page number in the PDF.
This code also awaits a timed Promise to give your search index time to sync with your data before it's used.
// Ensure index does not already exist, then create your Atlas Vector Search index
const indexes = await collection.listSearchIndexes("vector_index").toArray();
if (indexes.length === 0) {

  // Define your Atlas Vector Search index
  const index = {
    name: "vector_index",
    type: "vectorSearch",
    definition: {
      "fields": [
        {
          "type": "vector",
          "numDimensions": 1536,
          "path": "embedding",
          "similarity": "cosine"
        },
        {
          "type": "filter",
          "path": "loc.pageNumber"
        }
      ]
    }
  }

  // Run the helper method
  const result = await collection.createSearchIndex(index);
  console.log(result);

  // Wait for Atlas to sync index
  console.log("Waiting for initial sync...");
  await new Promise(resolve => setTimeout(() => {
    resolve();
  }, 10000));
}
Save the file, then run the following command to create your Atlas Vector Search index.
node get-started.js
Run Vector Search Queries
This section demonstrates various queries that you can run on your vectorized data. Now that you've created the index, add the following code to your asynchronous function to run vector search queries against your data.
Note
If you experience inaccurate results when querying your data, your index might be taking longer than expected to sync. Increase the number in the setTimeout function to allow more time for the initial sync.
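Rather than increasing the fixed timeout, another option (a sketch, not from the original tutorial) is to poll the index status and wait until Atlas reports it as queryable:

// Poll until the Atlas Vector Search index is ready to use.
let queryable = false;
while (!queryable) {
  const [indexInfo] = await collection.listSearchIndexes("vector_index").toArray();
  queryable = indexInfo?.queryable === true; // `queryable` is reported by Atlas once the index is synced
  if (!queryable) {
    await new Promise(resolve => setTimeout(resolve, 5000));
  }
}
console.log("vector_index is ready.");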
Add the following code to your asynchronous function and save the file.
The following code uses the similaritySearch method to perform a basic semantic search for the string MongoDB Atlas security. It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.
// Basic semantic search
const basicOutput = await vectorStore.similaritySearch("MongoDB Atlas security");
const basicResults = basicOutput.map((results => ({
  pageContent: results.pageContent,
  pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Semantic Search Results:")
console.log(basicResults)
Run the following command to execute the query.
node get-started.js
...
Semantic Search Results:
[
  {
    pageContent: 'MongoDB Atlas features extensive capabilities to defend,\n' +
      'detect, and control access to MongoDB, offering among\n' +
      'the most complete security controls of any modern\n' +
      'database:',
    pageNumber: 18
  },
  {
    pageContent: 'Atlas provides encryption of data at rest with encrypted\n' +
      'storage volumes.\n' +
      'Optionally, Atlas users can configure an additional layer of\n' +
      'encryption on their data at rest using the MongoDB',
    pageNumber: 19
  },
  {
    pageContent: 'automatically enabled.\n' +
      'Review thesecurity section of the MongoDB Atlas\n' +
      'documentationto learn more about each of the security\n' +
      'features discussed below.\n' +
      'IP Whitelisting',
    pageNumber: 18
  },
  {
    pageContent: '16Security\n' +
      '17Business Intelligence with MongoDB Atlas\n' +
      '18Considerations for Proofs of Concept\n' +
      '18MongoDB Stitch: Serverless Platform from MongoDB\n' +
      '19We Can Help\n' +
      '19Resources',
    pageNumber: 2
  }
]
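To see how relevant each result is, you can optionally use the similaritySearchWithScore method (a variation on the tutorial code, not part of the original steps) to return the score alongside each document:

// Semantic search that also returns relevance scores
const scoredOutput = await vectorStore.similaritySearchWithScore("MongoDB Atlas security", 3);
const scoredResults = scoredOutput.map(([doc, score]) => ({
  pageContent: doc.pageContent,
  pageNumber: doc.metadata.loc.pageNumber,
  score,
}));
console.log(scoredResults);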
You can pre-filter your data by using an MQL match expression that compares the indexed field with boolean, number, or string values. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.
Note
You specified the loc.pageNumber field as a filter when you created the index for this tutorial.
Add the following code to your asynchronous function and save the file.
The following code uses the similaritySearch method to perform a semantic search for the string MongoDB Atlas security. It specifies the following parameters:
The number of documents to return as 3.
A pre-filter on the loc.pageNumber field that uses the $eq operator to match documents appearing on page 17 only.
It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.
// Semantic search with metadata filter
const filteredOutput = await vectorStore.similaritySearch("MongoDB Atlas security", 3, {
  preFilter: {
    "loc.pageNumber": { "$eq": 17 },
  }
});
const filteredResults = filteredOutput.map((results => ({
  pageContent: results.pageContent,
  pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Semantic Search with Filtering Results:")
console.log(filteredResults)
Run the following command to execute the query.
node get-started.js
...
Semantic Search with Filtering Results:
[
  {
    pageContent: 'BSON database dumps produced bymongodump.\n' +
      'In the vast majority of cases, MongoDB Atlas backups\n' +
      'delivers the simplest, safest, and most efficient backup',
    pageNumber: 17
  },
  {
    pageContent: 'Monitoring Solutions\n' +
      'The MongoDB Atlas API provides integration with external\n' +
      'management frameworks through programmatic access to\n' +
      'automation features and alerts.\n' +
      'APM Integration',
    pageNumber: 17
  },
  {
    pageContent: 'MongoDB Atlas backups are maintained continuously, just\n' +
      'a few seconds behind the operational system. If the\n' +
      'MongoDB cluster experiences a failure, the most recent',
    pageNumber: 17
  }
]
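Pre-filters aren't limited to exact matches. Because loc.pageNumber is indexed as a filter field, a range filter using operators such as $gte and $lte should also work (a variation, not part of the original tutorial):

// Semantic search with a range pre-filter on the page number
const rangeOutput = await vectorStore.similaritySearch("MongoDB Atlas security", 3, {
  preFilter: {
    "loc.pageNumber": { "$gte": 17, "$lte": 19 },
  }
});
console.log(rangeOutput.map((doc) => doc.metadata.loc.pageNumber));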
You can also perform semantic search based on Max Marginal Relevance (MMR), a measure of semantic relevance optimized for diversity.
Add the following code to your asynchronous function and save the file.
The following code uses the maxMarginalRelevanceSearch method to search for the string MongoDB Atlas security. It also specifies an object that defines the following optional parameters:
k to limit the number of returned documents to 3.
fetchK to fetch only 10 documents before passing the documents to the MMR algorithm.
It returns a list of documents ranked by relevance with only the pageContent and pageNumber fields.
// Max Marginal Relevance search
const mmrOutput = await vectorStore.maxMarginalRelevanceSearch("MongoDB Atlas security", {
  k: 3,
  fetchK: 10,
});
const mmrResults = mmrOutput.map((results => ({
  pageContent: results.pageContent,
  pageNumber: results.metadata.loc.pageNumber,
})))
console.log("Max Marginal Relevance Search Results:")
console.log(mmrResults)
Run the following command to execute the query.
node get-started.js
...
Max Marginal Relevance Search Results:
[
  {
    pageContent: 'MongoDB Atlas features extensive capabilities to defend,\n' +
      'detect, and control access to MongoDB, offering among\n' +
      'the most complete security controls of any modern\n' +
      'database:',
    pageNumber: 18
  },
  {
    pageContent: 'automatically enabled.\n' +
      'Review thesecurity section of the MongoDB Atlas\n' +
      'documentationto learn more about each of the security\n' +
      'features discussed below.\n' +
      'IP Whitelisting',
    pageNumber: 18
  },
  {
    pageContent: '16Security\n' +
      '17Business Intelligence with MongoDB Atlas\n' +
      '18Considerations for Proofs of Concept\n' +
      '18MongoDB Stitch: Serverless Platform from MongoDB\n' +
      '19We Can Help\n' +
      '19Resources',
    pageNumber: 2
  }
]
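If the MMR results look too similar to each other, you could optionally pass the lambda parameter (a variation on the tutorial code) to control the relevance/diversity trade-off:

// Max Marginal Relevance search tuned for more diverse results
const diverseOutput = await vectorStore.maxMarginalRelevanceSearch("MongoDB Atlas security", {
  k: 3,
  fetchK: 10,
  lambda: 0.25, // 0 = maximum diversity, 1 = minimum diversity
});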
Answer Questions on Your Data
This section demonstrates two different RAG implementations using Atlas Vector Search and LangChain. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code examples to prompt the LLM to answer questions against the documents returned by Atlas Vector Search.
Add the following code to your asynchronous function and save the file.
This code does the following:
Instantiates Atlas Vector Search as a retriever to query for semantically similar documents.
Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.
Constructs a chain that uses OpenAI's chat model to generate context-aware responses based on your prompt.
Prompts the chain with a sample query about Atlas security recommendations.
Returns the LLM's response and the documents used as context.
// Implement RAG to answer questions on your data
const retriever = vectorStore.asRetriever();
const prompt = PromptTemplate.fromTemplate(`Answer the question based on the following context:
{context}

Question: {question}`);
const model = new ChatOpenAI({});
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

// Prompt the LLM
const question = "How can I secure my MongoDB Atlas cluster?";
const answer = await chain.invoke(question);
console.log("Question: " + question);
console.log("Answer: " + answer);

// Return source documents
const retrievedResults = await retriever.getRelevantDocuments(question)
const documents = retrievedResults.map((documents => ({
  pageContent: documents.pageContent,
  pageNumber: documents.metadata.loc.pageNumber,
})))
console.log("\nSource documents:\n" + JSON.stringify(documents, null, 2))
Save the file, then run the following command to execute your file. The generated response might vary.
node get-started.js
...
Question: How can I secure my MongoDB Atlas cluster?
Answer: You can secure your MongoDB Atlas cluster by taking advantage of extensive capabilities to defend, detect, and control access to MongoDB. You can also enable encryption of data at rest with encrypted storage volumes and configure an additional layer of encryption on your data. Additionally, you can set up global clusters on Amazon Web Services, Microsoft Azure, and Google Cloud Platform with just a few clicks in the MongoDB Atlas UI.

Source documents:
[
  {
    "pageContent": "MongoDB Atlas features extensive capabilities to defend,\ndetect, and control access to MongoDB, offering among\nthe most complete security controls of any modern\ndatabase:",
    "pageNumber": 18
  },
  {
    "pageContent": "throughput is required, it is recommended to either\nupgrade the Atlas cluster or take advantage of MongoDB's\nauto-shardingto distribute read operations across multiple\nprimary members.",
    "pageNumber": 14
  },
  {
    "pageContent": "Atlas provides encryption of data at rest with encrypted\nstorage volumes.\nOptionally, Atlas users can configure an additional layer of\nencryption on their data at rest using the MongoDB",
    "pageNumber": 19
  },
  {
    "pageContent": "You can set up global clusters — available on Amazon Web\nServices, Microsoft Azure, and Google Cloud Platform —\nwith just a few clicks in the MongoDB Atlas UI. MongoDB",
    "pageNumber": 13
  }
]
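Because the chain is a LangChain runnable, you can also stream the answer token by token instead of waiting for the full response (an optional sketch, not part of the original tutorial):

// Stream the LLM's answer as it is generated
const stream = await chain.stream(question);
for await (const chunk of stream) {
  process.stdout.write(chunk);
}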
Add the following code to your asynchronous function and save the file.
This code does the following:
Instantiates Atlas Vector Search as a retriever to query for semantically similar documents. It also specifies the following optional parameters:
searchType as mmr, which specifies that Atlas Vector Search retrieves documents based on Max Marginal Relevance (MMR).
filter to add a pre-filter on the loc.pageNumber field to include documents that appear on page 17 only.
The following MMR-specific parameters:
fetchK to fetch only 20 documents before passing the documents to the MMR algorithm.
lambda, a value between 0 and 1 that determines the degree of diversity among the results, with 0 representing maximum diversity and 1 representing minimum diversity.
Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.
Constructs a chain that uses OpenAI's chat model to generate context-aware responses based on your prompt.
Prompts the chain with a sample query about Atlas security recommendations.
Returns the LLM's response and the documents used as context.
// Implement RAG to answer questions on your data
const retriever = await vectorStore.asRetriever({
  searchType: "mmr", // Defaults to "similarity"
  filter: { preFilter: { "loc.pageNumber": { "$eq": 17 } } },
  searchKwargs: {
    fetchK: 20,
    lambda: 0.1,
  },
});
const prompt = PromptTemplate.fromTemplate(`Answer the question based on the following context:
{context}

Question: {question}`);
const model = new ChatOpenAI({});
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

// Prompt the LLM
const question = "How can I secure my MongoDB Atlas cluster?";
const answer = await chain.invoke(question);
console.log("Question: " + question);
console.log("Answer: " + answer);

// Return source documents
const retrievedResults = await retriever.getRelevantDocuments(question)
const documents = retrievedResults.map((documents => ({
  pageContent: documents.pageContent,
  pageNumber: documents.metadata.loc.pageNumber,
})))
console.log("\nSource documents:\n" + JSON.stringify(documents, null, 2))
Save the file, then run the following command to execute your file. The generated response might vary.
node get-started.js
...
Question: How can I secure my MongoDB Atlas cluster?
Answer: To secure your MongoDB Atlas cluster, you can take the following measures:
1. Enable authentication and use strong, unique passwords for all users.
2. Utilize encryption in transit and at rest to protect data both while in motion and at rest.
3. Configure network security by whitelisting IP addresses that can access your cluster.
4. Enable role-based access control to limit what actions users can perform within the cluster.
5. Monitor and audit your cluster for suspicious activity using logging and alerting features.
6. Keep your cluster up to date with the latest patches and updates to prevent vulnerabilities.
7. Implement backups and disaster recovery plans to ensure you can recover your data in case of data loss.

Source documents:
[
  {
    "pageContent": "BSON database dumps produced bymongodump.\nIn the vast majority of cases, MongoDB Atlas backups\ndelivers the simplest, safest, and most efficient backup",
    "pageNumber": 17
  },
  {
    "pageContent": "APM Integration\nMany operations teams use Application Performance\nMonitoring (APM) platforms to gain global oversight of\n15",
    "pageNumber": 17
  },
  {
    "pageContent": "performance SLA.\nIf in the course of a deployment it is determined that a new\nshard key should be used, it will be necessary to reload the\ndata with a new shard key because designation and values",
    "pageNumber": 17
  },
  {
    "pageContent": "to the database.\nReplication Lag\nReplication lag is the amount of time it takes a write\noperation on the primary replica set member to replicate to",
    "pageNumber": 17
  }
]
Next Steps
MongoDB also provides the following developer resources: