Leveraging OpenAI and MongoDB Atlas for Improved Search Functionality
Pavel Duchovny • 5 min read • Published Sep 18, 2024 • Updated Sep 18, 2024
Search functionality is a critical component of many modern web applications. Providing users with relevant results based on their search queries and additional filters dramatically improves their experience and satisfaction with your app.
In this article, we'll go over an implementation of search functionality using OpenAI's embedding and GPT-4 models together with MongoDB Atlas Vector Search. We've created a request handler function that not only retrieves relevant data based on a user's search query but also applies additional filters provided by the user.
Enriching the existing documents with embeddings is covered in our main Vector Search Tutorial.
Consider a real-world scenario where we have an Airbnb-like app. Users can perform a free text search for listings and also filter results based on certain criteria like the number of rooms, the number of beds, or the number of guests the property can accommodate.
To implement this functionality, we use an OpenAI embedding model (text-embedding-ada-002 in the code below) to create embeddings that capture the semantics of the data, and Atlas Vector Search to find relevant results. OpenAI's GPT-4 model comes in later, to turn free text filters into query filters.
Our function is designed to act as a request handler for incoming search requests.
When a search request arrives, it first extracts the search terms and filters from the query parameters. If no search term is provided, it returns a random sample of 30 listings from the database.
If a search term is present, the function makes a POST request to OpenAI's embeddings API, sending the search term and the name of the embedding model to use. The response contains an “embedding,” a vector representation of the search term, which is then used in the next step.
```javascript
// This function is the endpoint's request handler.
// It interacts with MongoDB Atlas and the OpenAI API for embedding and search functionality.
exports = async function({ query }, response) {
  // Query params, e.g. '?search=test&beds=2' => {search: "test", beds: "2"}
  const { search, beds, rooms, people, maxPrice, freeTextFilter } = query;

  // MongoDB Atlas configuration.
  const mongodb = context.services.get('mongodb-atlas');
  const db = mongodb.db('sample_airbnb'); // Replace with your database name.
  const listingsAndReviews = db.collection('listingsAndReviews'); // Replace with your collection name.

  // If there's no search query, return a sample of 30 random documents from the collection.
  if (!search) {
    return await listingsAndReviews.aggregate([{ $sample: { size: 30 } }]).toArray();
  }

  // Fetch the OpenAI key stored in the context values.
  const openai_key = context.values.get("openAIKey");

  // URL to make the request to the OpenAI API.
  const url = 'https://api.openai.com/v1/embeddings';

  // Call OpenAI API to get the embeddings.
  const resp = await context.http.post({
    url: url,
    headers: {
      'Authorization': [`Bearer ${openai_key}`],
      'Content-Type': ['application/json']
    },
    body: JSON.stringify({
      input: search,
      model: "text-embedding-ada-002"
    })
  });

  // Check the response status before using the body.
  if (resp.statusCode !== 200) {
    console.error("Failed to get embeddings");
    return [];
  }
  console.log("Successfully received embedding.");

  // Parse the JSON response and pick the embedding of the search term.
  const responseData = EJSON.parse(resp.body.text());
  const embedding = responseData.data[0].embedding;
  console.log(JSON.stringify(embedding));

  const searchQ = {
    "index": "default",
    "queryVector": embedding,
    "path": "doc_embedding",
    "limit": 100, // Number of documents the $vectorSearch stage returns.
    "numCandidates": 1000
  };

  // If there's any filter in the query parameters, add it to the search query.
  if (freeTextFilter) {
    // Turn the free text filter into a query filter using GPT-4.
    // A random sample document shows the model which fields exist.
    const sampleDocs = await listingsAndReviews.aggregate([
      { $sample: { size: 1 } },
      { $project: {
        _id: 0,
        bedrooms: 1,
        beds: 1,
        room_type: 1,
        property_type: 1,
        price: 1,
        accommodates: 1,
        bathrooms: 1,
        review_scores: 1
      }}
    ]).toArray();

    const filter = await context.functions.execute("getSearchAIFilter", sampleDocs[0], freeTextFilter);
    // getSearchAIFilter returns {} when no usable filter was produced.
    if (filter && Object.keys(filter).length > 0) {
      searchQ.filter = filter;
    }
  } else if (beds || rooms) {
    const filter = { "$and": [] };

    if (beds) {
      filter.$and.push({ "beds": { "$gte": parseInt(beds, 10) } });
    }
    if (rooms) {
      filter.$and.push({ "bedrooms": { "$gte": parseInt(rooms, 10) } });
    }
    searchQ.filter = filter;
  }

  // Perform the search with the defined query and limit the result to 50 documents.
  const docs = await listingsAndReviews.aggregate([
    { "$vectorSearch": searchQ },
    { $limit: 50 }
  ]).toArray();

  return docs;
};
```
To cover the filtering part of the query, we build a filter object from the basic criteria a user might request (in the presented example, at least two rooms and two beds) and attach it to the vector search query.
```javascript
// Excerpt from the request handler above.
else if (beds || rooms) {
  const filter = { "$and": [] };

  if (beds) {
    filter.$and.push({ "beds": { "$gte": parseInt(beds, 10) } });
  }
  if (rooms) {
    filter.$and.push({ "bedrooms": { "$gte": parseInt(rooms, 10) } });
  }
  searchQ.filter = filter;
}
```
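The same filter-building logic can be pulled into a small standalone helper, which makes it easier to test outside the handler. The sketch below is an illustration rather than part of the original function: the name `buildBasicFilter` is hypothetical, and it additionally skips values that do not parse as integers, since `parseInt` returns `NaN` for input such as "two".

```javascript
// Hypothetical helper (not part of the original handler): builds the
// "filter" value for the vector search from raw query-string parameters.
function buildBasicFilter({ beds, rooms } = {}) {
  const clauses = [];

  // Query-string values arrive as strings, e.g. { beds: "2" }.
  const minBeds = parseInt(beds, 10);
  if (!Number.isNaN(minBeds)) {
    clauses.push({ beds: { $gte: minBeds } });
  }

  const minRooms = parseInt(rooms, 10);
  if (!Number.isNaN(minRooms)) {
    clauses.push({ bedrooms: { $gte: minRooms } });
  }

  // Return null when there is nothing to filter on, so the caller can
  // leave the filter unset instead of sending an empty $and array.
  return clauses.length > 0 ? { $and: clauses } : null;
}

console.log(JSON.stringify(buildBasicFilter({ beds: "2", rooms: "2" })));
```

Returning `null` for "no filter" is a design choice of this sketch; the handler could equally check the length of the `$and` array before attaching it.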
Let's consider a more advanced use case that can enhance our filtering experience. In this example, we allow a user to filter with a free-form sentence, such as “More than 1 bed and rating above 91.”
We call the OpenAI API to interpret the user's free text filter and translate it into something we can use in a MongoDB query. We send the API a description of what we need, based on the document structure we're working with and the user's free text input. This text is fed into the GPT-4 model, which returns a JSON object with 'range' or 'equals' operators that can be used in a MongoDB search query.
```javascript
// This function is called by the request handler (as "getSearchAIFilter").
// It interacts with the OpenAI API to generate a filter JSON based on the input.
exports = async function(sampleDoc, search) {
  // URL to make the request to the OpenAI API.
  const url = 'https://api.openai.com/v1/chat/completions';

  // Fetch the OpenAI key stored in the context values.
  const openai_key = context.values.get("openAIKey");

  // Convert the sample document to string format.
  const syntDocs = JSON.stringify(sampleDoc);
  console.log(syntDocs);

  // Prepare the request string for the OpenAI API.
  const reqString = `Convert programmatic command to Atlas $search filter only for range and equals JS:\n\nExample: Based on document structure {"siblings" : "...", "dob" : "..."} give me the filter of all people born 2015 and siblings are 3 \nOutput: {"filter":{ "compound" : { "must" : [ {"range": {"gte": 2015, "lte" : 2015, "path": "dob"} },{"equals" : {"value" : 3 , "path" : "siblings"}}]}}} \n\n provide the needed filter to accommodate ${search}, pick a path from structure ${syntDocs}. Need just the json object with a range or equal operators. No explanation. No 'Output:' string in response. Valid JSON.`;
  console.log(`reqString: ${reqString}`);

  // Call OpenAI API to get the response.
  // Header values are passed as arrays, as in the request handler.
  const resp = await context.http.post({
    url: url,
    headers: {
      'Authorization': [`Bearer ${openai_key}`],
      'Content-Type': ['application/json']
    },
    body: JSON.stringify({
      model: "gpt-4",
      temperature: 0.1,
      messages: [
        {
          "role": "system",
          "content": "Output filter json generator follow only provided rules"
        },
        {
          "role": "user",
          "content": reqString
        }
      ]
    })
  });

  // Parse the JSON response.
  const responseData = JSON.parse(resp.body.text());

  // Check the response status.
  if (resp.statusCode === 200) {
    console.log("Successfully received code.");
    console.log(JSON.stringify(responseData));

    const code = responseData.choices[0].message.content;

    // The model is asked for valid JSON, but that is not guaranteed.
    let parsedCommand;
    try {
      parsedCommand = EJSON.parse(code);
    } catch (err) {
      console.error("Model response was not valid JSON.");
      return {};
    }
    console.log('parsed' + JSON.stringify(parsedCommand));

    // If the filter exists and it's not an empty object, return it.
    if (parsedCommand.filter && Object.keys(parsedCommand.filter).length !== 0) {
      return parsedCommand.filter;
    }

    // If there's no valid filter, return an empty object.
    return {};
  } else {
    console.error("Failed to generate filter JSON.");
    console.log(JSON.stringify(responseData));
    return {};
  }
};
```
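One detail worth checking before wiring this into the handler: the prompt asks the model for an Atlas Search-style filter (`compound.must` with `range` and `equals` clauses), while the `filter` option of `$vectorSearch` is documented as taking MQL-style match expressions such as `$gte` and `$eq`, like the basic beds and rooms filter. The sketch below shows one possible way to bridge the two. It is not part of the original article: `toVectorSearchFilter` and `allowedPaths` are hypothetical names, and the code assumes the model's output follows the shape of the example in the prompt.

```javascript
// Hypothetical helper: converts an Atlas Search-style filter of the form
// { compound: { must: [...] } } into an MQL match expression.
// Clauses with unknown operators or paths outside allowedPaths are dropped.
function toVectorSearchFilter(searchFilter, allowedPaths) {
  const must = searchFilter && searchFilter.compound && searchFilter.compound.must;
  if (!Array.isArray(must)) {
    return {};
  }

  const clauses = [];
  for (const clause of must) {
    if (!clause) continue;

    if (clause.range && allowedPaths.includes(clause.range.path)) {
      const { path, gt, gte, lt, lte } = clause.range;
      const bounds = {};
      if (gt !== undefined) bounds.$gt = gt;
      if (gte !== undefined) bounds.$gte = gte;
      if (lt !== undefined) bounds.$lt = lt;
      if (lte !== undefined) bounds.$lte = lte;
      if (Object.keys(bounds).length > 0) {
        clauses.push({ [path]: bounds });
      }
    } else if (clause.equals && allowedPaths.includes(clause.equals.path)) {
      clauses.push({ [clause.equals.path]: { $eq: clause.equals.value } });
    }
  }

  return clauses.length > 0 ? { $and: clauses } : {};
}

// Example input in the shape the prompt asks the model to produce.
const modelOutput = {
  compound: {
    must: [
      { range: { gt: 1, path: "beds" } },
      { equals: { value: "Apartment", path: "property_type" } },
      { range: { gte: 0, path: "not_a_real_field" } }
    ]
  }
};

console.log(JSON.stringify(toVectorSearchFilter(modelOutput, ["beds", "property_type"])));
```

Restricting paths to an allow-list also means model-generated text never reaches the query unchecked. An alternative would be to change the prompt so the model emits MQL operators directly, in which case the same allow-list check would still be useful.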
The function then constructs a vector search query using the embedding of the search term and any additional filters provided by the user. This query is sent to MongoDB, and the function returns the results as a response. For a search of “New York high floor” with the filter “More than 1 bed and rating above 91,” the query looks something like the following:
```javascript
{
  "$vectorSearch": {
    "index": "default",
    "queryVector": embedding,
    "path": "doc_embedding",
    "filter": { "$and": [{ "beds": { "$gte": 1 } }, { "score": { "$gte": 91 } }] },
    "limit": 100,
    "numCandidates": 1000
  }
}
```
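The assembly of that query can be sketched as a small pure function that takes an embedding and an optional filter and returns the aggregation pipeline. This is a minimal sketch under stated assumptions: `buildSearchPipeline` is a hypothetical name, the index and path names are taken from the article, and the stage uses `limit` for the number of returned documents, which is the field name in the current `$vectorSearch` documentation. The three-element vector is only a placeholder for a real embedding.

```javascript
// Hypothetical helper: assembles the aggregation pipeline from an embedding
// and an optional MQL filter. Index and path names follow the article.
function buildSearchPipeline(embedding, filter, { limit = 100, numCandidates = 1000 } = {}) {
  const vectorSearch = {
    index: "default",
    path: "doc_embedding",
    queryVector: embedding,
    numCandidates, // candidates considered before the top results are kept
    limit          // number of documents the stage returns
  };

  // Attach the filter only when it actually contains conditions.
  if (filter && Object.keys(filter).length > 0) {
    vectorSearch.filter = filter;
  }

  // The trailing $limit mirrors the handler, which returns at most 50 documents.
  return [{ $vectorSearch: vectorSearch }, { $limit: 50 }];
}

// Placeholder vector; a real embedding has many more dimensions.
const pipeline = buildSearchPipeline([0.1, 0.2, 0.3], {
  $and: [{ beds: { $gte: 1 } }, { score: { $gte: 91 } }]
});
console.log(JSON.stringify(pipeline, null, 2));
```

Keeping the pipeline construction separate from the database call makes it possible to inspect or unit test the query without an Atlas connection.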
This approach allows us to leverage OpenAI's GPT-4 model to interpret free text input and MongoDB Atlas Vector Search to return highly relevant search results. The use of natural language processing and AI brings a level of flexibility and intuitiveness to the search function that greatly enhances the user experience.
Remember, however, this is an advanced implementation. Ensure you have a good understanding of how MongoDB and OpenAI operate before attempting to implement a similar solution. Always take care to handle sensitive data appropriately and ensure your AI use aligns with OpenAI's usage policies.