Exploring the Advanced Search Capabilities With MongoDB Atlas Search

Aasawari Sahasrabuddhe • 6 min read • Published Aug 20, 2024 • Updated Aug 20, 2024
If you've followed the first two parts of our Atlas Search with Spring Boot series, you should already understand the core Atlas Search concepts and how to write queries with them. If you haven't read them yet, I strongly recommend starting there to build that foundation: Getting Started With MongoDB Atlas Search and Java and Exploring Search Capabilities With Atlas Search.
In this article, we'll cover more advanced features: language-specific analyzers, custom analyzers, and running $search together with other aggregation stages such as $lookup.
Let’s get into it!

Prerequisites

  1. Sign up for a MongoDB free tier Atlas account. You can start by following Register to Free Tier to build your Atlas cluster.
  2. Use Import and Export Data to add the data to the Atlas cluster. The sample data is available in the GitHub repository at productDetails.json and productReviews.json.
  3. Install Java version 17 or above.

Exploring the advanced search capabilities

Case 1: Searching with language analyzers

In the previous tutorial, we used the English language analyzer. Atlas Search offers language analyzers for many other languages as well; see the official Language Analyzers documentation for the full list.
After loading the data from productDetails.json into your Atlas cluster, create a search index using any of the methods outlined in Part 1 of this series, Getting Started With MongoDB Atlas Search and Java.
For example, paste the following JSON into the JSON editor in the Atlas UI. It applies the lucene.spanish analyzer to the productNameSpanish field:
{
  "analyzer": "lucene.spanish",
  "searchAnalyzer": "lucene.spanish",
  "mappings": {
    "dynamic": true,
    "fields": {
      "productNameSpanish": {
        "analyzer": "lucene.spanish",
        "searchAnalyzer": "lucene.spanish",
        "type": "string"
      }
    }
  }
}
Once the index is created (the code below assumes it is named index01), you can query it with the following Java method:
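// Imports used by the search methods in this article
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public List<Document> searchWithSpanish(String query, String sampleProducts) {
    // Resolve the collection whose name is passed in by the caller.
    MongoDatabase database = mongoTemplate.getDb();
    MongoCollection<Document> collection = database.getCollection(sampleProducts);

    // Run a text query against the Spanish-analyzed field, then cap the results.
    List<Document> pipeline = Arrays.asList(
            new Document("$search",
                    new Document("index", "index01")
                            .append("text",
                                    new Document("query", query)
                                            .append("path", "productNameSpanish"))),
            new Document("$limit", 3));

    List<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}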
Because this tutorial works with two collections, each search method takes the collection name as a parameter.
To test the method, call the endpoint:
GET http://localhost:8080/search/withSpanish?query={Raton Inalambrico}
The response contains the products whose Spanish name matches at least one of the query terms:
[
  {
    "_id": {
      "timestamp": 1723024731,
      "date": "2024-08-07T09:58:51.000+00:00"
    },
    "productID": "ELEC-00123",
    "productName": "Wireless Mouse",
    "productNameSpanish": "Ratón Inalámbrico",
    "category": "Electronics",
    "price": 25.99,
    "availability": "In Stock",
    "productLink": "https://www.amazon.com/dp/B08J2T6P7Q",
    "location": {
      "city": "Shenzhen",
      "coordinates": [
        114.0579,
        22.5431
      ]
    }
  },
  {
    "_id": {
      "timestamp": 1723024731,
      "date": "2024-08-07T09:58:51.000+00:00"
    },
    "productID": "CHRG-11111",
    "productName": "Wireless Charger",
    "productNameSpanish": "Cargador Inalámbrico",
    "category": "Accessories",
    "price": 24.99,
    "availability": "In Stock",
    "productLink": "https://www.target.com/p/wireless-charger/-/A-87654321",
    "location": {
      "city": "Shenzhen",
      "coordinates": [
        114.0579,
        22.5431
      ]
    }
  }
]
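The query above was typed without accents, yet it matched names stored with accents. The plain-Java sketch below illustrates the general idea of normalizing both the indexed text and the query before comparing tokens. It is a simplified stand-in and not Lucene's Spanish analyzer, which also handles stop words and stemming; the class and method names (AccentFoldingSketch, analyze, matchesAny) are hypothetical.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class AccentFoldingSketch {

    // Lowercase, strip diacritics, and split on whitespace.
    // Assumption: a simplified illustration, not the real lucene.spanish analyzer.
    public static List<String> analyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String folded = decomposed.replaceAll("\\p{M}+", "");
        return Arrays.stream(folded.toLowerCase(Locale.ROOT).trim().split("\\s+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // True when the field shares at least one analyzed token with the query,
    // mirroring how a text query matches on any of its terms.
    public static boolean matchesAny(String query, String fieldValue) {
        List<String> fieldTokens = analyze(fieldValue);
        return analyze(query).stream().anyMatch(fieldTokens::contains);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source ASCII-only: \u00f3 is "ó", \u00e1 is "á".
        System.out.println(analyze("Rat\u00f3n Inal\u00e1mbrico"));
        System.out.println(matchesAny("Raton Inalambrico", "Cargador Inal\u00e1mbrico"));
        System.out.println(matchesAny("Raton Inalambrico", "Wireless Mouse"));
    }
}
```

In this sketch, the charger matches only because it shares the token for "Inalámbrico" with the query, which is consistent with both products appearing in the response.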

Case 2: Building custom analyzers

In the previous tutorials, we explored Atlas Search's built-in analyzers and their use cases. Atlas Search also lets you define custom analyzers, which are particularly useful for values such as email addresses or product IDs that mix letters, digits, and hyphens.
In this section, we'll build custom analyzers for two fields and search with the index that uses them.
For example, each document in the sample product data you imported has a productID field with a value like ELEC-00123 and a productLink field containing a URL. To search effectively within these fields, configure the index as follows:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "productID": {
        "type": "string",
        "analyzer": "productIDAnalyzer"
      },
      "productLink": {
        "type": "string",
        "analyzer": "productLinkAnalyzer"
      }
    }
  },
  "analyzers": [
    {
      "name": "productIDAnalyzer",
      "tokenFilters": [
        {
          "type": "lowercase"
        }
      ],
      "tokenizer": {
        "pattern": "[-.]+",
        "type": "regexSplit"
      }
    },
    {
      "name": "productLinkAnalyzer",
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "originalTokens": "include",
          "type": "asciiFolding"
        }
      ],
      "tokenizer": {
        "pattern": "\\W+",
        "type": "regexSplit"
      }
    }
  ]
}
With productIDAnalyzer, you can search on productID using only its numeric part or only its alphabetic part. The Spring Boot method to perform the search looks like this:
public List<Document> withCustomAnalyzersOnProductID(String query, String sampleProducts) {
    MongoDatabase database = mongoTemplate.getDb();
    MongoCollection<Document> collection = database.getCollection(sampleProducts);

    List<Document> pipeline = Arrays.asList(
            new Document("$search",
                    new Document("index", "index02")
                            .append("text",
                                    new Document("query", query)
                                            .append("path", "productID"))),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("productID", 1L)
                            .append("productName", 1L)),
            new Document("$limit", 3));

    List<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
The REST call looks like this:
GET http://localhost:8080/search/withCustomAnalyzersOnProductID?query={05050}
and it returns:
[
  {
    "productID": "WEAR-05050",
    "productName": "Smartwatch"
  }
]
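To see why the query 05050 finds WEAR-05050, it can help to look at the tokens that a tokenizer with the pattern [-.]+ followed by a lowercase filter would produce. The following plain-Java sketch approximates productIDAnalyzer with String.split; it is an illustration of the tokenization, not the Atlas Search implementation, and the class name ProductIdTokenizerSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class ProductIdTokenizerSketch {

    // Split on runs of hyphens or dots, then lowercase each token.
    // Assumption: approximates the regexSplit tokenizer plus lowercase filter.
    public static List<String> tokenize(String value) {
        return Arrays.stream(value.split("[-.]+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("WEAR-05050")); // tokens stored for the document
        System.out.println(tokenize("05050"));      // tokens produced for the query
    }
}
```

The document yields the tokens wear and 05050, and the query yields 05050, so the two share a token and the document matches.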
The index above also defines productLinkAnalyzer for the productLink field. Its regexSplit tokenizer uses the pattern \W+, which splits a link on every run of non-word characters such as "://", "." and "/", so you can search for parts of a URL without typing those special characters.
For example, to find all products listed on www.amazon.com, you could use a search method like this:
public List<Document> withCustomAnalyzersOnProductLink(String query, String sampleProducts) {
    MongoDatabase database = mongoTemplate.getDb();
    MongoCollection<Document> collection = database.getCollection(sampleProducts);

    List<Document> pipeline = Arrays.asList(
            new Document("$search",
                    new Document("index", "index02")
                            .append("text",
                                    new Document("query", query)
                                            .append("path", "productLink"))),
            new Document("$project",
                    new Document("_id", 0L)
                            .append("productID", 1L)
                            .append("productName", 1L)
                            .append("productLink", 1L)),
            new Document("$limit", 3));

    List<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
The REST call looks like this:
GET http://localhost:8080/search/withCustomAnalyzerOnProductLink?query={www.amazon.com}
and it returns:
[
  {
    "productID": "ELEC-00123",
    "productName": "Wireless Mouse",
    "productLink": "https://www.amazon.com/dp/B08J2T6P7Q"
  },
  {
    "productID": "MONI-03030",
    "productName": "4K Monitor",
    "productLink": "https://www.amazon.com/dp/B09ABCDEFG"
  },
  {
    "productID": "ACC-08080",
    "productName": "Smartphone Stand",
    "productLink": "https://www.amazon.com/dp/B08ABC1234"
  }
]
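The sketch below approximates what a \W+ split plus lowercasing does to a product link and to the query www.amazon.com, and counts how many query tokens each link shares. It is a rough illustration only: the asciiFolding filter is omitted because the sample links are plain ASCII, real relevance scoring is more involved than a token count, and the names ProductLinkTokenizerSketch and sharedTokens are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class ProductLinkTokenizerSketch {

    // Split on runs of non-word characters, then lowercase each token.
    public static List<String> tokenize(String value) {
        return Arrays.stream(value.split("\\W+"))
                .filter(token -> !token.isEmpty())
                .map(token -> token.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Number of distinct query tokens that also appear in the link.
    // Assumption: a crude proxy for relevance, not the real scoring function.
    public static long sharedTokens(String query, String link) {
        Set<String> linkTokens = new HashSet<>(tokenize(link));
        return tokenize(query).stream().distinct().filter(linkTokens::contains).count();
    }

    public static void main(String[] args) {
        String amazon = "https://www.amazon.com/dp/B08J2T6P7Q";
        String target = "https://www.target.com/p/wireless-charger/-/A-87654321";
        System.out.println(tokenize(amazon));
        System.out.println(sharedTokens("www.amazon.com", amazon));
        System.out.println(sharedTokens("www.amazon.com", target));
    }
}
```

Under these assumptions, the Amazon link shares all three query tokens (www, amazon, com) while the Target link shares only two (www, com). A link on another site can therefore still match the query, but links containing amazon would be expected to rank higher, which fits the top three results shown above.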

Case 3: Performing a search with lookups

When you use $search, it must be the first stage of the pipeline it appears in. But what if you need to join collections as part of a search? In that case, you can run $search inside the sub-pipeline of a $lookup stage (supported since MongoDB 6.0), which is what this case covers.
When you imported the data into the Atlas cluster, you also imported productReviews.json, which contains reviews for the products.
To get started, create a simple index on the productReviews collection (the code below assumes it is named index03):
{
  "mappings": {
    "dynamic": true
  }
}
Now suppose you want to retrieve the documents whose reviews contain the word excellent.
The method would look like this:
public List<Document> searchWithLookups(String query, String productReviews) {
    // The aggregation runs on the collection passed in, and the $lookup
    // below joins against that same collection.
    MongoDatabase database = mongoTemplate.getDb();
    MongoCollection<Document> collection = database.getCollection(productReviews);

    List<Document> pipeline = Arrays.asList(
            new Document("$lookup",
                    new Document("from", productReviews)
                            .append("localField", "productID")
                            .append("foreignField", "productID")
                            .append("as", "result")
                            // $search is the first stage of the $lookup sub-pipeline.
                            .append("pipeline", Arrays.asList(new Document("$search",
                                    new Document("index", "index03")
                                            .append("text",
                                                    new Document("query", query)
                                                            .append("path", "review")))))),
            // Keep only documents that have at least one matching review.
            new Document("$match",
                    new Document("result",
                            new Document("$ne", Arrays.asList()))),
            new Document("$limit", 3));

    List<Document> results = new ArrayList<>();
    collection.aggregate(pipeline).into(results);
    return results;
}
The REST API call looks like this:
GET http://localhost:8080/search/withLookups?query={excellent}
and it returns:
[
  {
    "_id": {
      "timestamp": 2008237403,
      "date": "2033-08-21T11:43:23.000+00:00"
    },
    "productID": "ELEC-00123",
    "review": "The Wireless Mouse is excellent with a long-lasting battery, perfect for daily use.",
    "rating": 4.5,
    "reviewer": "Alice Walker",
    "reviewDate": "2024-08-01T10:00:00.000+00:00",
    "result": [
      {
        "_id": {
          "timestamp": 2008237403,
          "date": "2033-08-21T11:43:23.000+00:00"
        },
        "productID": "ELEC-00123",
        "review": "The Wireless Mouse is excellent with a long-lasting battery, perfect for daily use.",
        "rating": 4.5,
        "reviewer": "Alice Walker",
        "reviewDate": "2024-08-01T10:00:00.000+00:00"
      }
    ]
  },
  {
    "_id": {
      "timestamp": 2008237403,
      "date": "2033-08-21T11:43:23.000+00:00"
    },
    "productID": "HEAD-07070",
    "review": "Noise Cancelling Headphones offer excellent sound isolation, though availability is limited.",
    "rating": 4.5,
    "reviewer": "Jack Taylor",
    "reviewDate": "2024-08-10T19:00:00.000+00:00",
    "result": [
      {
        "_id": {
          "timestamp": 2008237403,
          "date": "2033-08-21T11:43:23.000+00:00"
        },
        "productID": "HEAD-07070",
        "review": "Noise Cancelling Headphones offer excellent sound isolation, though availability is limited.",
        "rating": 4.5,
        "reviewer": "Jack Taylor",
        "reviewDate": "2024-08-10T19:00:00.000+00:00"
      }
    ]
  },
  {
    "_id": {
      "timestamp": 2008237403,
      "date": "2033-08-21T11:43:23.000+00:00"
    },
    "productID": "MONI-03030",
    "review": "The 4K Monitor delivers stunning visuals with excellent clarity, though it is on the pricier side.",
    "rating": 4.7,
    "reviewer": "Frank Harris",
    "reviewDate": "2024-08-06T11:20:00.000+00:00",
    "result": [
      {
        "_id": {
          "timestamp": 2008237403,
          "date": "2033-08-21T11:43:23.000+00:00"
        },
        "productID": "MONI-03030",
        "review": "The 4K Monitor delivers stunning visuals with excellent clarity, though it is on the pricier side.",
        "rating": 4.7,
        "reviewer": "Frank Harris",
        "reviewDate": "2024-08-06T11:20:00.000+00:00"
      }
    ]
  }
]
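Each returned document carries a result array holding the reviews that match the keyword excellent. Note that the method above runs the aggregation on the same collection it names in from, so it joins productReviews with itself, which is why each result entry repeats its parent review. To return product details alongside their matching reviews, run the same pipeline on the product collection and keep productReviews as the from collection.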
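To make the pipeline's logic concrete, here is a minimal in-memory sketch in plain Java with no MongoDB driver involved. It mimics the three steps: join on productID, filter the joined reviews by the query term, and drop documents whose result array is empty. Several things here are assumptions for illustration: a case-insensitive substring check stands in for $search, the $limit stage is omitted, the outer list holds product documents to show a two-collection join, the Smartwatch review text is invented, and the names lookupWithSearch, sampleProducts, and sampleReviews are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

class LookupSearchSketch {

    public static List<Map<String, Object>> lookupWithSearch(
            List<Map<String, Object>> outerDocs,
            List<Map<String, Object>> reviews,
            String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> outer : outerDocs) {
            // Stand-in for $lookup: same productID, review text contains the term.
            List<Map<String, Object>> result = reviews.stream()
                    .filter(r -> r.get("productID").equals(outer.get("productID")))
                    .filter(r -> String.valueOf(r.get("review"))
                            .toLowerCase(Locale.ROOT).contains(needle))
                    .collect(Collectors.toList());
            // Stand-in for $match { result: { $ne: [] } }.
            if (!result.isEmpty()) {
                Map<String, Object> doc = new LinkedHashMap<>(outer);
                doc.put("result", result);
                joined.add(doc);
            }
        }
        return joined;
    }

    public static List<Map<String, Object>> sampleProducts() {
        List<Map<String, Object>> products = new ArrayList<>();
        products.add(Map.of("productID", "ELEC-00123", "productName", "Wireless Mouse"));
        products.add(Map.of("productID", "WEAR-05050", "productName", "Smartwatch"));
        return products;
    }

    public static List<Map<String, Object>> sampleReviews() {
        List<Map<String, Object>> reviews = new ArrayList<>();
        reviews.add(Map.of("productID", "ELEC-00123", "review",
                "The Wireless Mouse is excellent with a long-lasting battery, perfect for daily use."));
        // Hypothetical review text, invented for this sketch.
        reviews.add(Map.of("productID", "WEAR-05050", "review",
                "Placeholder review that does not contain the keyword."));
        return reviews;
    }

    public static void main(String[] args) {
        System.out.println(lookupWithSearch(sampleProducts(), sampleReviews(), "excellent"));
    }
}
```

With this sample data, only the Wireless Mouse survives the final filter, because it is the only product with a review containing the search term.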

Conclusion

In this final part of our Atlas Search with Spring Boot series, we explored more advanced Atlas Search features: language-specific analyzers, custom analyzers that fine-tune how fields such as product IDs and links are tokenized, and $search combined with other aggregation stages such as $lookup to bring in related data from another collection.
These techniques will help you build search queries tailored to your application's data and requirements.
If you have questions about the articles in this series, join the discussion in the MongoDB community forums. Also, keep exploring the other tutorials on the MongoDB Developer Center.
Happy coding!