The full-text search in MongoDB 4.4.9 becomes significantly slower when sorting conditions are included

Hello everyone,

I’ve run into an issue with MongoDB’s full-text search and I’m looking for help.

Since the data I store is in Chinese and MongoDB lacks support for Chinese word segmentation, I’ve addressed this by creating a new field named Terms. I manually segmented the Chinese words and stored them in the Terms field, separated by spaces. Subsequently, I established a full-text index on this field.

This approach gives me Chinese-language search, and without sorting it is fast: the following command returns in about 0.008 seconds.

db.bidding_data.find({
    $text: {
        $language: "none",
        $search: "公司 有限公司 投标 招标"
    },
    $and: [{
        publishTime: {
            $gte: ISODate("2020-01-17T00:00:00Z"),
            $lte: ISODate("2024-01-01T23:59:59Z")
        }
    }]
}, {
    publishTime: 1,
    city: 1,
    biddingType: 1,
    downloadUrl: 1,
    industry: 1,
    updateTime: 1,
    title: 1,
    tags: 1,
    sourceUrl: 1,
    propertiesInfo: 1,
    province: 1,
    createTime: 1,
    status: 1
}).skip(10)
.limit(10)

However, when I add a sort, the query becomes significantly slower, particularly when sorting by text score or by other fields. In my test environment, with only about 170,000 documents, the response takes 6 seconds. The impact will be even greater with the 5 million documents in my production environment.

db.bidding_data.find({
    $text: {
        $language: "none",
        $search: "公司 有限公司 投标 招标"
    },
    $and: [{
        publishTime: {
            $gte: ISODate("2020-01-17T00:00:00Z"),
            $lte: ISODate("2024-01-01T23:59:59Z")
        }
    }]
}, {
    publishTime: 1,
    city: 1,
    biddingType: 1,
    downloadUrl: 1,
    industry: 1,
    updateTime: 1,
    title: 1,
    tags: 1,
    sourceUrl: 1,
    propertiesInfo: 1,
    province: 1,
    createTime: 1,
    status: 1,
    score: {
        $meta: "textScore"
    }
}).sort({
    score: {
        $meta: "textScore"
    },
    publishTime: -1
}).skip(10)
.limit(10)

Here is the execution plan (explain output) for this query.

{
    "queryPlanner": {
        "plannerVersion": NumberInt("1"),
        "namespace": "bidding_data.bidding_data",
        "indexFilterSet": false,
        "parsedQuery": {
            "$and": [
                {
                    "publishTime": {
                        "$lte": ISODate("2024-01-01T23:59:59.000Z")
                    }
                },
                {
                    "publishTime": {
                        "$gte": ISODate("2020-01-17T00:00:00.000Z")
                    }
                },
                {
                    "$text": {
                        "$search": "公司 有限公司 投标 招标",
                        "$language": "none",
                        "$caseSensitive": false,
                        "$diacriticSensitive": false
                    }
                }
            ]
        },
        "winningPlan": {
            "stage": "PROJECTION_DEFAULT",
            "transformBy": {
                "publishTime": 1,
                "city": 1,
                "biddingType": 1,
                "downloadUrl": 1,
                "industry": 1,
                "updateTime": 1,
                "title": 1,
                "tags": 1,
                "sourceUrl": 1,
                "propertiesInfo": 1,
                "province": 1,
                "createTime": 1,
                "status": 1,
                "score": {
                    "$meta": "textScore"
                }
            },
            "inputStage": {
                "stage": "SKIP",
                "skipAmount": NumberInt("0"),
                "inputStage": {
                    "stage": "SORT",
                    "sortPattern": {
                        "$computed0": {
                            "$meta": "textScore"
                        },
                        "publishTime": NumberInt("-1")
                    },
                    "memLimit": NumberInt("104857600"),
                    "limitAmount": NumberInt("20"),
                    "type": "default",
                    "inputStage": {
                        "stage": "TEXT",
                        "indexPrefix": { },
                        "indexName": "full_index",
                        "parsedTextQuery": {
                            "terms": [
                                "公司",
                                "投标",
                                "招标",
                                "有限公司"
                            ],
                            "negatedTerms": [ ],
                            "phrases": [ ],
                            "negatedPhrases": [ ]
                        },
                        "textIndexVersion": NumberInt("3"),
                        "inputStage": {
                            "stage": "TEXT_MATCH",
                            "inputStage": {
                                "stage": "TEXT_OR",
                                "filter": {
                                    "$and": [
                                        {
                                            "publishTime": {
                                                "$lte": ISODate("2024-01-01T23:59:59.000Z")
                                            }
                                        },
                                        {
                                            "publishTime": {
                                                "$gte": ISODate("2020-01-17T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                },
                                "inputStages": [
                                    {
                                        "stage": "IXSCAN",
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { }
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { }
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { }
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { }
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        },
        "rejectedPlans": [ ]
    },
    "executionStats": {
        "executionSuccess": true,
        "nReturned": NumberInt("10"),
        "executionTimeMillis": NumberInt("2954"),
        "totalKeysExamined": NumberInt("102154"),
        "totalDocsExamined": NumberInt("67415"),
        "executionStages": {
            "stage": "PROJECTION_DEFAULT",
            "nReturned": NumberInt("10"),
            "executionTimeMillisEstimate": NumberInt("2689"),
            "works": NumberInt("169627"),
            "advanced": NumberInt("10"),
            "needTime": NumberInt("169616"),
            "needYield": NumberInt("0"),
            "saveState": NumberInt("225"),
            "restoreState": NumberInt("225"),
            "isEOF": NumberInt("1"),
            "transformBy": {
                "publishTime": 1,
                "city": 1,
                "biddingType": 1,
                "downloadUrl": 1,
                "industry": 1,
                "updateTime": 1,
                "title": 1,
                "tags": 1,
                "sourceUrl": 1,
                "propertiesInfo": 1,
                "province": 1,
                "createTime": 1,
                "status": 1,
                "score": {
                    "$meta": "textScore"
                }
            },
            "inputStage": {
                "stage": "SKIP",
                "nReturned": NumberInt("10"),
                "executionTimeMillisEstimate": NumberInt("2679"),
                "works": NumberInt("169627"),
                "advanced": NumberInt("10"),
                "needTime": NumberInt("169616"),
                "needYield": NumberInt("0"),
                "saveState": NumberInt("225"),
                "restoreState": NumberInt("225"),
                "isEOF": NumberInt("1"),
                "skipAmount": NumberInt("0"),
                "inputStage": {
                    "stage": "SORT",
                    "nReturned": NumberInt("20"),
                    "executionTimeMillisEstimate": NumberInt("2678"),
                    "works": NumberInt("169627"),
                    "advanced": NumberInt("20"),
                    "needTime": NumberInt("169606"),
                    "needYield": NumberInt("0"),
                    "saveState": NumberInt("225"),
                    "restoreState": NumberInt("225"),
                    "isEOF": NumberInt("1"),
                    "sortPattern": {
                        "$computed0": {
                            "$meta": "textScore"
                        },
                        "publishTime": NumberInt("-1")
                    },
                    "memLimit": NumberInt("104857600"),
                    "limitAmount": NumberInt("20"),
                    "type": "default",
                    "totalDataSizeSorted": NumberLong("2342696268"),
                    "usedDisk": false,
                    "inputStage": {
                        "stage": "TEXT",
                        "nReturned": NumberInt("67415"),
                        "executionTimeMillisEstimate": NumberInt("2669"),
                        "works": NumberInt("169606"),
                        "advanced": NumberInt("67415"),
                        "needTime": NumberInt("102190"),
                        "needYield": NumberInt("0"),
                        "saveState": NumberInt("225"),
                        "restoreState": NumberInt("225"),
                        "isEOF": NumberInt("1"),
                        "indexPrefix": { },
                        "indexName": "full_index",
                        "parsedTextQuery": {
                            "terms": [
                                "公司",
                                "投标",
                                "招标",
                                "有限公司"
                            ],
                            "negatedTerms": [ ],
                            "phrases": [ ],
                            "negatedPhrases": [ ]
                        },
                        "textIndexVersion": NumberInt("3"),
                        "inputStage": {
                            "stage": "TEXT_MATCH",
                            "nReturned": NumberInt("67415"),
                            "executionTimeMillisEstimate": NumberInt("2668"),
                            "works": NumberInt("169606"),
                            "advanced": NumberInt("67415"),
                            "needTime": NumberInt("102190"),
                            "needYield": NumberInt("0"),
                            "saveState": NumberInt("225"),
                            "restoreState": NumberInt("225"),
                            "isEOF": NumberInt("1"),
                            "docsRejected": NumberInt("0"),
                            "inputStage": {
                                "stage": "TEXT_OR",
                                "filter": {
                                    "$and": [
                                        {
                                            "publishTime": {
                                                "$lte": ISODate("2024-01-01T23:59:59.000Z")
                                            }
                                        },
                                        {
                                            "publishTime": {
                                                "$gte": ISODate("2020-01-17T00:00:00.000Z")
                                            }
                                        }
                                    ]
                                },
                                "nReturned": NumberInt("67415"),
                                "executionTimeMillisEstimate": NumberInt("2661"),
                                "works": NumberInt("169606"),
                                "advanced": NumberInt("67415"),
                                "needTime": NumberInt("102190"),
                                "needYield": NumberInt("0"),
                                "saveState": NumberInt("225"),
                                "restoreState": NumberInt("225"),
                                "isEOF": NumberInt("1"),
                                "docsExamined": NumberInt("67415"),
                                "inputStages": [
                                    {
                                        "stage": "IXSCAN",
                                        "nReturned": NumberInt("31193"),
                                        "executionTimeMillisEstimate": NumberInt("158"),
                                        "works": NumberInt("31194"),
                                        "advanced": NumberInt("31193"),
                                        "needTime": NumberInt("0"),
                                        "needYield": NumberInt("0"),
                                        "saveState": NumberInt("225"),
                                        "restoreState": NumberInt("225"),
                                        "isEOF": NumberInt("1"),
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { },
                                        "keysExamined": NumberInt("31193"),
                                        "seeks": NumberInt("1"),
                                        "dupsTested": NumberInt("31193"),
                                        "dupsDropped": NumberInt("0")
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "nReturned": NumberInt("518"),
                                        "executionTimeMillisEstimate": NumberInt("0"),
                                        "works": NumberInt("519"),
                                        "advanced": NumberInt("518"),
                                        "needTime": NumberInt("0"),
                                        "needYield": NumberInt("0"),
                                        "saveState": NumberInt("225"),
                                        "restoreState": NumberInt("225"),
                                        "isEOF": NumberInt("1"),
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { },
                                        "keysExamined": NumberInt("518"),
                                        "seeks": NumberInt("1"),
                                        "dupsTested": NumberInt("518"),
                                        "dupsDropped": NumberInt("0")
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "nReturned": NumberInt("53254"),
                                        "executionTimeMillisEstimate": NumberInt("184"),
                                        "works": NumberInt("53255"),
                                        "advanced": NumberInt("53254"),
                                        "needTime": NumberInt("0"),
                                        "needYield": NumberInt("0"),
                                        "saveState": NumberInt("225"),
                                        "restoreState": NumberInt("225"),
                                        "isEOF": NumberInt("1"),
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { },
                                        "keysExamined": NumberInt("53254"),
                                        "seeks": NumberInt("1"),
                                        "dupsTested": NumberInt("53254"),
                                        "dupsDropped": NumberInt("0")
                                    },
                                    {
                                        "stage": "IXSCAN",
                                        "nReturned": NumberInt("17189"),
                                        "executionTimeMillisEstimate": NumberInt("14"),
                                        "works": NumberInt("17190"),
                                        "advanced": NumberInt("17189"),
                                        "needTime": NumberInt("0"),
                                        "needYield": NumberInt("0"),
                                        "saveState": NumberInt("225"),
                                        "restoreState": NumberInt("225"),
                                        "isEOF": NumberInt("1"),
                                        "keyPattern": {
                                            "_fts": "text",
                                            "_ftsx": NumberInt("1"),
                                            "publishTime": NumberInt("-1")
                                        },
                                        "indexName": "full_index",
                                        "isMultiKey": true,
                                        "isUnique": false,
                                        "isSparse": false,
                                        "isPartial": false,
                                        "indexVersion": NumberInt("2"),
                                        "direction": "backward",
                                        "indexBounds": { },
                                        "keysExamined": NumberInt("17189"),
                                        "seeks": NumberInt("1"),
                                        "dupsTested": NumberInt("17189"),
                                        "dupsDropped": NumberInt("0")
                                    }
                                ]
                            }
                        }
                    }
                }
            }
        }
    },
    "serverInfo": {
        "host": "localhost.localdomain",
        "port": NumberInt("27017"),
        "version": "4.4.9",
        "gitVersion": "b4048e19814bfebac717cf5a880076aa69aba481"
    },
    "ok": 1
}

I can see that the query clearly uses the index. So why is totalDataSizeSorted almost 2.34 GB during the sort stage? Why is there so much data? My test environment has only about 170,000 documents in total, and production has 5 million, so how large would the sorted data be there? I assume the query scans the index once for each space-separated search term, merges all the results, calculates the scores, and only then sorts them. Is that correct? How can I address this issue?
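The counters in the plan above can be cross-checked against that assumption. What follows is a minimal sketch in plain JavaScript, using only figures copied from the explain output; the variable names are made up for illustration, and the plan does not say which scan belongs to which term.

```javascript
// Figures copied from the explain output above (MongoDB 4.4.9).
// Variable names are illustrative only.
const keysExaminedPerScan = [31193, 518, 53254, 17189]; // the four IXSCAN stages under TEXT_OR
const totalKeysExamined = 102154; // executionStats.totalKeysExamined
const docsExamined = 67415;       // TEXT_OR docsExamined
const docsReturned = 10;          // executionStats.nReturned

// The per-scan key counts should add up to the reported total.
const summedKeys = keysExaminedPerScan.reduce((sum, n) => sum + n, 0);
console.log(summedKeys === totalKeysExamined); // true

// Documents fetched and scored for every document finally returned.
const docsPerResult = docsExamined / docsReturned;
console.log(docsPerResult); // 6741.5
```

The four scans add up to totalKeysExamined, and 67,415 documents are fetched and scored in order to return 10. That is consistent with the merge, score, then sort reading, although it does not prove how the server implements it internally.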

Should I consider using a search engine such as Elasticsearch for this purpose?

I would appreciate everyone’s help. My English is not very good, and I used translation software. Thank you all!

There are people here who are much more expert than I am, but here is my thought: you are sorting entire documents that are rich in data. That is bound to cost a large amount of memory and execution time, with many memory allocations.
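A rough check of that idea, using only figures from the explain output in the question, is sketched below in plain JavaScript. The variable names are illustrative, and reading totalDataSizeSorted as the cumulative size of everything fed into the sort (rather than peak memory) is an assumption, suggested by usedDisk being false even though the total is far above memLimit.

```javascript
// Figures copied from the SORT and TEXT stages of the explain output.
const totalDataSizeSorted = 2342696268; // bytes, SORT stage
const sortInputDocs = 67415;            // nReturned of the TEXT stage feeding SORT
const sortMemLimit = 104857600;         // bytes, SORT stage memLimit

// Average size of each document handed to the sort.
const bytesPerDoc = totalDataSizeSorted / sortInputDocs;
console.log(Math.round(bytesPerDoc));         // 34750
console.log((bytesPerDoc / 1024).toFixed(1)); // "33.9" (KiB)

// How many times the cumulative total exceeds the in-memory sort limit.
const timesOverLimit = totalDataSizeSorted / sortMemLimit;
console.log(timesOverLimit.toFixed(1));       // "22.3"
```

At roughly 34 KiB per document, the 2.34 GB figure is about what one would expect if every matching document is passed to the sort whole, which fits the point about sorting data-rich documents.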

In a relational database (RDBMS with SQL) you have one table with the search keys and a primary key, company number. The company number is a foreign key into one or more tables with the rest of the company data. You JOIN the result set with the other company data and ORDER BY (sort by) the company number.

Perhaps in MongoDB you could try something like this: detach the following fields

    publishTime: 1,
    city: 1,
    biddingType: 1,
    downloadUrl: 1,
    industry: 1,
    updateTime: 1,
    title: 1,
    tags: 1,
    sourceUrl: 1,
    propertiesInfo: 1,
    province: 1,
    createTime

into a separate collection, search a collection that contains only your search terms and the company key, sort those small result documents, and then use the company key in an aggregation pipeline $lookup stage to fetch the rest.
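The idea can be sketched in plain JavaScript with in-memory stand-ins for the two collections. Everything here (the sample data, the score values, and the topPageIds helper) is hypothetical and only illustrates the shape of the approach: sort small records, keep one page of keys, then fetch the large documents for that page only.

```javascript
// Hypothetical stand-in for the large collection, keyed by _id.
const fullDocs = new Map([
  ["a", { _id: "a", title: "Tender A", body: "x".repeat(1000) }],
  ["b", { _id: "b", title: "Tender B", body: "y".repeat(1000) }],
  ["c", { _id: "c", title: "Tender C", body: "z".repeat(1000) }],
]);

// Hypothetical search hits from the slim collection: key, score, and sort field only.
const slimHits = [
  { _id: "a", score: 1.5, publishTime: 3 },
  { _id: "b", score: 2.0, publishTime: 1 },
  { _id: "c", score: 2.0, publishTime: 2 },
];

// Sort by score descending, then publishTime descending, and keep one page of keys.
function topPageIds(hits, skip, limit) {
  return [...hits]
    .sort((x, y) => y.score - x.score || y.publishTime - x.publishTime)
    .slice(skip, skip + limit)
    .map((hit) => hit._id);
}

// Only the keys on the requested page are resolved to full documents,
// which is the role $lookup would play in a pipeline.
const pageIds = topPageIds(slimHits, 0, 2);
const page = pageIds.map((id) => fullDocs.get(id));
console.log(pageIds);                       // [ 'c', 'b' ]
console.log(page.map((doc) => doc.title));  // [ 'Tender C', 'Tender B' ]
```

The sort never touches the large body field, so its cost depends on the size of the slim records rather than on the full documents.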

Aggregation is the heart :heart: of MongoDB programming.

My English is not very good

Your English is truly very good! :slight_smile:

Thank you for your response, I’ll go and give it a try.

Hi there, today I tried out the suggestions you gave me.

Each document in the full bidding_data collection has around 20 fields. I extracted 3 of them into a new collection named new_collection.

These are the 3 fields.

{
  "_id",         // This is my ObjectId.
  "terms",       // The field where I store the manually segmented text, separated by spaces. It looks like this: "ChineseWord1 ChineseWord2 ChineseWord3"
  "publishtime"  // This is my publication time. I use it for filtering.
}

I created a full-text index as follows, identical to the one I previously created on bidding_data.

db.new_collection.createIndex({ terms: "text", publishtime: -1 })

Here are the storage statistics for the bidding_data collection and for new_collection.

The storage size of my bidding_data collection

{
  "Number of documents" : "176,631",
  "Total size in memory" : "4.63 GB (4,973,306,470)",
  "Average object size" : "27.50 KB (28,156)",
  "Storage size" : "986.11 MB (1,034,014,720)",
  "Total index size" : "1008.13 MB (1,057,099,776)"
}

The storage size of my new_collection collection

{
  "Number of documents" : "176,631",
  "Total size in memory" : "33.24 MB (34,854,347)",
  "Average object size" : "197 bytes (197)",
  "Storage size" : "16.39 MB (17,190,912)",
  "Total index size" : "54.67 MB (57,327,616)"
}

As the figures show, the storage size dropped from the original collection’s 986 MB to about 16 MB, roughly a 60-fold reduction. Following your advice, I used $lookup, and the pipeline I ran is as follows.

db.new_collection.aggregate([
  {
    $match: {
      "$text": { "$language": "none", "$search": "公司 有限公司 投标 招标" },
      "publishtime": { "$gte": ISODate("2021-01-12T00:00:00Z"), "$lte": ISODate("2024-01-16T23:59:59Z") } 
    }
  },
  {
    $sort: {
      "score": { "$meta": "textScore" },
      "publishtime": -1
    }
  },
  {
    $skip: 0
  },
  {
    $limit: 10
  },
  {
    $project: {
      "_id": 1
    }
  },
  {
    $lookup: {
      from: "bidding_data",
      localField: "_id",
      foreignField: "_id",
      as: "bidding_data"
    }
  }
])

The query returned in only 0.4 seconds.

Its execution plan is as follows.

{
    "stages": [
        {
            "$cursor": {
                "queryPlanner": {
                    "plannerVersion": NumberInt("1"),
                    "namespace": "bidding_data.new_collection",
                    "indexFilterSet": false,
                    "parsedQuery": {
                        "$and": [
                            {
                                "publishtime": {
                                    "$lte": ISODate("2024-01-16T23:59:59.000Z")
                                }
                            },
                            {
                                "publishtime": {
                                    "$gte": ISODate("2021-01-12T00:00:00.000Z")
                                }
                            },
                            {
                                "$text": {
                                    "$search": "公司 有限公司 投标 招标",
                                    "$language": "none",
                                    "$caseSensitive": false,
                                    "$diacriticSensitive": false
                                }
                            }
                        ]
                    },
                    "queryHash": "DEF90E69",
                    "planCacheKey": "E8972E79",
                    "winningPlan": {
                        "stage": "PROJECTION_SIMPLE",
                        "transformBy": {
                            "_id": true
                        },
                        "inputStage": {
                            "stage": "SORT",
                            "sortPattern": {
                                "$computed0": {
                                    "$meta": "textScore"
                                },
                                "publishtime": NumberInt("-1")
                            },
                            "memLimit": NumberInt("104857600"),
                            "limitAmount": NumberInt("10"),
                            "type": "default",
                            "inputStage": {
                                "stage": "TEXT",
                                "indexPrefix": { },
                                "indexName": "text_index",
                                "parsedTextQuery": {
                                    "terms": [
                                        "公司",
                                        "投标",
                                        "招标",
                                        "有限公司"
                                    ],
                                    "negatedTerms": [ ],
                                    "phrases": [ ],
                                    "negatedPhrases": [ ]
                                },
                                "textIndexVersion": NumberInt("3"),
                                "inputStage": {
                                    "stage": "TEXT_MATCH",
                                    "inputStage": {
                                        "stage": "TEXT_OR",
                                        "filter": {
                                            "$and": [
                                                {
                                                    "publishtime": {
                                                        "$lte": ISODate("2024-01-16T23:59:59.000Z")
                                                    }
                                                },
                                                {
                                                    "publishtime": {
                                                        "$gte": ISODate("2021-01-12T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        },
                                        "inputStages": [
                                            {
                                                "stage": "IXSCAN",
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { }
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { }
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { }
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { }
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    },
                    "rejectedPlans": [ ]
                },
                "executionStats": {
                    "executionSuccess": true,
                    "nReturned": NumberInt("10"),
                    "executionTimeMillis": NumberInt("458"),
                    "totalKeysExamined": NumberInt("102154"),
                    "totalDocsExamined": NumberInt("62355"),
                    "executionStages": {
                        "stage": "PROJECTION_SIMPLE",
                        "nReturned": NumberInt("10"),
                        "executionTimeMillisEstimate": NumberInt("107"),
                        "works": NumberInt("169617"),
                        "advanced": NumberInt("10"),
                        "needTime": NumberInt("169606"),
                        "needYield": NumberInt("0"),
                        "saveState": NumberInt("170"),
                        "restoreState": NumberInt("170"),
                        "isEOF": NumberInt("1"),
                        "transformBy": {
                            "_id": true
                        },
                        "inputStage": {
                            "stage": "SORT",
                            "nReturned": NumberInt("10"),
                            "executionTimeMillisEstimate": NumberInt("107"),
                            "works": NumberInt("169617"),
                            "advanced": NumberInt("10"),
                            "needTime": NumberInt("169606"),
                            "needYield": NumberInt("0"),
                            "saveState": NumberInt("170"),
                            "restoreState": NumberInt("170"),
                            "isEOF": NumberInt("1"),
                            "sortPattern": {
                                "$computed0": {
                                    "$meta": "textScore"
                                },
                                "publishtime": NumberInt("-1")
                            },
                            "memLimit": NumberInt("104857600"),
                            "limitAmount": NumberInt("10"),
                            "type": "default",
                            "totalDataSizeSorted": NumberInt("19739136"),
                            "usedDisk": false,
                            "inputStage": {
                                "stage": "TEXT",
                                "nReturned": NumberInt("62355"),
                                "executionTimeMillisEstimate": NumberInt("95"),
                                "works": NumberInt("169606"),
                                "advanced": NumberInt("62355"),
                                "needTime": NumberInt("107250"),
                                "needYield": NumberInt("0"),
                                "saveState": NumberInt("170"),
                                "restoreState": NumberInt("170"),
                                "isEOF": NumberInt("1"),
                                "indexPrefix": { },
                                "indexName": "text_index",
                                "parsedTextQuery": {
                                    "terms": [
                                        "公司",
                                        "投标",
                                        "招标",
                                        "有限公司"
                                    ],
                                    "negatedTerms": [ ],
                                    "phrases": [ ],
                                    "negatedPhrases": [ ]
                                },
                                "textIndexVersion": NumberInt("3"),
                                "inputStage": {
                                    "stage": "TEXT_MATCH",
                                    "nReturned": NumberInt("62355"),
                                    "executionTimeMillisEstimate": NumberInt("94"),
                                    "works": NumberInt("169606"),
                                    "advanced": NumberInt("62355"),
                                    "needTime": NumberInt("107250"),
                                    "needYield": NumberInt("0"),
                                    "saveState": NumberInt("170"),
                                    "restoreState": NumberInt("170"),
                                    "isEOF": NumberInt("1"),
                                    "docsRejected": NumberInt("0"),
                                    "inputStage": {
                                        "stage": "TEXT_OR",
                                        "filter": {
                                            "$and": [
                                                {
                                                    "publishtime": {
                                                        "$lte": ISODate("2024-01-16T23:59:59.000Z")
                                                    }
                                                },
                                                {
                                                    "publishtime": {
                                                        "$gte": ISODate("2021-01-12T00:00:00.000Z")
                                                    }
                                                }
                                            ]
                                        },
                                        "nReturned": NumberInt("62355"),
                                        "executionTimeMillisEstimate": NumberInt("94"),
                                        "works": NumberInt("169606"),
                                        "advanced": NumberInt("62355"),
                                        "needTime": NumberInt("107250"),
                                        "needYield": NumberInt("0"),
                                        "saveState": NumberInt("170"),
                                        "restoreState": NumberInt("170"),
                                        "isEOF": NumberInt("1"),
                                        "docsExamined": NumberInt("62355"),
                                        "inputStages": [
                                            {
                                                "stage": "IXSCAN",
                                                "nReturned": NumberInt("31193"),
                                                "executionTimeMillisEstimate": NumberInt("2"),
                                                "works": NumberInt("31194"),
                                                "advanced": NumberInt("31193"),
                                                "needTime": NumberInt("0"),
                                                "needYield": NumberInt("0"),
                                                "saveState": NumberInt("170"),
                                                "restoreState": NumberInt("170"),
                                                "isEOF": NumberInt("1"),
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { },
                                                "keysExamined": NumberInt("31193"),
                                                "seeks": NumberInt("1"),
                                                "dupsTested": NumberInt("31193"),
                                                "dupsDropped": NumberInt("0")
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "nReturned": NumberInt("518"),
                                                "executionTimeMillisEstimate": NumberInt("0"),
                                                "works": NumberInt("519"),
                                                "advanced": NumberInt("518"),
                                                "needTime": NumberInt("0"),
                                                "needYield": NumberInt("0"),
                                                "saveState": NumberInt("170"),
                                                "restoreState": NumberInt("170"),
                                                "isEOF": NumberInt("1"),
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { },
                                                "keysExamined": NumberInt("518"),
                                                "seeks": NumberInt("1"),
                                                "dupsTested": NumberInt("518"),
                                                "dupsDropped": NumberInt("0")
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "nReturned": NumberInt("53254"),
                                                "executionTimeMillisEstimate": NumberInt("21"),
                                                "works": NumberInt("53255"),
                                                "advanced": NumberInt("53254"),
                                                "needTime": NumberInt("0"),
                                                "needYield": NumberInt("0"),
                                                "saveState": NumberInt("170"),
                                                "restoreState": NumberInt("170"),
                                                "isEOF": NumberInt("1"),
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { },
                                                "keysExamined": NumberInt("53254"),
                                                "seeks": NumberInt("1"),
                                                "dupsTested": NumberInt("53254"),
                                                "dupsDropped": NumberInt("0")
                                            },
                                            {
                                                "stage": "IXSCAN",
                                                "nReturned": NumberInt("17189"),
                                                "executionTimeMillisEstimate": NumberInt("3"),
                                                "works": NumberInt("17190"),
                                                "advanced": NumberInt("17189"),
                                                "needTime": NumberInt("0"),
                                                "needYield": NumberInt("0"),
                                                "saveState": NumberInt("170"),
                                                "restoreState": NumberInt("170"),
                                                "isEOF": NumberInt("1"),
                                                "keyPattern": {
                                                    "_fts": "text",
                                                    "_ftsx": NumberInt("1"),
                                                    "publishtime": NumberInt("-1")
                                                },
                                                "indexName": "text_index",
                                                "isMultiKey": true,
                                                "isUnique": false,
                                                "isSparse": false,
                                                "isPartial": false,
                                                "indexVersion": NumberInt("2"),
                                                "direction": "backward",
                                                "indexBounds": { },
                                                "keysExamined": NumberInt("17189"),
                                                "seeks": NumberInt("1"),
                                                "dupsTested": NumberInt("17189"),
                                                "dupsDropped": NumberInt("0")
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    }
                }
            },
            "nReturned": NumberLong("10"),
            "executionTimeMillisEstimate": NumberLong("456")
        },
        {
            "$lookup": {
                "from": "bidding_data",
                "as": "bidding_data",
                "localField": "_id",
                "foreignField": "_id"
            },
            "nReturned": NumberLong("10"),
            "executionTimeMillisEstimate": NumberLong("458")
        }
    ],
    "serverInfo": {
        "host": "localhost.localdomain",
        "port": NumberInt("27017"),
        "version": "4.4.9",
        "gitVersion": "b4048e19814bfebac717cf5a880076aa69aba481"
    },
    "ok": 1
}

As a result of the reduced document size, the totalDataSizeSorted in the sorting stage has decreased from 2.18 GB (2342696268) to 18.8 MB (19739136). Consequently, MongoDB only required 0.4 seconds to generate the result.
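
The sizes quoted above can be checked directly from the two `totalDataSizeSorted` values. Below is a minimal sketch in plain JavaScript (it should run in Node.js or mongosh). The helper name `formatBytes` is made up for illustration, binary units (1 GB = 1024^3 bytes) are assumed, and the match count is the `nReturned` of the TEXT stage in the explain output.

```javascript
// Values copied from the explain outputs in this thread.
const sortedBefore = 2342696268; // totalDataSizeSorted, original collection
const sortedAfter = 19739136;    // totalDataSizeSorted, reduced collection
const matchedDocs = 62355;       // nReturned of the TEXT stage (reduced collection)

// Hypothetical helper: format a byte count with binary units, 3 significant digits.
function formatBytes(bytes) {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i += 1;
  }
  return String(Number(value.toPrecision(3))) + " " + units[i];
}

// How much less data the SORT stage had to handle after the change.
const reductionFactor = sortedBefore / sortedAfter;

// Average number of bytes per matched document that reached the SORT stage.
const bytesPerSortedDoc = sortedAfter / matchedDocs;

console.log(formatBytes(sortedBefore), "->", formatBytes(sortedAfter));
console.log("reduction factor:", Math.round(reductionFactor));
console.log("average bytes per sorted document:", Math.round(bytesPerSortedDoc));
```

The per-document average is only a rough figure, since it divides the total by the number of matched documents and ignores any per-entry overhead.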

But I do have a few questions.

In my current test environment, the query took only 0.4 seconds against 170,000 documents. However, my production environment holds 5 million documents, roughly 29 times as many. At that volume, should I expect a 29-fold increase in response time? Is this approach viable?

Have I misconstrued your suggestion? Are you proposing that I manually construct an inverted index using this new collection, or simply transfer the necessary search fields to the new collection to decrease the document size?

I would also like to use Atlas, but from what I saw, all Atlas servers are located in Europe. Since our clients are in China, servers in Europe would mean high network latency for user searches.

If MongoDB doesn’t pan out, our last resort is to consider utilizing Elasticsearch.

As a result of the reduced document size, the totalDataSizeSorted in the sorting stage has decreased from 2.18 GB (2342696268) to 18.8 MB (19739136). Consequently, MongoDB only required 0.4 seconds to generate the result.

Excellent! Good work.

In my current test environment, the query took only 0.4 seconds against 170,000 documents. However, my production environment holds 5 million documents, roughly 29 times as many. At that volume, should I expect a 29-fold increase in response time? Is this approach viable?

I do not know the answer, but I doubt you will see a 29-fold increase in response time. A good portion of the 0.4 seconds is probably spent computing the query plan. MongoDB experts like @Harshit may be able to give you more detailed information.
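
One way to see where the time goes is to list the per-stage numbers from the `executionStats` section of the explain output. The sketch below is plain JavaScript. The function name `summarizeStages` is hypothetical, and the sample tree is an abbreviated copy of the output posted above with only one of the four IXSCAN stages kept. As far as I understand, each stage's `executionTimeMillisEstimate` includes the time of its child stages, so the values should not be added together.

```javascript
// Walk an explain("executionStats") stage tree and collect one row per stage.
// A stage has either a single child in `inputStage` or several in `inputStages`.
function summarizeStages(stage, depth = 0, rows = []) {
  rows.push({
    depth,
    stage: stage.stage,
    // Number() is used on the assumption that the shell's numeric
    // wrapper values convert to plain numbers.
    nReturned: Number(stage.nReturned),
    millis: Number(stage.executionTimeMillisEstimate),
  });
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  for (const child of children) {
    summarizeStages(child, depth + 1, rows);
  }
  return rows;
}

// Abbreviated copy of the executionStages tree from the explain output above.
const executionStages = {
  stage: "PROJECTION_SIMPLE", nReturned: 10, executionTimeMillisEstimate: 107,
  inputStage: {
    stage: "SORT", nReturned: 10, executionTimeMillisEstimate: 107,
    inputStage: {
      stage: "TEXT", nReturned: 62355, executionTimeMillisEstimate: 95,
      inputStage: {
        stage: "TEXT_MATCH", nReturned: 62355, executionTimeMillisEstimate: 94,
        inputStage: {
          stage: "TEXT_OR", nReturned: 62355, executionTimeMillisEstimate: 94,
          inputStages: [
            { stage: "IXSCAN", nReturned: 53254, executionTimeMillisEstimate: 21 },
          ],
        },
      },
    },
  },
};

const rows = summarizeStages(executionStages);
for (const row of rows) {
  const indent = "  ".repeat(row.depth);
  console.log(`${indent}${row.stage}: nReturned=${row.nReturned}, ~${row.millis} ms`);
}
```

Against a live server, the same function could be given the `executionStages` object from the real explain result instead of the hand-copied tree.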

Have I misconstrued your suggestion? Are you proposing that I manually construct an inverted index using this new collection, or simply transfer the necessary search fields to the new collection to decrease the document size?

I think you did exactly what I suggested.

I would also like to use Atlas, but from what I saw, all Atlas servers are located in Europe. Since our clients are in China, servers in Europe would mean high network latency for user searches.

A MongoDB employee will have to address this. I am just a MongoDB user, like you.

Thank you so much.

@Harshit Can you offer me some advice?

I am not the best person to help here, @greenwich123a and @Jack_Woehr, but let me get some expert eyes on it to help you out.

Hi @greenwich123a,

If we are solely comparing document counts, then it is difficult to say (for instance, are the other ~4.8 million documents the same size?). In my opinion, estimating query execution times by scaling linearly from the document count would not give you an accurate response time, and it is not an approach I would recommend, for several reasons, including:

  1. Hardware differences between the two instances
  2. Document size differences between the two data sets
  3. Index size differences between the two data sets and, in turn, whether the indexes plus the data fit into memory.

Unfortunately, for testing response and query times, I believe it would be better to load the expected 5 million documents you mentioned onto a similarly configured test system, if possible. That would give you a much better picture of the response and execution times. If that's not possible, you could try with 5-10x the current test data and see how that performs.
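
If real production data cannot be copied to the test system, one rough option is to generate synthetic documents and load them in batches. The sketch below is only an illustration under assumptions: the function name `makeSyntheticDocs`, the field name `terms`, the collection name in the closing comment and the last two vocabulary entries are all placeholders, and randomly drawn terms will not have the same distribution as real bidding text, so timings measured this way are approximate at best.

```javascript
// Build `count` synthetic documents, each with a space-separated `terms`
// string and a `publishtime` drawn uniformly from [startDate, endDate).
function makeSyntheticDocs(count, vocabulary, termsPerDoc, startDate, endDate) {
  const start = startDate.getTime();
  const span = endDate.getTime() - start;
  const docs = [];
  for (let i = 0; i < count; i++) {
    const terms = [];
    for (let j = 0; j < termsPerDoc; j++) {
      terms.push(vocabulary[Math.floor(Math.random() * vocabulary.length)]);
    }
    docs.push({
      terms: terms.join(" "),
      publishtime: new Date(start + Math.floor(Math.random() * span)),
    });
  }
  return docs;
}

// Placeholder vocabulary: the four search terms from this thread plus two made-up extras.
const vocabulary = ["公司", "有限公司", "投标", "招标", "采购", "项目"];
const rangeStart = new Date("2021-01-12T00:00:00Z");
const rangeEnd = new Date("2024-01-16T23:59:59Z");

const batch = makeSyntheticDocs(1000, vocabulary, 20, rangeStart, rangeEnd);
console.log(batch.length, batch[0]);

// In mongosh, each batch could then be loaded with something like:
//   db.bidding_search_test.insertMany(batch)
// where `bidding_search_test` is a hypothetical collection name.
```

Generating in batches of a few thousand documents keeps memory use small while the collection grows to the target size.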

There is also support for generic Chinese (cjk) in Atlas Search, if this is of use to you.

Hope the above helps.

Regards,
Jason

Thank you for your response.
After considering it: Jack_Woehr's suggestion is to place the columns used for searching into a new collection. However, there is an issue. The columns we search on may change, and if a new column is added for searching later on, the data in the new collection would need updating. This makes the approach harder to maintain. Even if it can deliver a 1-second response time in the production environment, we may still prefer not to use it for that reason.

Regarding the Atlas Search solution, I observed that the servers are all in Europe, while our company and clients are in mainland China, so the network latency would not be acceptable.

Therefore, my idea is to use a lightweight search engine like Apache Lucene to implement this. What do you all think? Are there any better ideas? @Jason_Tran @Jack_Woehr @Harshit