$max operator with Atlas $searchMeta MongoDB

ahmad_al_sharbaji · July 24, 2023, 6:20am

My documents are like this below:

{
    "_id": ObjectId("63bd85a644552d73f36c366e"),
    "ts": datetime.datetime(2023, 1, 10, 15, 35, 2, 374000),
    "dt": datetime.datetime(2023, 1, 10, 15, 34, 10),
    "aff": 2,
    "src": 2,
    "st": 2
}

I started using $searchMeta faceting for a better and way faster approach.
So, I converted my aggregation pipeline from this approach:

results = my_collection.aggregate(
[
            {
                "$match": {
                    "ts": {"$gte": datetime.now() - timedelta(hours=24)},
                    "st": 2
                }
            },
            {
                "$group": {
                    "_id": {"aff": "$aff", "src": "$src"},
                    "count": {"$sum": 1},
                    "last_message_datetime": {"$max": "$dt"}
                }
            },
            {"$sort": {"_id.src": -1, "_id.aff": -1}}
        ]
)

into this below:

results = my_collection.aggregate([
            {
                "$searchMeta": {
                    "index": "MsgAtlasIndex",
                    "count": {"type": "total"},
                    "facet": {
                        "operator": {
                            "compound": {
                                "must": [
                                    {
                                        "range": {
                                            "path": "ts",
                                            "gte": datetime.now() - timedelta(hours=24)
                                        }
                                    },
                                    {"equals": {"path": "st", "value": 2}},
                                ]
                            }
                        },
                        "facets": {
                            "src": {
                                "type": "number",
                                "path": "src",
                                "boundaries": [1, 2, 3, 4, 5, 6, 7, 10, 11]
                            },
                            "aff": {
                                "type": "number",
                                "path": "aff",
                                "boundaries": [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14, 17, 18, 19, 20, 21]
                            },
                        },
                    },
                }
            },
            {
                "$match": {
                    "$or": [
                        {"facet.aff.buckets.count": {"$gt": 0}},
                        {"facet.src.buckets.count": {"$gt": 0}}
                    ]
                }
            },
            {
                "$project": {
                    "_id": 0,
                    "count": 1,
                    "facet.aff.buckets": {
                        "$filter": {
                            "input": "$facet.aff.buckets",
                            "as": "bucket",
                            "cond": {"$gt": ["$$bucket.count", 0]}
                        }
                    },
                    "facet.src.buckets": {
                        "$filter": {
                            "input": "$facet.src.buckets",
                            "as": "bucket",
                            "cond": {"$gt": ["$$bucket.count", 0]}
                        }
                    }
                }
            }
        ])

I suffered with how to apply this line below in searchMeta:

"last_message_datetime": {"$max": "$dt"}

It is to get the maximum date of the counted messages:

(the date of the last message of every group) using the field dt (date).

Any help, please?

Aasawari · August 1, 2023, 4:12am

Hi @ahmad_al_sharbaji and welcome to MongoDB community forums!!

Based on the sample data and the aggregation pipeline shared, could you provide details for the below mentioned points which would help me understand the use case and replicate in my test environment to see if what you are after is possible.

Based on the two queries mentioned, could you help me understand the reason why are you considering to use the searchMeta in the aggregation stage. As mentioned in the MongoDB documentation for searchMeta, this stage offers you to store the data in the form of buckets and returns different types of metadata result documents.

Secondly, I tried to replicate the query in local environment, with sample data similar to provided, and it provides the data similar to this:

[{‘count’: {‘total’: 1001}, ‘facet’: {‘aff’: {‘buckets’: [{‘_id’: 1, ‘count’: 124}, {‘_id’: 2, ‘count’: 137}, {‘_id’: 3, ‘count’: 119}, {‘_id’: 4, ‘count’: 133}, {‘_id’: 5, ‘count’: 260}, {‘_id’: 7, ‘count’: 110}, {‘_id’: 8, ‘count’: 118}]}, ‘src’: {‘buckets’: [{‘_id’: 1, ‘count’: 129}, {‘_id’: 2, ‘count’: 109}, {‘_id’: 3, ‘count’: 136}, {‘_id’: 4, ‘count’: 126}, {‘_id’: 5, ‘count’: 109}, {‘_id’: 6, ‘count’: 136}, {‘_id’: 7, ‘count’: 256}]}}}]

Lastly, are you able to provide the expect / desired output based off your sample document(s)?

Regards
Aasawari

ahmad_al_sharbaji · August 1, 2023, 2:14pm

Hi @Aasawari , Thank you for your kind response!

My goal is to count documents for every aff and src together. So, I will be counting how many documents for aff 1 and src 1, then how many documents for aff 2 and src 1, etc…

Why I used searchMeta? Because I found a hilarious speed! Instead of taking around 15 seconds per query, it takes ~ 0.5 using searchMeta. Our database contains more than 55 million documents.

But, it is not a complete solution. As you see in the regular aggregation, I used to $group. But now using searchMeta I don’t know to do that!
It counts separately, but I need them combined! I need to count documents for aff and src together in the same bucket or something else, not separately.
As you see in your results, you have 2 buckets, one counting for aff and one for src. The goal is to count together.

Please help me to achieve that.

Aasawari · August 7, 2023, 5:29am

Hi @ahmad_al_sharbaji

Thank you for writing back.

If I understand correctly, the goal of using $searchMeta is because it tremendously increases the query execution time.
Using $searchMeta it would create the buckets and would count the according to the buckets created.
I believe perhaps there would be another way to increase the efficiency.
We may be able help you in this case if you could share how you would want the desired output to look like.

Thanks
Aasawari

ahmad_al_sharbaji · August 7, 2023, 6:27am

Hi @Aasawari ,
Thank you for time. Exactly. I wanna make a better speed and increases the query execution time.
And exactly the buckets are my issue. If one bucket would be grouping among src and aff together, that would be perfect. But they are separated and that is my issue.

The desired output is like this:

[{'_id': {'aff': 21, 'src': 8},
  'count': 24,
  'last_message_datetime': datetime.datetime(2023, 8, 2, 10, 13)},
 {'_id': {'aff': 2, 'src': 6},
  'count': 222,
  'last_message_datetime': datetime.datetime(2023, 8, 2, 11, 30, 9)},
 {'_id': {'aff': 2, 'src': 2},
  'count': 34,
  'last_message_datetime': datetime.datetime(2023, 8, 2, 16, 17, 2)}]

This will count documents for the last 24 hours ts field, where status is 2 st field, and will group over documents via source src and affiliation aff fields, and will get the latest message time via dt field.

You can kindly create a simple collection, insert some documents like the sample I provided, change the aff and src for some documents, and apply the first aggregate I provided, you will get 100% the desired output.

This is really urgent and I’m waiting for your response.
Thanks in advance!

Aasawari · August 16, 2023, 4:53am

Hi @ahmad_al_sharbaji,

Thanks for getting back to me with those details. After taking a read over the post again I understand that the first aggregation mentioned nearly gets you your desired output (as opposed to the desired output to the dot) but please correct me if my interpretation here is wrong.

Please advise me on the below information:

Could you provide the output of when you ran this query? This is to see what you are currently getting and if it’s possible to achieve you’re desired output based off the current output.
Additionally, I understand that you’ve provided a single sample document but would you be able to provide several sample documents and the expected output based off these exact sample documents so that I can easily insert these into my test environment to see if the desired output is possible?
Lastly, please provide the search index definition in use in JSON format, as I can use this to replicate the aggregation on the sample documents in my test environment.

Warm regards
Aasawari

ahmad_al_sharbaji · August 16, 2023, 11:47am

Hi @Aasawari ,
I actually appreciate your effort. I worked hard and decided not to use facets. I kept using searchMeta and it’s incredible speed. But guess what.

In my normal aggregation, I used to get the time of the last collected document. But I couldn’t use it using Atlas searchMeta.

Please visit this topic I created another one:

Thanks!!