I have a collection containing a large number (500,000+) of records that I want to group by an id. The id is an encrypted 32-character string, so it is difficult to split into manageable chunks using a $match stage.
The $group stage works from a mongo client or from Atlas when allowDiskUse: true is set, but I would like to run the query automatically in the background when the collection is updated.
allowDiskUse is not available in Stitch. How can I split up the query to allow grouping by a field containing random text ids? The group query I would like to use is:
db.matchedResponsesStage1.aggregate(
    [
        { "$sort": { "_id": 1 } },
        {
            "$group": {
                "_id": { "folder_id": "$folder_id" },
                "journeystart": {
                    "$first": { "$arrayElemAt": ["$content.labels.name", 0] }
                }
            }
        }
    ],
    { "allowDiskUse": true }
);
Note that the initial sort on _id puts the records into time order so that I can pick out the first record from each group. Unfortunately, this prevents an index on folder_id from being used, which would reduce the memory footprint and allow the query to pass.
I would like to split the list of folder_id values into manageable blocks (e.g. 1,000) and feed each block into the above query. Does anyone have any guidance on how to do this within Stitch? Any help or guidance is appreciated.
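For reference, this is a rough sketch of the batching I have in mind (untested in Stitch; the database/collection names and the availability of a distinct-style lookup are assumptions). The idea is to gather the folder_id values, split them into blocks, and run the aggregation once per block with a $match on the block so the folder_id index can be used and the $group stays within memory limits:

```javascript
// Split an array of ids into blocks of the given size.
function chunk(ids, size) {
    const blocks = [];
    for (let i = 0; i < ids.length; i += size) {
        blocks.push(ids.slice(i, i + size));
    }
    return blocks;
}

// Inside a Stitch function it would look roughly like this
// (names below are placeholders, not my real service/db names):
//
// const coll = context.services.get("mongodb-atlas")
//     .db("mydb").collection("matchedResponsesStage1");
// const ids = await coll.distinct("folder_id"); // assuming distinct is usable here
// for (const block of chunk(ids, 1000)) {
//     const results = await coll.aggregate([
//         { "$match": { "folder_id": { "$in": block } } },
//         { "$sort": { "_id": 1 } },
//         { "$group": {
//             "_id": { "folder_id": "$folder_id" },
//             "journeystart": {
//                 "$first": { "$arrayElemAt": ["$content.labels.name", 0] }
//             }
//         } }
//     ]).toArray();
//     // ...write or merge the per-block results somewhere
// }
```

I am not sure whether distinct is exposed in Stitch functions, so the block-building step itself may need a different approach.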