I have a collection containing a large number (500,000+) of records that I want to group by an id. The id is an encrypted 32-character string, so it is difficult to split into manageable chunks using a $match stage.
The $group stage works from a mongo client or from Atlas when allowDiskUse: true is set, but I would like to run the query automatically in the background when the collection is updated.
allowDiskUse is not available in Stitch. How can I split up the query to allow grouping by a field containing random text ids? The group query I would like to use is:
db.matchedResponsesStage1.aggregate(
    [
        { "$sort": { "_id": 1 } },
        {
            "$group": {
                "_id": { "folder_id": "$folder_id" },
                "journeystart": {
                    "$first": { "$arrayElemAt": ["$content.labels.name", 0] }
                }
            }
        }
    ],
    { "allowDiskUse": true }
);
Note that the initial sort on _id puts the records into time order so that I can pick out the first record from each group. Unfortunately, this prevents an index on folder_id from being used, which would reduce the memory footprint and allow the query to pass.
I would like to split the list of folder_id values into manageable blocks (e.g. 1,000) and feed each block into the above query. Does anyone have any guidance on how to do this within Stitch? Any help or guidance is appreciated.
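For reference, this is a rough sketch of the batching I have in mind (untested in Stitch; the database/collection names and the availability of a distinct-style lookup are assumptions). The idea is to gather the folder_id values, split them into blocks, and run the aggregation once per block with a $match on the block so the folder_id index can be used and the $group stays within memory limits:

```javascript
// Split an array of ids into blocks of the given size.
function chunk(ids, size) {
    const blocks = [];
    for (let i = 0; i < ids.length; i += size) {
        blocks.push(ids.slice(i, i + size));
    }
    return blocks;
}

// Inside a Stitch function it would look roughly like this
// (names below are placeholders, not my real service/db names):
//
// const coll = context.services.get("mongodb-atlas")
//     .db("mydb").collection("matchedResponsesStage1");
// const ids = await coll.distinct("folder_id"); // assuming distinct is usable here
// for (const block of chunk(ids, 1000)) {
//     const results = await coll.aggregate([
//         { "$match": { "folder_id": { "$in": block } } },
//         { "$sort": { "_id": 1 } },
//         { "$group": {
//             "_id": { "folder_id": "$folder_id" },
//             "journeystart": {
//                 "$first": { "$arrayElemAt": ["$content.labels.name", 0] }
//             }
//         } }
//     ]).toArray();
//     // ...write or merge the per-block results somewhere
// }
```

I am not sure whether distinct is exposed in Stitch functions, so the block-building step itself may need a different approach.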