I end up with anywhere from 1.5 million to 3 million documents in the collection. The data is ingested from public government CSV files and aggregated into the gov_data.businesses collection. Everything is in ALL CAPS. I aggregated the data to a new collection with the address.city and name fields run through $toLower. Now I need to title-case those fields. Using address.city instead of name in the following code takes a while (28 minutes) but succeeds. name, however, fails with TypeError: Cannot read properties of undefined (reading 'toUpperCase') after some 400,000 documents, at about 8 minutes. It feels like a data size issue, but I’ve no idea. I’m relatively new to aggregations and coding in mongo/mongosh.
Running tests on a local collection with 600,000 records, the timings were as follows:
Single Updates: 452s
Batch of 1,000: 104s
Batch of 10,000: 96s
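To illustrate the batching behind those timings, here is a minimal sketch that groups per-document updates into fixed-size arrays of updateOne operations. The helper name buildBatches, its parameters, and the transform callback are hypothetical and not taken from the thread; each yielded batch is shaped so it could be passed to a collection's bulkWrite() in mongosh.

```javascript
// Sketch only: buildBatches, `field`, and `transform` are illustrative names.
// `docs` can be any iterable of documents that have an _id.
function* buildBatches(docs, field, transform, batchSize) {
  let batch = [];
  for (const doc of docs) {
    // Resolve dotted paths such as "address.city" against the document.
    const value = field
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
    batch.push({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: { [field]: transform(value) } }
      }
    });
    // Hand back a full batch and start collecting the next one.
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit whatever is left over after the last full batch.
  if (batch.length > 0) {
    yield batch;
  }
}
```

Going by the timings above, most of the gain comes from moving off single updates; the step from 1,000 to 10,000 per batch is comparatively small.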
The error you’re getting looks like it’s due to whitespace in the input, i.e.
print(titleCase(' '))
Gives:
TypeError: Cannot read properties of undefined (reading 'toUpperCase')
So one of your documents has a field with a stray space in it that’s causing the issue. You need a base case in the function to catch this before trying to access the character at the first position [0]:
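A minimal sketch of such a guard follows. It assumes titleCase works by splitting on spaces and upper-casing the first character of each word, which would explain why ' ' fails: the split yields empty strings, and ''[0] is undefined. The body below is an illustration under that assumption, not the function from the question.

```javascript
// Hypothetical titleCase with a base case for blank or missing input.
function titleCase(str) {
  // Base case: leave non-string values (undefined, null, numbers) untouched.
  if (typeof str !== 'string') {
    return str;
  }
  return str
    .split(/\s+/)                      // split on runs of whitespace
    .filter(word => word.length > 0)   // drop the empty pieces that caused the TypeError
    .map(word => word[0].toUpperCase() + word.slice(1).toLowerCase())
    .join(' ');
}
```

Note that this version also trims leading and trailing whitespace and collapses repeated spaces, so a value that is only whitespace becomes an empty string.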
…you can look at the return object and collate the updates to see how many matches and updates were performed as a sanity check. The return object looks like this:
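As a rough sketch of that sanity check, the helper below sums the counters across several batch results. It assumes each result exposes matchedCount and modifiedCount, as bulkWrite results do in mongosh; the name collateResults is made up for illustration.

```javascript
// Sketch only: adds up matched/modified counters from an array of batch results.
function collateResults(results) {
  return results.reduce(
    (totals, result) => ({
      // Treat a missing counter as zero so partial results do not produce NaN.
      matched: totals.matched + (result.matchedCount || 0),
      modified: totals.modified + (result.modifiedCount || 0)
    }),
    { matched: 0, modified: 0 }
  );
}
```

If matched equals the number of documents you expected to touch, every filter found its document; modified can be lower when a value was already in its final form.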
Thanks. You are da man… that worked perfectly. I just didn’t know the allowable code. I’m still thinking database…
I had issues with the console in DataGrip (Arity Error), but this worked in mongosh.
Thanks again.
I refactored it, complete with an initial aggregation to create the collection from the source, and then run 2 separate batches: one for the name field and one for the city field (this is really just data beautification).
The entire script runs in less than 19 minutes, where seeding it all through the API took upwards of 70 hours. I’d say that’s an improvement…
Thanks for your help again.