Hi,
we have a collection with more than 300 million documents that needs to be updated several times per day. Each update touches a couple of million documents. A single document looks like this (it represents a social media content metric at a certain timestamp):
{
  _id: ObjectId("61a4d4832850c97dec709291"),
  contentId: "657083495702124",
  granularity: "hour",
  metricTimestamp: ISODate("2021-11-28T16:00:00.000Z"),
  sampling: "absolute",
  contentCreatedTimestamp: ISODate("2021-11-22T09:32:28.000Z"),
  name: "post_video_views_rate_non_paid_unique",
  key: "value",
  value: 0.09930861093651791
}
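
The upsert filter in the script further down matches on six of these fields (contentId, granularity, metricTimestamp, sampling, name, key), so for the updates to be fast I assume we rely on a compound index along these lines (just a sketch, "metrics" is a placeholder collection name and the real index definition may differ):

// Assumed compound index covering the updateOne filter fields ("metrics" is a placeholder name)
db.metrics.createIndex({
  contentId: 1,
  granularity: 1,
  metricTimestamp: 1,
  sampling: 1,
  name: 1,
  key: 1
})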
The update runs in an AWS Lambda, so we would like it to finish in under 15 minutes and consume less than 10 GB of RAM.
At the moment the update is implemented as a Node.js script. Here is an extract (edited for brevity):
........................
const deltaQueryResults = await executeQuery(deltaQuery);

const bulkOps = deltaQueryResults.map(metric => ({
  updateOne: {
    // these six fields uniquely identify a metric document
    filter: {
      contentId: metric.contentId,
      granularity: metric.granularity,
      metricTimestamp: metric.metricTimestamp,
      sampling: metric.sampling,
      name: metric.name,
      key: metric.key
    },
    update: {
      $set: {
        contentId: metric.contentId,
        granularity: metric.granularity,
        metricTimestamp: metric.metricTimestamp,
        sampling: metric.sampling,
        name: metric.name,
        key: metric.key,
        value: metric.value // the new metric value for this timestamp
      }
    },
    upsert: true
  }
}));

// unordered, unacknowledged bulk write (all options in a single options object)
const bulkWriteOpResult = await metricModel.collection.bulkWrite(bulkOps, {
  ordered: false,
  writeConcern: { w: 0, j: false }
});
....................
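
Since building and writing a couple of million operations in one go is what worries me most for the Lambda memory limit, one variation I am evaluating is splitting the operations into fixed-size batches, along these lines (just a sketch, the batch size of 5000 is arbitrary):

// Sketch: issue the same unordered bulk writes in fixed-size batches
// so the driver never has to handle millions of operations at once.
const BATCH_SIZE = 5000; // arbitrary, to be tuned
for (let i = 0; i < bulkOps.length; i += BATCH_SIZE) {
  const batch = bulkOps.slice(i, i + BATCH_SIZE);
  await metricModel.collection.bulkWrite(batch, {
    ordered: false,
    writeConcern: { w: 0, j: false }
  });
}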
The MongoDB instance is an M40 low-CPU tier on Atlas (one cluster with 3 nodes).
Any suggestion on how to make it faster is very welcome!
Thank you
Luigi