I have built a service that fetches data from Crossref and stores it in a hosted database on a server. I'm currently really struggling with updating documents at scale and would appreciate some advice.
I'm using the upsert: true option with Mongoose's bulkWrite operation when updating documents, so if a new document arrives that does not already exist in the collection, it is created.
The average document size is 8 KB, and the field I use to filter documents is set to unique: true, with an index created on that field.
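For reference, here is a minimal sketch of what my code does (model, field, and variable names are simplified; doi stands in for my actual unique filter field):

const mongoose = require('mongoose');

// Simplified schema: `doi` is a placeholder for my real unique filter field.
const workSchema = new mongoose.Schema({
  doi: { type: String, required: true, unique: true }, // unique index on the filter field
  title: String,
  metadata: mongoose.Schema.Types.Mixed, // rest of the ~8 KB Crossref payload
});

const Work = mongoose.model('Work', workSchema);

// Upsert one batch of 500 Crossref records in a single bulkWrite call.
async function upsertBatch(records) {
  const ops = records.map((record) => ({
    updateOne: {
      filter: { doi: record.doi },  // matched via the unique index
      update: { $set: record },
      upsert: true,                 // insert if no document matches
    },
  }));
  return Work.bulkWrite(ops);
}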
At the moment I'm fetching 500 documents from Crossref and upserting them, but it takes more than 3 minutes, which is far too long for my use case.
What could be the reason for such a delay, and what is the optimal number of operations to perform in a single bulkWrite call?
A maximum of 100,000 writes is allowed in a single batch operation; drivers split larger batches automatically.
That said, for your use case, bulk upserting 500 documents should not in itself cause issues.
However, to help me understand further, could you share a few details about the deployment:
The deployment architecture (standalone, replica set, or sharded cluster).
The command that you are trying to execute.
The size of a single document.
Whether you can perform the same writes efficiently via a shell command, with Crossref out of the loop (see the timing sketch below).
The explain plan for the operation (see the example after this list).
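For the shell test, you could time a comparable bulk upsert directly in mongosh with synthetic documents. The collection name (works) and field name (doi) here are assumptions; adjust them to match your schema:

// mongosh: time a 500-document bulk upsert without Crossref in the loop.
const ops = [];
for (let i = 0; i < 500; i++) {
  ops.push({
    updateOne: {
      filter: { doi: `10.1234/test.${i}` },
      update: { $set: { doi: `10.1234/test.${i}`, payload: 'x'.repeat(8 * 1024) } }, // ~8 KB doc
      upsert: true,
    },
  });
}
const start = Date.now();
db.works.bulkWrite(ops);
print(`bulkWrite took ${Date.now() - start} ms`);

If this completes quickly, the bottleneck is more likely in the application or network layer than in the database.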
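For the explain plan: bulkWrite itself cannot be explained directly, so a reasonable proxy is to explain the equivalent find on your filter field and confirm it uses the index (IXSCAN) rather than a collection scan (COLLSCAN). Again, works and doi are placeholders:

// mongosh: verify the upsert filter is served by the unique index.
db.works.find({ doi: '10.1234/test.0' }).explain('executionStats');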
In addition, since we do not have expertise in Crossref, I would suggest reaching out on the Crossref community page for additional information if the issue persists at their end.