I’m facing the following task:
- I have 300+ million documents in a collection, and I’d like to update one field in EVERY document in that collection. So the task is to retrieve each document, read the field, do some string processing on it (parse, shorten, and hash; not hard, but not exactly trivial), and write the document back to the collection.
- My application language is Java.
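
To give an idea of the per-document work, here is a rough sketch of the kind of transformation I mean. The parsing rule (keep the text after the last `/`), the 16-character cutoff, and the choice of SHA-256 are all placeholders I made up for illustration, not my real logic; the class and method names are hypothetical too.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FieldTransform {
    // Hypothetical cutoff; the real limit would depend on the data.
    static final int MAX_LEN = 16;

    // Parse, shorten, then hash a single field value.
    static String transform(String raw) {
        // "Parse": placeholder rule, keep only the text after the last '/'.
        // If there is no '/', lastIndexOf returns -1 and the whole string is kept.
        String parsed = raw.substring(raw.lastIndexOf('/') + 1).trim();
        // "Shorten": truncate to at most MAX_LEN characters.
        String shortened = parsed.length() > MAX_LEN
                ? parsed.substring(0, MAX_LEN)
                : parsed;
        // "Hash": hex-encoded SHA-256 of the shortened value.
        return sha256Hex(shortened);
    }

    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

The point is that the new value is computed from the old one in application code, which is why a plain "set this field to a constant" update does not obviously cover it.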
My question is: what is the best way to accomplish this task? Obviously, we can’t fetch 300M+ documents at once. Is there a streaming capability in mongo-java that lets my application process the documents returned from the server as a stream?
Another question: what is the most efficient way to write a document back after it has been retrieved and amended? Even better, is there a way to do the whole task on the server side, so we don’t have to get, amend, and put back? That would be similar to a simple update query, where we “set” the value we want, send the query once, and the update is done on the server.
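
For reference, this is roughly the client-side loop I picture if there is no server-side option: read documents one at a time, amend each, and write them back in batches. It is only a sketch in plain Java. The `Iterator` is a stand-in for whatever the driver returns (I assume something cursor-like), the `sink` is a stand-in for a batched write, and `BatchedRewrite`, `rewriteInBatches`, and the batch size are names and values I made up.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

public class BatchedRewrite {
    // Streams items from `source`, applies `transform` to each one, and hands
    // them to `sink` in batches of at most `batchSize`, so only one batch is
    // held in memory at a time. Returns the number of items processed.
    static <T> long rewriteInBatches(Iterator<T> source,
                                     UnaryOperator<T> transform,
                                     int batchSize,
                                     Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long total = 0;
        while (source.hasNext()) {
            batch.add(transform.apply(source.next()));
            if (batch.size() == batchSize) {
                sink.accept(batch);      // stand-in for one batched write
                total += batch.size();
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {          // flush the final partial batch
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }
}
```

What I don’t know is whether this pattern is reasonable at 300M+ documents, or whether the driver or server offers something better.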
Comments, suggestions, and questions are welcome.