I've got an app situation where I'm executing a bulk-write operation on a sub-collection insert. Between the line of code that writes the data and the fetch, several dozen more lines of code execute, including writing one or more log messages. When I then fetch the data, I consistently get the pre-write version of the record instead of the just-updated version.
There's a slim chance (about 1 in 10, according to test results) that the updated data will be on disk and fetched properly; more likely than not, it's absent.
fsync() isn't an option because of the overhead. This is an enterprise app, so I'm not going to introduce delays like sleep() to make up for a lack of write-behind caching, and I'm not going to use collection-level write locks because the app is asynchronous.
I'm curious what others have done to compensate. As of now, my only option is to return the result of the operation (success, n records updated), forcing the user to make a subsequent call to fetch the updated record, which I think is an inefficient kludge.
There are many things that could be happening, so none of what follows may be the right answer.
You say the app is asynchronous, so maybe you start reading before the asynchronous write has completed.
Maybe your read preference targets secondaries that are not fully in sync yet.
I really don't know why fsync() is part of this discussion. Are you reading data directly from disk or something like that? I hope not.
The read is sequential with respect to the write. It's highly unlikely, though possible, that a read request for the same record would be posted asynchronously. I mentioned the async nature because the app is asynchronous (which is why I'll avoid collection-level locking), but the read in question happens after the write within the same “thread”, to misuse the term.
My code does favor read slaves (secondaries), so yes, it's definitely a case where the write hasn't yet propagated to the replica nodes in the cluster. This was implicit in the problem statement; I could have made that clearer.
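To make the replication-lag explanation concrete, here is a toy, in-memory model in Python. It is not MongoDB code, and every name in it (`ToyReplicaSet`, `replicate`, `read_secondary`) is hypothetical. The primary applies a write immediately, secondaries apply it only when they catch up on the log, and a numeric write concern `w` (which counts the primary) waits for `w - 1` secondaries before returning.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, int]  # (key, value) entry in the toy log


class ToyReplicaSet:
    """Hypothetical model: primary applies writes at once, secondaries lag."""

    def __init__(self, secondaries: int = 2) -> None:
        self.primary: Dict[str, int] = {}
        self.secondaries: List[Dict[str, int]] = [{} for _ in range(secondaries)]
        self.log: List[Op] = []                      # stand-in for an oplog
        self.applied: List[int] = [0] * secondaries  # log position per secondary

    def write(self, key: str, value: int, w: int = 1) -> None:
        """Apply on the primary, then block until w - 1 secondaries catch up."""
        self.primary[key] = value
        self.log.append((key, value))
        for i in range(min(w - 1, len(self.secondaries))):
            self.replicate(i)

    def replicate(self, i: int) -> None:
        """Bring secondary i up to date by replaying the log entries it lacks."""
        for key, value in self.log[self.applied[i]:]:
            self.secondaries[i][key] = value
        self.applied[i] = len(self.log)

    def read_secondary(self, i: int, key: str) -> Optional[int]:
        """Read from secondary i, which may not have seen recent writes."""
        return self.secondaries[i].get(key)
```

In this model, a `w=1` write followed by a secondary read returns the old value; a `w=2` write makes one secondary current, but in a three-member set the other secondary can still be behind.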
As for reading from storage: do you read cold data from somewhere else? I could have written my own write-behind cache, which I would normally query first, but I didn't. Forcing a disk flush prior to the write is the sysadmin response, not a programming solution.
To mitigate this, I think I'll pre-fetch the record, calculate a checksum, do the update, and then loop, re-fetching the record until its checksum differs from the pre-update one, giving up after a few attempts. It's either that or develop the write-behind cache to compensate.
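A minimal sketch of that pre-fetch/checksum/retry loop in Python, assuming two hypothetical callables supplied by the app: `fetch_record()`, which returns the record as a dict, and `do_update()`, which performs the write. Neither is a MongoDB driver API. The checksum here is SHA-256 over a key-sorted JSON encoding, and the back-off defaults to zero so the loop does not sleep.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional

Record = Dict[str, Any]


def checksum(record: Record) -> str:
    """SHA-256 of a canonical (key-sorted) JSON encoding of the record."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def update_and_refetch(
    fetch_record: Callable[[], Record],  # hypothetical: reads the record
    do_update: Callable[[], None],       # hypothetical: performs the write
    max_attempts: int = 5,
    delay_s: float = 0.0,                # 0 keeps the loop sleep-free
) -> Optional[Record]:
    """Update, then re-read until the checksum differs from the pre-write copy.

    Returns the changed record, or None if every attempt still saw the old one.
    """
    before = checksum(fetch_record())    # pre-fetch and fingerprint
    do_update()
    for attempt in range(max_attempts):
        current = fetch_record()
        if checksum(current) != before:  # a changed copy has arrived
            return current
        if delay_s > 0 and attempt < max_attempts - 1:
            time.sleep(delay_s)          # optional back-off between reads
    return None
```

Two caveats of this approach: if the update leaves the record unchanged, the loop always returns None, and a changed checksum only shows that the record differs, not that the difference came from this particular write if other writers touch the same record.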
I do wish Mongo would take a page from MySQL and return the updated record as part of a success response, though; it would certainly make my life a lot easier…
If you want to read what was just written, the primary is your best bet. The data is going to be hot in cache.
If that's not your flavor, then what are your read and write concerns? It sounds like both need to be majority for what you want to do.
Depending on your topology, w:2 r:2 might be enough.
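For reference, these settings map onto standard MongoDB connection-string options: `w`, `readConcernLevel`, `readPreference`, and `replicaSet`. The Python sketch below only assembles such a URI with the standard library; the host names and the replica-set name `rs0` are placeholders, and the resulting string would be passed to whatever driver the app uses.

```python
from typing import Sequence, Union
from urllib.parse import urlencode


def build_uri(
    hosts: Sequence[str],
    replica_set: str,
    *,
    w: Union[str, int] = "majority",   # write concern: "majority" or a member count
    read_concern: str = "majority",    # read concern level, e.g. "majority" or "local"
    read_preference: str = "primary",  # e.g. "primary" or "secondaryPreferred"
) -> str:
    """Assemble a mongodb:// URI carrying write/read concern and read preference."""
    options = {
        "replicaSet": replica_set,
        "w": w,
        "readConcernLevel": read_concern,
        "readPreference": read_preference,
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Placeholder hosts; read from the primary with majority write and read concerns.
print(build_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0"))
```

Reading from the primary with majority write and read concerns is the conservative combination; switching to `w=2` or `readPreference=secondaryPreferred` is a topology-dependent trade-off.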
@chris: My Driver\Manager settings use secondaryPreferred for read-primary and primaryPreferred for secondary preference.
I bumped my write-concern setting up to w=2 and have had 100% hits on the fetch, so I'm going to mark that as the solution. I haven't tested this in a production deployment yet, but with the w=2 setting I don't think it will matter, as the DB topology isn't going to be radically different.
Thanks, all, for stirring up my brain chemistry - much appreciated!
Good to hear.
Just a note or two.
Given a three-node replica set, majority is essentially the same thing as w:2 or r:2.
If the topology is PSA (primary-secondary-arbiter), then the failure of either data node will cause w:2 r:2 to fail. Majority would work.
With any larger replica set, you can be back in the same scenario you were previously experiencing.
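The arithmetic behind these notes: a majority of n voting members is ⌊n/2⌋ + 1, so a fixed w:2 matches majority only when there are three voting members. A small Python sketch; it counts voting members only and ignores details such as arbiters (which vote but hold no data), so it is not MongoDB's full rule for computing the majority.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that is more than half of the voters."""
    if voting_members < 1:
        raise ValueError("need at least one voting member")
    return voting_members // 2 + 1


def fixed_w_is_majority(w: int, voting_members: int) -> bool:
    """True when a numeric write concern w reaches at least a majority."""
    return w >= majority(voting_members)


# Compare a fixed w:2 against the majority for a few replica-set sizes.
for n in (3, 5, 7):
    print(n, majority(n), fixed_w_is_majority(2, n))
```

For three voters the majority is 2, so w:2 and majority coincide; for five or seven voters the majority is 3 or 4, and w:2 no longer reaches it.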
But if you are looking for the answer to the topic ‘The D in ACID’, majority write concern is the correct answer.
I wrote the connection/resource-management code over two years ago; it was good to go back and review it. The logic was basically:
$w = ($production) ? MAJ : 1;
I think I’ll eliminate the conditional and just leave it set to majority.
Thanks, again, Chris, for the help!