Using Change Streams in a standalone database

CJ_Leaird · October 6, 2021, 3:04pm

Hi, all! I’m new to MongoDB and am hoping to see if anyone on here has run into some of the issues I’m having.

Just to provide some context, I am developing a desktop application that monitors changes to a locally hosted MongoDB instance. This instance is initialized and maintained as a standalone database by another desktop application.

I came across the watch() APIs that create Change Streams that may be used to access changes to the database. After running the application, I received this error:

com.mongodb.MongoCommandException: Command failed with error 40573 (Location40573): 'The $changeStream stage is only supported on replica sets' on server localhost:27017. The full response is {"ok": 0.0, "errmsg": "The $changeStream stage is only supported on replica sets", "code": 40573, "codeName": "Location40573"}

So naturally, I dug into this a bit. It looks like Change Streams only work if the database is initialized as a replica set. I was able to shut down the database (which, again, was initialized by a separate desktop application) and re-initialize it using the --replSet rs0 option. After running my application again, everything worked! It seems I’ve created a single-node replica set just so that the Change Streams can have access to the oplog?

I guess my question is about how Change Streams work. Am I correct to assume that Change Streams monitor the oplog directly, and that the only way the oplog is created is by initializing the database as a replica set?

If this assumption is correct, does anyone know of any other way to create the oplog without having to initialize the database as a replica set?

Or can I monitor updates to the oplog directly? I’m assuming that might be a huge pain, which is probably why the Change Streams exist in the first place.

Thanks all for your help! Let me know if you need more context!

MaBeuLux88_xxx · October 6, 2021, 6:50pm

Hi @CJ_Leaird and welcome to the MongoDB Community!

Everything that you said is correct here.

Yes, Change Streams are implemented on top of the local.oplog.rs collection, which is the special internal system collection that MongoDB uses to replicate write operations from one node to another.

There is no other way to create the oplog entries. Using a single-node Replica Set (RS) is the right way to work locally with MongoDB when you need features that are only available in an RS, such as Change Streams and multi-document ACID transactions.
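If you would rather set up the single-node RS from application code than from the shell, here is a minimal sketch using PyMongo. Python is an assumption on my part (your stack trace suggests the Java driver, which can run the same replSetInitiate command). It assumes mongod was already started with --replSet rs0 on localhost:27017, as in your post. The helper names ensure_single_node_replica_set and connect_and_initiate are made up for illustration.

```python
ALREADY_INITIALIZED = 23  # server error code "AlreadyInitialized"


def ensure_single_node_replica_set(client, set_name="rs0", host="localhost:27017"):
    """Initiate a one-member replica set; return False if it already exists.

    `client` is expected to be a pymongo.MongoClient connected directly to a
    mongod that was started with `--replSet <set_name>`.
    """
    config = {"_id": set_name, "members": [{"_id": 0, "host": host}]}
    try:
        client.admin.command("replSetInitiate", config)
        return True
    except Exception as exc:
        # PyMongo's OperationFailure exposes the server error code as `.code`.
        if getattr(exc, "code", None) == ALREADY_INITIALIZED:
            return False
        raise


def connect_and_initiate():
    """Hypothetical usage; needs `pip install pymongo` and a running mongod."""
    from pymongo import MongoClient

    # directConnection=True lets the client talk to a node that is not yet
    # part of an initiated replica set.
    client = MongoClient("localhost", 27017, directConnection=True)
    return ensure_single_node_replica_set(client)
```

The set name in the config has to match the one given to --replSet, otherwise the server rejects the command.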

Before Change Streams were developed, people monitored the oplog entries directly, but this is painful for multiple reasons and requires a LARGE amount of code to do correctly… To begin with, some operations in the oplog might be rolled back later if the node goes down shortly after a write operation and couldn’t replicate it before another node takes over as Primary (i.e. a write operation with writeConcern w=1). Such an operation would show up in your “oplog tailing” but not in the Change Stream, because it was never acknowledged by a majority of the nodes.

Also, Change Streams have an API to resume the stream from where you stopped. That is pretty convenient when you want to stop your app for a minute and then catch up from your last processed entry when you restart.
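A minimal sketch of that resume pattern with PyMongo (again an assumption, not necessarily your driver): each change event carries its resume token in the _id field, and passing that token back as resume_after to watch() continues right after it. The token file and the helper names load_token, save_token and process_changes are hypothetical. The sketch also assumes the token is JSON-serializable, which I believe holds for the string-encoded tokens of recent server versions.

```python
import json
from pathlib import Path


def load_token(path):
    """Return the last saved resume token, or None on the first run."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else None


def save_token(path, token):
    """Persist the resume token of the last fully processed event."""
    Path(path).write_text(json.dumps(token))


def process_changes(collection, token_path, handle, max_events=None):
    """Watch `collection`, resuming after the saved token if there is one."""
    token = load_token(token_path)
    processed = 0
    # resume_after=None simply opens a fresh stream.
    with collection.watch(resume_after=token) as stream:
        for change in stream:
            handle(change)
            # Each event's `_id` is its resume token; save it only after
            # the event has been handled.
            save_token(token_path, change["_id"])
            processed += 1
            if max_events is not None and processed >= max_events:
                break
    return processed
```

Saving the token after handle() means a crash in between replays that one event on restart, so the handler should tolerate seeing an event twice. Resuming also only works while the corresponding entry is still in the oplog.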

Change Streams also support filtering based on operation types (insert, update, replace, delete) and on fields (for example: was the “name” field updated in collection mydb.mycoll?).
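To make the filtering concrete, here is a small sketch of the aggregation pipeline you could pass to watch(), written for PyMongo (an assumption, the Java driver accepts an equivalent pipeline). Update events list the changed fields under updateDescription.updatedFields, so a $match on that path picks out updates to “name”. The helper names build_pipeline and watch_name_updates are hypothetical.

```python
def build_pipeline(operation_types, updated_field=None):
    """Build a $match stage for a change stream.

    operation_types: e.g. ["insert", "update", "replace", "delete"].
    updated_field:   optional top-level field that must appear among the
                     fields set by an update.
    """
    match = {"operationType": {"$in": list(operation_types)}}
    if updated_field is not None:
        # Only update events carry `updateDescription`, so adding this
        # condition also drops inserts, replaces and deletes.
        match["updateDescription.updatedFields." + updated_field] = {"$exists": True}
    return [{"$match": match}]


def watch_name_updates(collection):
    """Open a change stream that only reports updates touching `name`."""
    return collection.watch(build_pipeline(["update"], updated_field="name"))
```

With a real client this would be called as watch_name_updates(client["mydb"]["mycoll"]). Note that this only catches updates that set “name” as a whole; changes to nested sub-fields or removals of the field are reported differently and would need extra conditions.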

Here are the main characteristics of Change Streams:

Resumable
Targeted Changes (filters)
Total ordering (events follow the order of the entries in the oplog, so they always arrive in the order the writes were applied)
Durability (you only get write operations that have been saved to a majority of the nodes, so you can’t read something that might be rolled back later)
Security (if your user can read the collection, they can watch() it, whereas tailing the oplog needs admin access to system collections)
Ease of use (awesome API!)
Idempotence (every entry in the oplog is idempotent, so every event in a Change Stream is too)

I hope this helps!
Cheers,
Maxime.

PS: I hope I convinced you that tailing the oplog was already a bad idea in the past and still is! Especially now that there is an actual, well-implemented solution.

Bonus: When I work locally, I only use MongoDB in an ephemeral Docker container like this, set up of course as a single-node RS so I can use all the cool features:

alias mdb='docker run --rm -d -p 27017:27017 -h $(hostname) --name mongo mongo:5.0.3 --replSet=test && sleep 4 && docker exec mongo mongo --eval "rs.initiate();"'
alias msh='docker exec -it mongo mongosh --quiet'

If your PC is a bit slow, add a few seconds to the sleep :-).