What is the status of APM (In Mongo node.js native driver)?

Eddy_Wilson · March 12, 2021, 11:21am

Hello,

What is the status of the MongoDB native driver APM functionality on Node.js?
In the 3.6 docs, you can find it here, but it no longer appears in the docs for 4.x. Will it be removed?

Also, the docs for APM in 3.6 seem very wrong. For instance, there is this snippet of code:

const listener = require('mongodb').instrument({
  operationIdGenerator: {
    operationId: 1,

    next: function() {
      return this.operationId++;
    }
  },

  timestampGenerator: {
    current: function() {
      return new Date().getTime();
    },

    duration: function(start, end) {
      return end - start;
    }
  }  
}, function(err, instrumentations) {
  // Instrument the driver  
});

But if you check the driver code here, the options argument isn’t used at all.

Also, the docs show that you can iterate over the instrumentations argument … but that isn’t the case because instrumentations is the Instrumentation Class instance which extends an EventEmitter which doesn’t have a forEach.

The code in the 4.0 branch is the same but written in TS. Is this a work in progress? Can we reliably use this? I’m currently using this for monitoring, but I’m not sure whether this APM will continue to exist in future versions. The alternative is to overwrite the prototypes of internal MongoDB classes to capture these commands, which is something I want to avoid (though some monitoring services actually do it).

neal · March 16, 2021, 10:17pm

Hello @Eddy_Wilson!
Rest assured the status of APM in the node driver is that the functionality is fully supported and will be going forward. However, it does seem we have some confusing documentation that hasn’t made clear what is legacy and what is best practice. I have filed a ticket for this and we will update the docs as soon as we can.

Today all drivers share the same monitoring events for a uniform database experience. In Node it would look something like this:

const client = new MongoClient(URL, options)
client.on('commandStarted', event => {/* event handling... */})
client.on('commandSucceeded', event => {/* event handling... */})
client.on('commandFailed', event => {/* event handling... */})
// SDAM events can also be listened to in the same way

await client.connect()
// etc..

In 4.0 the instrument method will likely be removed, and it will be deprecated in 3.6. However, the 4.0 version is still an in-progress beta, so if you are targeting production I would hold off on deploying the beta version of the driver. The code above will work in either version of the driver.

Hopefully this helps! Let me know if there’s anything still unclear.

Eddy_Wilson · March 17, 2021, 5:39pm

@neal cool, it makes sense.

One thing though: there isn’t a way to provide these instrument options, such as operationIdGenerator and timestampGenerator, right?

Right now, I’m logging requestId for slow queries. However, with ~200 node.js servers, this requestId isn’t unique. Is there a non-hacky way to provide these options to the driver?

For instance, for timestampGenerator, this pseudo-code is how I implemented it:

// `client` is a MongoClient created with { monitorCommands: true }
const durationStart = new Map(); // requestId -> start time in ns (BigInt)
client.on("commandStarted", event => durationStart.set(event.requestId, process.hrtime.bigint()));
client.on("commandSucceeded", event => {
  const diff = process.hrtime.bigint() - durationStart.get(event.requestId); // elapsed ns
  // do something with diff
  durationStart.delete(event.requestId);
});
client.on("commandFailed", event => durationStart.delete(event.requestId));

While this works, it’s pretty hacky. I need a timestampGenerator because the ms duration in the event isn’t accurate enough (it seems rounded, e.g. a duration of 7ms may really be 6.65ms), and we need metrics with microsecond resolution (e.g. 1ms is reported as 1000).

And it’s a pretty similar story for an operationIdGenerator, to map a requestId to, for example, a uuid/v4 random id. The method above works, but it’s error-prone: if there is an error in the event handler or I forget to remove the entry from the Map, we have a memory leak.

Lastly, the instrumentation points won’t be available in 4.x either, will they? I mean the class description with the methods and options (e.g. callback: true, promise: true). I haven’t seen it in the codebase of the MongoDB native driver for Node.js.

neal · March 18, 2021, 7:04pm

I forgot to note in my first reply that you’ll also need to enable monitoring via the constructor like so:

const client = new MongoClient(url, { monitorCommands: true })

Firstly, maybe attaching a comment to your find operations that is some unique string can help with logging/tracing. The requestId is a monotonically increasing value that is an int32 and part of the underlying communication protocol between a driver and mongod. Combining requestId with some unique process information (maybe, pid + hostname) should give you unique values per nodejs server. Additionally, if you want a detailed view into why a query may be slow, I recommend taking a look at the explain API which can be run from the mongo shell as well as the nodejs driver.
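As a rough illustration of the "requestId plus process information" idea, here is a minimal sketch. The helper name globalRequestId and the PROCESS_TAG constant are made up for this example and are not part of the driver; the only event field it reads is requestId.

```javascript
const os = require('os');

// Hypothetical helper (not a driver API): prefix the per-process requestId
// with hostname and pid so IDs from different servers do not collide.
const PROCESS_TAG = `${os.hostname()}:${process.pid}`;

function globalRequestId(event) {
  return `${PROCESS_TAG}:${event.requestId}`;
}

// Possible usage with a command monitoring event:
// client.on('commandStarted', event => console.log(globalRequestId(event)));
```

Note that this only separates processes from each other. Because requestId is an int32, values can still repeat within one long-lived process once the counter wraps, and a pid may be reused after a restart, so adding a process start timestamp to the tag may be worth considering.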

As of now it appears we do lack support for the specific use case of assigning a custom id to every operation and writing custom timestamp generation logic with higher precision than ms. However, we would be open to adding support for such a use case if the above still doesn’t achieve the functionality you’re searching for.

Eddy_Wilson · March 19, 2021, 11:26am

Oh, thanks, so that’s what monitorCommands option does.

I do actually have an implementation on top of APM that runs explain for all the slow find commands logged in the commandSucceeded event during development.

“However, we would be open to adding support for such a use case if the above still doesn’t achieve the functionality you’re searching for.”

It’d be great to have a built-in way to generate a requestId and duration. We do it by subscribing to all three events and storing each commandStarted event’s requestId and start time in a Map, so that in the later commandSucceeded or commandFailed event we can look up the uuid assigned to the requestId or compute the duration. It works well but seems very hacky and makes the application prone to memory leaks (for example, forgetting to remove the requestId from the Map on commandFailed).
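For anyone landing here with the same problem, one way to make the Map-based approach less leak-prone is to route both end events through a single cleanup path and to sweep stale entries periodically. This is only a sketch under assumptions: trackCommandDurations, onDone and maxAgeMs are hypothetical names invented here, not driver APIs, and it relies only on the emitter firing commandStarted, commandSucceeded and commandFailed with a requestId, as described above.

```javascript
// Hypothetical helper (not part of the driver): tracks in-flight commands
// and reports each duration in whole microseconds (so 1ms is 1000).
function trackCommandDurations(emitter, onDone, maxAgeMs = 60000) {
  const started = new Map(); // requestId -> start time in ns (BigInt)

  emitter.on('commandStarted', event => {
    started.set(event.requestId, process.hrtime.bigint());
  });

  // One shared handler, so success and failure both remove the entry.
  const finish = event => {
    const start = started.get(event.requestId);
    if (start === undefined) return; // never seen, or already swept
    started.delete(event.requestId); // delete first, in case onDone throws
    const micros = Number((process.hrtime.bigint() - start) / 1000n);
    onDone(event, micros);
  };
  emitter.on('commandSucceeded', finish);
  emitter.on('commandFailed', finish);

  // Safety net: drop entries whose end event never arrived.
  const sweep = setInterval(() => {
    const cutoff = process.hrtime.bigint() - BigInt(maxAgeMs) * 1000000n;
    for (const [id, start] of started) {
      if (start < cutoff) started.delete(id);
    }
  }, maxAgeMs);
  sweep.unref(); // the sweeper alone should not keep the process alive

  return { pending: () => started.size, stop: () => clearInterval(sweep) };
}
```

It could be attached with something like trackCommandDurations(client, (event, micros) => { /* log slow commands */ }) on a client created with monitorCommands: true. The same Map could hold a generated uuid next to the start time if a custom operation id is needed.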