I’m porting some old aggregation scripts that used ScopedThread from parallelTester.js to split mapReduce operations across multiple threads and make better use of the available cores. The scripts are run directly in the mongo shell.
I have rewritten the threading part with Node.js Worker (worker_threads), but I am hitting a roadblock: threads spawned that way don’t inherit any of the mongosh environment, so neither db nor connect() is available. I cannot pass db as workerData either, because it cannot be cloned. If I install mongodb via npm, I can require it on the main thread, but the worker threads cannot find any npm modules (I tried npm install both with and without -g).
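To illustrate the cloning problem: according to the Node.js documentation, workerData is copied using the structured clone algorithm, which rejects anything containing functions. The sketch below is only an approximation of what happens with db. It uses a hypothetical stand-in object (fakeDb) rather than the real mongosh db, and it assumes a Node.js version where the global structuredClone() exists (17 or later).

```javascript
// Hypothetical stand-in for the shell's db object: any object that carries
// functions is rejected by the structured clone algorithm.
const fakeDb = { getCollection() { return null; } };

// Returns true if `value` survives structured cloning, i.e. if it could
// be handed to a Worker as workerData.
function isCloneable(value) {
  try {
    structuredClone(value);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isCloneable({ foo: '1', bar: 'global info' })); // true
console.log(isCloneable({ db: fakeDb }));                   // false
```

Plain data such as strings and numbers passes, which is why foo and bar reach the workers in my mockup while db does not.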
I’m just about to give up and rewire the whole thing so that it doesn’t run in parallel threads at all, but I was hoping that someone here might have had some success using Worker inside mongosh?
Here’s some mockup code as an example:
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const { MongoClient } = require('mongodb');

const uri = "mongodb+srv://localhost/mydb?retryWrites=true";
const bar = 'global info';

if (isMainThread) {
    // Main thread: spawn two workers that re-run this same file.
    const threads = new Set();
    threads.add(new Worker(__filename, { workerData: { foo: '1', bar } }));
    threads.add(new Worker(__filename, { workerData: { foo: '2', bar } }));
    for (const worker of threads) {
        worker.on('error', (err) => { throw err; });
        worker.on('exit', () => {
            threads.delete(worker);
            if (threads.size === 0) {
                console.log('done');
            }
        });
        worker.on('message', (msg) => {
            console.log('Worker Message: ', msg);
        });
    }
} else {
    // Worker thread: db and connect() are not defined here.
    async function main() {
        // Mockup only: the real script would connect and run its part of the aggregation.
        const client = new MongoClient(uri);
    }
    main().catch(console.error);
    parentPort.postMessage(workerData);
}
Running this yields two error messages, one for each of the threads:
mongosh --quiet mydb ./testWorkerThreads.js
Uncaught:
Error: Cannot find module 'mongodb'
Require stack:
- ./testWorkerThreads.js
Uncaught:
Error: Cannot find module 'mongodb'
Require stack:
- ./testWorkerThreads.js
done
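One diagnostic that might help narrow this down is to print the directories that require() would search for the package, once on the main thread and once inside a worker, and compare the two. This is only a sketch: moduleSearchPaths is a hypothetical helper name, and I have not confirmed that mongosh's own require reports its paths the same way plain Node.js does.

```javascript
// Hypothetical helper: lists the directories require() would search for a
// package. require.resolve.paths() returns null for core modules such as
// 'fs', so fall back to an empty array in that case.
function moduleSearchPaths(name) {
  return require.resolve.paths(name) || [];
}

console.log(moduleSearchPaths('mongodb'));
```

If the worker's list does not include the directory where npm installed mongodb, that would explain the "Cannot find module" error.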
Using “db” inside the threads (which I’d prefer, as I wouldn’t need new connections) fails in a similar way: db is not defined.
Any ideas?