MongoDB equivalent of Stored Procedures

Hi,

I am wondering what the MongoDB equivalent of Stored Procedures is.

All my searches seem to point to server-side JavaScript functions, but there doesn’t seem to be much documentation on them (and there’s a big disclaimer in your docs that seems to imply that MongoDB does not recommend using them).

What is the recommended MongoDB equivalent?

Thanks,
Xavier

Search for “Stored Procedure” in the MongoDB docs

Search for “Stored Procedure” on Stack Overflow

FYI, here’s one of my “stored procedures”:

db.system.js.save({ _id : "getUser" , value : function (emailAddress){ return db.users.aggregate([ {$match: {email: emailAddress}}, { $lookup: { from: "activitiesbuckets", localField: "_id", foreignField: "user", as: "activities" } } ]); } })


Hi @Xavier_Robitaille and welcome in the MongoDB Community :muscle: !

Pure “stored procedures” don’t exist in MongoDB, but we have a couple of alternatives that are way easier to use, in my opinion.

In my distant past, I remember databases full of PL/SQL that only one person could maintain (and that person had left two years earlier). That garbage code wasn’t saved or versioned anywhere… It still haunts me to this day.

Anyway… First alternative: the MongoDB Aggregation Pipeline, which is very powerful and usually underused or overlooked.

If we take the example from the Stack Overflow post, this could be solved with a single aggregation pipeline. The stores in the CSV file could be loaded into a collection without the latitude and longitude. Then we could run an aggregation pipeline that would roughly look like this (a shell sketch follows the list):

  • $match the store docs that are missing lat & long.
  • $lookup the zip codes in the reference collection that contains the lat & long.
  • $project to shape the docs for the next stage.
  • $merge the “enriched” documents back into the original collection.
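
Expressed in the shell, a minimal sketch of that pipeline could look like this. The collection names (stores, zipcodes) and field names (zip, lat, lon) are assumptions for illustration:

db.stores.aggregate([
  // Only pick the store docs that haven't been enriched yet.
  { $match: { lat: { $exists: false } } },
  // Join on the zip code against the reference collection.
  { $lookup: { from: "zipcodes", localField: "zip", foreignField: "zip", as: "geo" } },
  // Promote the looked-up lat & lon onto the store doc ($lookup returns an array).
  { $project: {
      name: 1,
      zip: 1,
      lat: { $arrayElemAt: [ "$geo.lat", 0 ] },
      lon: { $arrayElemAt: [ "$geo.lon", 0 ] }
  } },
  // Write the enriched docs back into the original collection.
  // Note: $merge into the same collection being aggregated requires MongoDB 4.4+.
  { $merge: { into: "stores", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } }
])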

This aggregation is fully executed on the database and can be very fast if it’s backed by the right indexes.
The final algorithm in the back end would look something like this (a Node.js sketch follows the list):

  • parse the CSV file
  • insertMany the stores into the stores collection.
  • run the final aggregation pipeline to enrich all the inserted docs with the lat & long from the reference collection.
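
A rough Node.js sketch of those three steps, assuming the official MongoDB Node.js driver and the csv-parse package (v5 sync API); the file name, database name, and enrichmentPipeline are all illustrative:

const fs = require("fs");
const { parse } = require("csv-parse/sync");
const { MongoClient } = require("mongodb");

// The $match / $lookup / $project / $merge stages sketched above.
const enrichmentPipeline = [ /* ... */ ];

async function loadAndEnrichStores(uri) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const stores = client.db("test").collection("stores");
    // 1. Parse the CSV file into plain JS objects.
    const docs = parse(fs.readFileSync("stores.csv"), { columns: true });
    // 2. insertMany the stores (still without lat & lon).
    await stores.insertMany(docs);
    // 3. Run the pipeline; iterating the cursor executes the $merge.
    await stores.aggregate(enrichmentPipeline).toArray();
  } finally {
    await client.close();
  }
}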

Note that we could, potentially, wrap this entire process in a multi-document ACID transaction, but $merge can’t be part of an aggregation pipeline inside a transaction. Also note that using stored procedures in some SQL systems wouldn’t be ACID either. But feel free to vote here to make this happen: SERVER-45209.

Second solution: use Change Streams directly, or use their “serverless, production-ready” equivalent in the MongoDB Cloud: Realm Triggers.

With Change Streams, you need to set up your own back-end server(s) and use one of the available MongoDB drivers to listen for the changes you are after and trigger some code each time one happens. The big difference: the code isn’t executed on the database side. But I see this as an opportunity rather than a constraint. For example, it’s now trivial to interact with third-party services (external APIs, AWS S3, you name it…). Good luck achieving this with PL/SQL. Honestly, I haven’t touched that tech for a while now, but this would be a nightmare at best, if it’s possible at all.
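
As a minimal sketch, listening for inserts with the Node.js driver could look like this (the database, collection, and handler code are assumptions):

const { MongoClient } = require("mongodb");

async function watchInserts(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  // Only listen for insert events on the movies collection.
  const changeStream = client.db("test").collection("movies").watch([
    { $match: { operationType: "insert" } }
  ]);
  for await (const event of changeStream) {
    // Trigger whatever code you want here: call an external API,
    // push to S3, enrich the doc, etc.
    console.log("New document:", event.fullDocument);
  }
}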

Let’s take a simple example: Users can insert a document in my collection that contains a movie title. On inserts, I want to fetch movie details using a REST API and enrich the document in place.

Here is what this would look like in a MongoDB Realm Function triggered by a Database Trigger on inserts.

exports = function(changeEvent) {
  // _id and title of the newly inserted movie document.
  const docId = changeEvent.documentKey._id;
  const title = encodeURIComponent(changeEvent.fullDocument.Title.trim());

  const movies = context.services.get("mongodb-atlas").db("realm").collection("movies");
  // The OMDb API key is stored as a Realm Value.
  const apikey = context.values.get("imdb_api_key");
  const imdb_url = "http://www.omdbapi.com/?apikey=" + apikey + "&t=" + title;
  console.log("Title : " + title);

  return context.http
    .get({ url: imdb_url })
    .then(resp => {
      console.log(resp.body.text());
      const doc = EJSON.parse(resp.body.text());
      if (doc.Response === "False") {
        // The movie wasn't found: remove the inserted document.
        return movies.deleteOne({ "_id": docId });
      } else {
        // Convert the date fields, then enrich the document in place.
        doc.DVD_ISO = context.functions.execute("to_iso_date", doc.DVD);
        doc.Released_ISO = context.functions.execute("to_iso_date", doc.Released);
        return movies.updateOne({ "_id": docId }, { $set: doc });
      }
    })
    .catch(e => {
      console.log(e);
      return movies.deleteOne({ "_id": docId });
    });
};

Using this method, I could also solve the earlier CSV + geolocation problem: instead of using data from a REST API, I would query my Atlas cluster directly and enrich my documents the same way.

Change Streams & Realm Triggers can also be used to push data to other systems like AWS Kinesis or Kafka. Good luck doing that with PL/SQL. :wink:

I hope this helps!
Cheers,
Maxime.


Hi Maxime,

Thank you so much for your very detailed answer. I think it will become the reference post on this topic!

I agree with you regarding the Aggregation Pipeline; that’s actually what I was using in the example “Stored Procedure” / “Server-side JavaScript Function” I posted (sorry it wasn’t well formatted at all in my first post):

db.system.js.save({
  _id : "getUser",
  value : function (emailAddress) {
    return db.users.aggregate([
      { $match: { email: emailAddress } },
      { $lookup: { from: "activitiesbuckets", localField: "_id", foreignField: "user", as: "activities" } }
    ]);
  }
})

I agree with your general assessment that it is not a good idea to put complex logic in code that’s stored in the database. However, I do feel like “database server-side functions” have their role to play.

For example, I like encapsulating the aggregation pipeline of the above query in the getUser function. I can call it manually from many different client applications (including when I ssh into my MongoDB server to troubleshoot issues), and I don’t have to worry about whether I have typed the query correctly, etc.
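
For what it’s worth, here is how I call it from the legacy mongo shell (the email address is just an example):

// Load every function saved in system.js into the current shell session.
db.loadServerScripts();
// getUser is now defined and can be called like a normal function.
getUser("xavier@example.com");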

I guess I was surprised there isn’t more official Mongo doc on the topic. For example:

  • what is the runtime environment?
  • where to find all the code that’s currently stored in the database: it took a bit of fiddling to figure out that system.js looks like any other collection, so db.system.js.find() lists all the functions, etc. (see the short example after this list)
  • I agree that such code should be source-controlled (perhaps Mongo could give pointers on clever ways to integrate source control?)
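
For anyone else fiddling with this, the usual CRUD helpers work on system.js as on any other collection:

// List every server-side function currently saved.
db.system.js.find().pretty()
// Remove one by its _id.
db.system.js.deleteOne({ _id: "getUser" })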

Thanks for mentioning Change Streams. I have to admit I had never used that technology (I had to read up on it), but it does seem powerful indeed.

The moral of the story for me seems to be:

  • use database-side JavaScript functions/queries sparingly
  • if the logic is complex and triggered by real-time data changes, see whether Change Streams can help (I can see use cases where Change Streams would provide a cleaner implementation than, e.g., an AWS Lambda).

Thanks again for the very detailed explanations. They were much appreciated.

Regards,
Xavier Robitaille
Feather Finance

