How does the MongoDB Data API work from a high-level perspective?

A friend and I have been discussing building a low-latency global service using Cloudflare Workers and MongoDB.

Let’s say we have an M30 cluster with global low-latency reads configured.

If we follow the guidelines in this blog post on setting up Realm, what will the architecture look like?

If a user in Argentina makes a GET request, the nearest Cloudflare data center will respond and call the Realm API. But where is the Realm API physically located - is it also deployed globally?

Thanks for your input.

Hey @Alex_Bjorlig,

I’m the author of that blog post. :slight_smile:
This documentation page is relevant to my answer.

MongoDB Realm apps are deployed either globally or locally. If they are deployed globally, your app is available in Ireland, Oregon, Sydney and Virginia.

So if a client is in New York, they will most likely be routed to the app in Virginia. From there, depending on your configuration, the request reaches the primary node for a write, or the closest node for a read if you use readPreference nearest (assuming you have a replica set; a sharded cluster would cover even more ground). All of this happens without Cloudflare, using only Realm auth and Realm Functions (the equivalent of Cloudflare Workers).
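Outside Realm, the read-routing rule above corresponds to the standard readPreference connection-string option of the MongoDB drivers. A minimal sketch, assuming a plain Node.js client; the host and database names are hypothetical placeholders (inside a Realm app, the linked cluster service is configured for you instead):

```javascript
// Builds a connection string asking the driver to route reads to the
// lowest-latency replica set member. Writes always go to the primary.
// The host and database names used below are hypothetical placeholders.
function buildNearestReadUri(host, dbName) {
  const uri = new URL(`mongodb+srv://${host}/${dbName}`);
  uri.searchParams.set("readPreference", "nearest");
  return uri.toString();
}

const nearestUri = buildNearestReadUri("cluster0.example.mongodb.net", "stitch");
console.log(nearestUri);
```

The same option can also be set per operation in the drivers; putting it in the URI simply makes it the client-wide default.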

Now, the problem with my Cloudflare blog post is that we didn’t cover caching the MongoDB connections. That post was more of a proof of concept than production-ready code.

In MongoDB Realm, connections to the underlying cluster are cached. Each time a Realm Function needs to talk to the MongoDB cluster, the same client connection is reused. This avoids the handshake and the cost of creating and maintaining a connection to the replica set EACH TIME we run a query. When you run a query, you just want to run the query against MongoDB, not initialise everything from scratch each time.

Initialising everything from scratch on every request is more or less what my Cloudflare code does. There may be a way to cache the connection with Cloudflare, but I don’t know Cloudflare well enough to do it.

It’s the same with AWS Lambda: you have to cache the connection in a global variable so that later invocations running in the same Lambda instance reuse it.
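The caching idea above can be sketched as a generic pattern (this is not Realm’s internal implementation). The connection promise lives in module scope, so every request handled by the same warm instance reuses it instead of reconnecting. connectToCluster is a hypothetical stand-in for something like MongoClient.connect(uri) from the Node.js driver, kept fake so the sketch runs on its own:

```javascript
// Hypothetical stand-in for MongoClient.connect(uri): counts how often a real
// connection (TCP + TLS + auth handshake) would have been opened.
let handshakes = 0;
async function connectToCluster() {
  handshakes += 1;
  return { id: handshakes, db: (name) => ({ name }) };
}

// Module-scope cache: survives between invocations of the same warm instance.
// Caching the promise (not the client) also dedupes concurrent first calls.
let clientPromise = null;
function getClient() {
  if (clientPromise === null) {
    clientPromise = connectToCluster().catch((err) => {
      clientPromise = null; // allow a retry on the next invocation
      throw err;
    });
  }
  return clientPromise;
}

// Request handler: only runs the query logic, never re-initialises the client.
async function handler(event) {
  const client = await getClient();
  return { clientId: client.id, db: client.db("stitch").name };
}
```

Calling handler many times in the same instance performs a single handshake; only a new instance (a cold start) pays the connection cost again.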

Cloudflare is an extra layer between your client and your MongoDB cluster in Atlas that isn’t really necessary. It’s an extra hop in the network as well.

The best scenario would be a locally deployed Realm app in Dublin, Ireland, and an Atlas cluster also deployed in Dublin. When your Realm Function executes, it can reach the cluster next door very quickly without routing the query around the world twice.

Cheers,
Maxime.


I’m not sure that’s correct. From my perspective, that layer is where the business rules, authentication, and authorization would live. But that’s probably a separate discussion.

I see Cloudflare Workers as the future of serverless. They are more global and feature 0 ms cold starts, a significant improvement over traditional serverless functions. The question now is how to combine them with an equally distributed database. I believe this is exactly why Cloudflare announced its D1 SQL database in September 2022.

So my question could essentially be rephrased like this:

Is MongoDB + Realm a good fit for Cloudflare Workers, or would it make more sense to use Cloudflare D1?

On a side note, could you share some insight into why it’s not possible to use the Node.js MongoDB driver? Will it ever be possible, or will the V8 environment never be compatible?


I totally agree with that. What I meant is that a MongoDB Realm app (soon to be Atlas App Services; MongoDB is renaming them) is already capable of handling this serverless workload.

You can achieve the same result without Cloudflare entirely by replacing the Cloudflare Workers with Realm Functions (soon Atlas Functions). The difference is that Realm Functions have a built-in caching mechanism that manages the connection to the Atlas cluster for you.

With MongoDB Realm you can handle the Rules, Schemas, App Users & Auth, GraphQL API, Functions, Triggers, HTTPS Endpoints (=webhooks for REST API) and front-end hosting.

Or you could use the Atlas Data API (a REST API), which can be enabled in one click.
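To make the Data API option concrete: it is plain HTTPS, so anything that can call fetch (including a Cloudflare Worker) can use it without a TCP driver. Below is a minimal sketch that only builds a findOne request. The app ID, API key, and cluster name are placeholders, and the base URL can be region-specific, so copy the real one from your Atlas project. MongoDB has since announced the Data API’s deprecation, so check the current Atlas documentation before relying on it.

```javascript
// Builds (but does not send) an Atlas Data API v1 "findOne" request.
// appId, apiKey and dataSource are placeholders for values from your own Atlas project.
function buildFindOneRequest({ appId, apiKey, dataSource, database, collection, filter }) {
  const url = `https://data.mongodb-api.com/app/${appId}/endpoint/data/v1/action/findOne`;
  const init = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "api-key": apiKey, // Data API key; keep it in a secret, not in source code
    },
    body: JSON.stringify({ dataSource, database, collection, filter }),
  };
  return { url, init };
}

const { url, init } = buildFindOneRequest({
  appId: "data-abcde", // hypothetical Data API app ID
  apiKey: "<your-api-key>", // placeholder
  dataSource: "Cluster0", // hypothetical cluster name
  database: "stitch",
  collection: "movies",
  filter: { title: "The Matrix" },
});

// In any runtime with fetch (e.g. a Cloudflare Worker):
//   const res = await fetch(url, init);
//   const { document } = await res.json(); // matching document, or null
```

Connection management to the cluster then happens on the Atlas side rather than in the calling function.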

With serverless functions (on any platform), you JUST want to execute the actual code and eliminate anything that wastes time (initialising a framework, opening a connection to another server such as MongoDB, starting a JVM, etc.).

With the implementation from the Cloudflare blog post, it works. OK. But each call to this lambda/worker creates a brand-new connection to the MongoDB cluster (at least that’s my understanding), with its entire connection pool, etc. This is like a cold start, and from the cluster’s perspective it is also like a DDoS attack, given that you obviously aren’t running this only 3 times a day. A MongoDB cluster can only sustain a certain number of connections (for an M30 it’s 3000), and opening and closing them costs the cluster memory, not counting the TCP handshakes on the network.
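A back-of-the-envelope check of that connection limit. The per-client figure is an assumption for illustration only: a fresh driver client against a 3-node replica set opens monitoring connections to each member plus at least one for the operation itself, and the exact count depends on the driver version and topology.

```javascript
// Rough capacity check: how many concurrent "connect per request" invocations
// fit under the cluster's connection limit? All inputs are assumptions.
const M30_MAX_CONNECTIONS = 3000; // limit quoted above for an M30 tier

function peakConnections(concurrentInvocations, connectionsPerClient) {
  return concurrentInvocations * connectionsPerClient;
}

function maxConcurrentInvocations(limit, connectionsPerClient) {
  return Math.floor(limit / connectionsPerClient);
}

const PER_CLIENT = 4; // hypothetical: e.g. 3 monitoring + 1 operation connection
console.log(maxConcurrentInvocations(M30_MAX_CONNECTIONS, PER_CLIENT)); // 750
console.log(peakConnections(1000, PER_CLIENT) > M30_MAX_CONNECTIONS); // true
```

With a shared, cached pool the connection count is bounded by the pool size instead of growing with the number of concurrent requests.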

Realm Functions access the MongoDB cluster like this:

```javascript
// Inside a Realm Function: "mongodb-atlas" is the app's linked cluster service
const mongodb = context.services.get("mongodb-atlas");
const movies = mongodb.db("stitch").collection("movies");
```

This built-in context acts as a cache that keeps a pool of MongoDB connections available for serverless functions to use when they need one. There’s no need to repeat the handshake and auth each time I want to talk to MongoDB.

As for Cloudflare D1, you just made me aware of its existence, so I have absolutely no idea what it’s worth. I just know it won’t scale like MongoDB does (because it’s SQL).

I think Cloudflare Workers don’t fully support third-party (npm) libraries (https://workers.cloudflare.com/works), and I think the MongoDB Node.js driver isn’t supported; otherwise I would have used it for the proof of concept / blog post. Instead, I had to use this weird workaround with the Realm Web SDK (not really proud of it), which is meant to be used in a front end, not in a back-end function… But it was the only way I found to get a connection to MongoDB.

I hope it helps :slight_smile:.
Maxime


@MaBeuLux88 thanks for the excellent answer, possibly the best forum answer I’ve had in a long time. My key takeaways are:

  1. MongoDB is not a great fit for Cloudflare Workers until the connection caching issue is solved.
  2. Realm Functions are better suited as an alternative to Cloudflare Workers because they have a built-in connection cache.

Now that you mention Realm Functions as an alternative to Cloudflare workers, how do they compare?

  • What runtime do Realm Functions use: V8 or Node?
  • What about cold-start issues?
  • Is it possible to configure API keys programmatically, using a service like Doppler?
  • Is it possible to forward logs to something like Logtail/Papertrail?
  • Can we use Realm Functions to respond with HTML?
  • Do you know of any efforts to make Realm Functions work with SvelteKit (or similar frameworks)?

Thaaaanks! I know it’s a lot of questions, but reading through the documentation didn’t give me a clear indication of whether Realm Functions are a direct alternative to Cloudflare/Vercel/Lambda :sweat_smile:

I recently built a quick POC using Cloudflare Workers as the backend of a web app and had to connect it to a MongoDB Atlas cluster. Cloudflare Workers currently support HTTP and WebSockets but not plain TCP sockets. For this reason, as @MaBeuLux88 points out, the MongoDB Node driver is not supported. That said, Cloudflare seems to be working on this limitation. Some workarounds for connecting to a MongoDB cluster from Cloudflare Workers include:

  1. Using the Realm client SDK (as explained in the blog post).
  2. Using a database proxy (like Prisma).

When using Realm, it seems that the blog implementation does not create a new connection with each request. This post suggests that Realm manages connections to Atlas automatically, depending on the requests made by client endpoints.

Hi folks, tackling a few of the latest questions on this thread. Note that we recently renamed MongoDB Realm (APIs, Triggers, and Sync) to ‘Atlas App Services’ to make it clearer and more distinct from the Realm SDKs.

  1. Functions use a custom JS runtime that most closely matches Node. It supports some features that Cloudflare Workers don’t (e.g. TCP connections) but not all modern JS syntax.
  2. Generally speaking, Functions do not have cold-start costs.
  3. I’m not familiar with Doppler, but new API keys can be configured with the Admin API.
  4. Yes, see our Log Forwarding documentation.
  5/6. I believe you’re basically asking “Can Functions fully support SSR applications?” Functions are not a good fit for this, but it’s something we’re considering investing more in.

Finally, @Sergio_50904, on your connection management question: App Services essentially opens connection pools between our hosts and your cluster and dynamically creates/releases connections based on demand. Connections can also be shared across multiple requests, so you tend to open a more efficient number of connections at scale and pay the cost of opening a new connection less frequently. This is true for all App Services (Sync/SDKs, Data API, GraphQL, Triggers).
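The behaviour described above (connections created on demand and shared across requests) can be sketched with a generic pool. This is not App Services’ actual implementation, which isn’t shown in this thread; openConnection is a hypothetical factory and maxSize is arbitrary. For brevity, released connections are kept idle for reuse rather than closed.

```javascript
// Minimal on-demand connection pool: reuse idle connections, open new ones only
// when all are busy (up to maxSize), and queue callers once the cap is reached.
class ConnectionPool {
  constructor(openConnection, maxSize) {
    this.openConnection = openConnection; // hypothetical async factory
    this.maxSize = maxSize;
    this.idle = [];
    this.total = 0;
    this.waiters = [];
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop(); // share an existing connection
    if (this.total < this.maxSize) {
      this.total += 1; // pay the connection cost only when demand requires it
      try {
        return await this.openConnection();
      } catch (err) {
        this.total -= 1; // failed open does not count against the cap
        throw err;
      }
    }
    return new Promise((resolve) => this.waiters.push(resolve)); // wait for a release
  }

  release(conn) {
    const next = this.waiters.shift();
    if (next) next(conn); // hand it straight to a waiting request
    else this.idle.push(conn);
  }

  async withConnection(fn) {
    const conn = await this.acquire();
    try {
      return await fn(conn);
    } finally {
      this.release(conn);
    }
  }
}
```

Sequential requests end up sharing a single connection, and a burst of concurrent requests never opens more than maxSize connections to the cluster.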


Thanks for your excellent answers, especially the honest answers on 5 and 6 :+1: For now, I think we will stay with more proven technology. Also, it seems that Atlas App Services currently can’t be run locally, which I think most developers would consider a critical feature for development.

Thanks for the feedback Alex – We have designed our CLI to be interactive and make it easy to work alongside the cloud while developing locally or in CI/CD pipelines, but I do understand that some folks prefer a local/emulated environment for development. It is certainly another area that we’re considering!


Yeah especially with MongoDB it makes sense to be able to run locally - since MongoDB is one of the few databases that will literally run everywhere :heart:
In our code architecture, we love to integration test against MongoDB running locally in-memory. It’s fast, makes for reliable tests, costs nothing, and almost emulates the production environment (looking at full-text search here :nerd_face:)