Run a cluster on Spot instances?

Hello there,
I recently discovered a very nice project of mongodb: mongodb-kubernetes-operator which allows to run mongodb cluster on own kubernetes cluster.
For most of my workload I’m using spot instances, which are priced at a 60-90% discount, but they are preemptible and can be terminated at any point in time. Since web services are stateless this is not a problem for us, just a requirement to have at least 2 nodes and 2 pods distributed among them (which is a requirement for any production workload anyway).
Now I’m wondering whether it makes sense to run MongoDB on spot instances as well: in theory it should work, since when a node (primary or secondary) is terminated it shouldn’t cause any downtime, as the load will be transferred to the remaining members.
But I’m not sure how much data might get corrupted (of course the SSD disks are persistent for these nodes).
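For context, this is roughly how I imagine testing that expectation (just a sketch with pymongo; the hostnames, replica set name and collection are placeholders, not our real setup):

```python
# Rough failover test: keep writing while a node is terminated.
# Hostnames and replica set name below are placeholders.
import time
from pymongo import MongoClient
from pymongo.errors import PyMongoError

client = MongoClient(
    "mongodb://mongo-0.example,mongo-1.example,mongo-2.example/"
    "?replicaSet=rs0&retryWrites=true",
    serverSelectionTimeoutMS=10000,
)

while True:
    try:
        # With retryable writes the driver retries this once against the
        # newly elected primary, so a clean failover shows no errors here.
        client.test.heartbeat.insert_one({"ts": time.time()})
    except PyMongoError as exc:
        print("write failed:", exc)  # would indicate actual downtime
    time.sleep(1)
```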

Are there any insights or case studies available, or does anyone run a similar setup? It would be interesting to learn.

I honestly don’t know enough about spot instances to know how feasible that would be! Very interesting question though…!!

If you lose a node and a pod, what happens to the storage? Will the pod come up again somewhere else with the storage (theoretically) intact? (I’m not sure if that’s what you meant by the SSDs being permanent)

I’m definitely not aware of anyone else running on spot instances!

Hi Dan,
I also like the idea and I think it’s worth giving it a shot :slight_smile:

The storage is provided by the cloud provider (GCP in my case), and the disks are independent of Kubernetes; they are mounted to the nodes and can be used by their pods.
So yes, the disks are persistent in the sense that they survive node/pod terminations.
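If it helps, this is roughly how I check that (a small sketch using the official kubernetes Python client; the "mongodb" namespace is just a placeholder for wherever the operator creates its resources):

```python
# Sketch: confirm the MongoDB data volumes are backed by PersistentVolumes
# that outlive pod/node terminations. The "mongodb" namespace is a placeholder.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()

# PVCs created for the replica set members should stay Bound across restarts.
for pvc in core.list_namespaced_persistent_volume_claim("mongodb").items:
    print(pvc.metadata.name, pvc.status.phase, pvc.spec.storage_class_name)

# The bound PVs (GCE persistent disks on GKE) are cluster-scoped and are not
# deleted when a spot node is preempted.
for pv in core.list_persistent_volume().items:
    print(pv.metadata.name, pv.spec.persistent_volume_reclaim_policy)
```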

Hmm… should work then!

I can’t advise on the disk corruption… but I would imagine that this scenario is no different to hardware or VMs, where you might lose one of the replica set members and then bring it back up.

If you try it, let us know how you get on!

@Dan_Mckean switched production traffic this morning, fingers crossed :smiley:
We have around 250-300 RPS and around 50 GB of data at the moment.
I set up 3 spot instances of type c2-standard-4 (4 vCPU, 16 GB RAM) for the cluster.

Nice!! Good luck, let us know how it goes and we can mention that it works (:crossed_fingers:) and give you an honourable mention for being the first to try it! :grin:

So for the last 2 weeks only one node has been restarted, and it wasn’t the primary, so there was no effect on the system. I also tried restarting the primary manually: it didn’t cause any downtime, and in fact the average response time, error rate and other vital metrics were not affected. We are able to serve more traffic faster, and this has positively affected our revenue.
So I believe this setup could work. Let’s give it another couple of weeks.


Great news! Thanks for the update - I’ll keep my fingers crossed that the experience remains positive!

Hi Dan! Only positive results so far, after half a year! Just yesterday I increased the oplog size on my MongoDB and restarted the database in the midst of peak traffic: no downtime, no problems. It’s so awesome to have MongoDB up and running, and spot instances fit just fine:
The driver, mongos and mongod together make any server termination unnoticeable, as another node quickly becomes primary and the system is available for writes again in no time.
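For reference, the oplog resize itself is just an admin command run on each member; here is a rough pymongo sketch (the hostname and size are placeholder values from me, not our exact setup):

```python
# Sketch: resize the oplog on one replica set member (repeat per member).
# Hostname and size below are placeholder values.
from pymongo import MongoClient

# replSetResizeOplog applies to the member it runs on, so connect directly.
member = MongoClient("mongodb://mongo-0.example:27017/?directConnection=true")

# Size is given in megabytes.
member.admin.command("replSetResizeOplog", 1, size=16384)
```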
Here’s my short LinkedIn post: Sergey Zelenov on LinkedIn: #mongodb #googlecloud #kubernetes #costefficiency

Would you mind adding a reaction? I’d love to share our experience with other MongoDB users :slight_smile:


Thanks for replying here! We had actually spotted your LinkedIn post already, but I hadn’t gotten around to replying!

So awesome to hear it’s been so successful for you!