Best practice to set up a (small) staging environment

Hello

How do you set up a staging environment? OK, this is a very open question. Let’s assume we have a giant database in Atlas. The simplest way to get a staging environment would be to just copy the DB in Atlas. If money is no issue, we are done at that point. :slight_smile:

If cost reduction is an issue, we need another solution. This is actually a decades-old problem, and every time it has hit me the answer was either a full copy as mentioned above, or scripting to get some kind of consistent subset of the data into a smaller database.

A further option could be to use some old local hardware and restore a full dump out of Atlas locally. Not smart; it would work, but you still need the hardware…

So back to the part where a consistent subset is copied to a new database… Before reinventing the wheel, I’d like to find out whether there is some best practice out there. Consistency depends on relations, and relations are not physically stored in a NoSQL schema, so IMHO there is no fully automatic way to get related data out as a consistent dump. So we are back to a scripted solution?
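
To make that concrete, a scripted subset could start from a handful of "anchor" documents and derive the filter for the dependent collection from their IDs. This is only a sketch; the collection and field names (customers, orders, customerId) and the connection string are made up:

```bash
# Sketch only: names and URI are placeholders. Collect the _ids of a few
# anchor customers and write an extended-JSON filter for their orders.
mongosh "mongodb+srv://user:pass@prod-cluster.mongodb.net/shop" --quiet --eval '
  const ids = db.customers.find({}, { _id: 1 }).limit(50).toArray().map(d => d._id);
  print(EJSON.stringify({ customerId: { $in: ids } }));
' > orders_query.json
```

A file like orders_query.json could then be fed to mongodump via --queryFile, so the dump only contains orders that belong to the selected customers.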

How do you approach this issue?

Cheers,
Michael

I just joined the community and noticed that this question is pretty old, but perhaps this is a helpful thought. Also, this is what “I” do and your mileage may vary.

By staging, I am reading that as having a working environment that duplicates the production site.

I have copied the production DB in the past for some deep troubleshooting, BUT that’s quite painful, and as you pointed out, I’m sure many have used it only as a last resort.

I like to stuff in “edge case” data that helps me in the dev/testing process to ensure I’m covering those cases, which may only rarely occur in production. This tends NOT to be good for overall performance testing, but it sure does help when I’m trying to make sure things don’t fly off the rails. I’ll likely have one client/case/log where I stuff in considerable data to ensure all my paging, filters, etc., do the right thing. Having a consistent, predictable data set helps me quickly see when things are not lining up, since I reuse this data consistently and thus can reasonably predict what “should” be showing up.
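
As a rough illustration of that seeding step (mongosh is assumed, and names like logs and clientId are invented for the example), one “heavy” client with a predictable sequence of entries can be generated like this:

```bash
# Hypothetical seed script: one edge-case client with a known number of log
# entries, so paging and filters can be checked against predictable counts.
mongosh "$STAGING_URI" --quiet --eval '
  const docs = [];
  for (let i = 0; i < 2500; i++) {
    docs.push({
      clientId: "edge-case-client",
      seq: i,
      level: i % 10 === 0 ? "ERROR" : "INFO",
      createdAt: new Date(2021, 0, 1 + (i % 365))
    });
  }
  db.logs.insertMany(docs);
'
```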

just my 2 cents for a workflow I use –

cheers,
\//\//olf

Hello @Wolf_Scott, welcome to the Community :wave:
I ended up with a kind of rudimentary but robust process:

  • dump a subset of data from production with mongodump (filter with --queryFile)
  • restore to a new DB with mongorestore (rename with --nsFrom / --nsTo)

This can be completely scripted; the magic is whether you can define consistent queries to create the subset.
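
A minimal sketch of such a script, assuming a query file built beforehand (for example with a filter like the one sketched earlier in the thread) and placeholder URIs and namespaces (shop, orders, shop_staging):

```bash
#!/usr/bin/env bash
set -euo pipefail
# Placeholders: adjust the URIs, database/collection names and the query file.
PROD_URI="mongodb+srv://user:pass@prod-cluster.mongodb.net"
STAGING_URI="mongodb+srv://user:pass@staging-cluster.mongodb.net"

# 1. Dump only the documents matched by the extended-JSON filter.
mongodump --uri="$PROD_URI" \
          --db=shop --collection=orders \
          --queryFile=orders_query.json \
          --gzip --out=dump/

# 2. Restore into a differently named database on the staging cluster.
mongorestore --uri="$STAGING_URI" \
             --nsFrom="shop.*" --nsTo="shop_staging.*" \
             --gzip --drop dump/
```
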
If you want to run this in a CI/CD pipeline, you may want to use the MongoDB Atlas API or mongocli.
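
For that case, a hedged sketch of creating a throwaway staging cluster through the Atlas Admin API (v1.0) before running the dump/restore step; the API keys, group ID and cluster spec are placeholders, and mongocli can issue the equivalent call:

```bash
# Placeholders: set ATLAS_PUBLIC_KEY, ATLAS_PRIVATE_KEY and ATLAS_GROUP_ID;
# the cluster spec is only an example, check the current Atlas API docs.
curl --user "${ATLAS_PUBLIC_KEY}:${ATLAS_PRIVATE_KEY}" --digest \
     --header "Content-Type: application/json" \
     --request POST \
     "https://cloud.mongodb.com/api/atlas/v1.0/groups/${ATLAS_GROUP_ID}/clusters" \
     --data '{
       "name": "staging",
       "providerSettings": {
         "providerName": "AWS",
         "instanceSizeName": "M10",
         "regionName": "US_EAST_1"
       }
     }'
```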

Regards,
Michael