Atlas archiving

Hello everyone, our team currently uses Atlas as our main database, and we have a lot of data we no longer need. I am considering different options and have some questions:

  1. Do I understand correctly that using Online Archive will be cheaper than moving the unneeded data to a cold cluster?
  2. We already have a huge amount of unneeded data. Would it be better to run a process that archives it first and then set up a scheduled online archive?
  3. Does Online Archive use S3 Glacier, or does it just use standard S3 storage?

Hi @Bohdan_Chystiakov,

Great questions! Here are the answers:

  1. Yes, Online Archive is easier and cheaper as archives are fully managed so you don’t need to configure and maintain separate cloud object storage. You can create date-based or custom archival rules that run indefinitely so you don’t have to manually migrate or delete data. In addition, your archives are online and easily accessible alongside your Atlas cluster data.
  2. For the initial archival, it is better to give the job as much time as possible. If you can dedicate the time, let it run around the clock (24 hours a day) first and only then set up scheduled archiving. When you run Online Archive (OA) for the first time on your cluster, you will probably want to archive a massive chunk of data. The OA speed limit is 2 GB per 5 minutes, which is around 0.5 TB/day. So, if you have 2 TB of data to archive, you can let the archive run continuously, 24 hours a day, and it will finish in about 4 days. After the initial archival, your daily archival workloads can be scheduled to run during off-peak hours with the “Scheduled archiving” feature.
  3. Online Archive uses the standard S3 storage (not Glacier).
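
To make the arithmetic in answer 2 easy to redo for your own data volume, here is a minimal sketch in Python. It assumes only the rate quoted above (2 GB per 5 minutes); the function names and the 4-hour nightly window are hypothetical, chosen for illustration.

```python
import math

# Rate quoted in the answer above: about 2 GB archived per 5-minute window.
GB_PER_WINDOW = 2
WINDOW_MINUTES = 5


def daily_throughput_gb(hours_per_day: float = 24.0) -> float:
    """GB archived per day if the archive runs `hours_per_day` hours each day."""
    windows_per_day = hours_per_day * 60 / WINDOW_MINUTES
    return GB_PER_WINDOW * windows_per_day


def days_to_archive(total_gb: float, hours_per_day: float = 24.0) -> int:
    """Whole days needed to archive `total_gb` at the quoted rate."""
    return math.ceil(total_gb / daily_throughput_gb(hours_per_day))


print(daily_throughput_gb())     # 576.0 GB/day, i.e. roughly 0.5 TB/day
print(days_to_archive(2000))     # 4 days for 2 TB running around the clock
print(days_to_archive(2000, 4))  # 21 days if limited to a 4-hour nightly window
```

The last line shows why running continuously for the initial backlog matters: the same 2 TB takes about three weeks if the archive is restricted to a short off-peak window from the start.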

Here are a few additional questions, if I may:

  1. Is it possible to set up our own S3 bucket and use it with Online Archive, or is the storage completely managed by Atlas?
  2. If we want to abandon Atlas, can we keep the archive data?
  3. Is there any way to return the data from Online Archive to the old cluster?

Hi @Bohdan_Chystiakov

  1. Online Archive is completely managed by Atlas. If you want to set up your own S3 bucket and query it, you can do this with Atlas Data Federation.
  2. No, Online Archive is tied to Atlas.
  3. Yes, if you want to restore data from Online Archive back to your cluster, you can use the $merge stage: https://www.mongodb.com/docs/atlas/online-archive/restore-archived-data/
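
For answer 3, a rough sketch of what the $merge restore could look like from Python with PyMongo. The shape of the `into: { atlas: ... }` target and the `keepExisting` / `insert` options are written from memory of the linked restore page, so please verify them there before running anything. The helper names are hypothetical, as are the cluster, database, and collection names (`Cluster0`, `sales`, `orders`). The connection string must be the one for the Online Archive, not the cluster.

```python
from typing import Any, Optional, Sequence


def build_restore_pipeline(
    cluster_name: str,
    db_name: str,
    coll_name: str,
    on_fields: Sequence[str],
    match: Optional[dict] = None,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that copies archived documents back to a cluster."""
    pipeline: list[dict[str, Any]] = []
    if match:
        # Optionally restore only a subset of the archived documents.
        pipeline.append({"$match": match})
    pipeline.append({
        "$merge": {
            # Assumed target syntax for writing into an Atlas cluster; check the linked docs.
            "into": {"atlas": {"clusterName": cluster_name, "db": db_name, "coll": coll_name}},
            "on": list(on_fields),
            "whenMatched": "keepExisting",  # do not overwrite documents already on the cluster
            "whenNotMatched": "insert",     # insert documents that exist only in the archive
        }
    })
    return pipeline


def restore(archive_uri: str, db_name: str, coll_name: str, pipeline: list) -> None:
    """Run the pipeline against the archive connection (needs `pip install pymongo`)."""
    from pymongo import MongoClient  # imported lazily so the builder works without PyMongo

    client = MongoClient(archive_uri)
    client[db_name][coll_name].aggregate(pipeline)


# Hypothetical names, for illustration only.
pipeline = build_restore_pipeline("Cluster0", "sales", "orders", ["_id"])
print(pipeline)
```

It is worth checking the linked page for prerequisites as well, such as whether the archive should be paused during the restore and which fields are valid in `on`.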