MongoDB archival queries

My team is exploring solutions for MongoDB data archival, and Online Archive seems like a great option. Before making a final decision I have some questions, and I would love a response or feedback from an expert.

  1. The limitations say that at most 50 OAs can be created per cluster, and of those 20 can be active at a time. Can you explain this in a bit more detail? Does that mean that if someone retrieves data from the 30 inactive OAs, the query will fail or return inaccurate data? Also, is this limit customisable? Can we request more active OAs, or more OAs in total, as our requirements grow?
  2. How much latency can we expect in data retrieval? If it is minimal, that is fine.
  3. Can we configure online archival for a collection while defining the schema at the codebase level? That is, is there a way to provide a field that says "create an archive for this collection with the specified rules"?
  4. Can you please provide more detailed documentation on working with OAs through the CLI and APIs? And how can I set up OAs for multiple collections within a cluster with exactly the same rules, using a script or something similar, at application startup?
  5. The notes mention several times that data is only archived if there is more than 5 MiB of it in the last 7 days. What exactly does this mean? Suppose my rule is set for 60 days and we don't have more than 5 MiB of data in this cycle. Will this data be archived in the next cycle?
  6. What are the best practices for managing archived data? It can eventually grow very large, and deleting archived data is not an option for us. Is there something to address this, such as partitioning one OA into several?

Lots of questions :smile:, but the answers will help us be clear and accurate about a solution we can use in the long term.

  1. This limit applies per cluster, allowing a maximum of 50 online archives, of which up to 20 can be active at any given time. The remaining 30 must be in a paused state.

A paused archive will not continue archiving data from the cluster; only the 20 active online archives will transfer data to cloud object storage. However, queries executed on a paused archive will still return results without failure.

While this limit can be adjusted, please note that online archives utilize the cluster’s underlying resources. Increasing the limit may impact cluster performance.

  2. Queries on an OA are slower than queries on the cluster. Expect latency of up to a few seconds in query results.
  3. There is no such option.
  4. We do not have a pre-built automation script for this. You will need to use the Atlas API or CLI to write one to suit your requirements.
  5. For 7 days immediately after Atlas creates an archive, Atlas archives all data. After 7 days, Atlas archives data only when your data size reaches 5 MiB. If the data size is below 5 MiB after those 7 days, it won't be archived.
  6. You have the option to delete archived data by setting the deletion age limit. Additionally, you can pause an individual archive and create a new one if it grows significantly. Some customers manage archives as large as 10 TB without experiencing any issues.
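For the scripting question (applying the same rule to several collections), here is a minimal sketch in Python rather than an official tool. It assumes the Atlas Admin API v2 endpoint path and request field names shown in the comments, which should be checked against the current API reference. The `post` argument is a hypothetical callable you supply yourself (for example, a thin wrapper around your HTTP client that adds Atlas API key authentication), and the database, collection, and field names are placeholders.

```python
import json
from typing import Callable, Iterable, List

# Assumed Atlas Admin API v2 path for creating one online archive.
# Verify it against the current API reference before relying on it.
ARCHIVE_PATH = "/api/atlas/v2/groups/{group_id}/clusters/{cluster}/onlineArchives"


def build_archive_payload(db: str, coll: str, date_field: str,
                          archive_after_days: int) -> dict:
    """Build the request body for one date-based archive rule.

    The field names below are assumptions based on the Admin API schema.
    """
    return {
        "dbName": db,
        "collName": coll,
        "criteria": {
            "type": "DATE",
            "dateField": date_field,
            "dateFormat": "ISODATE",
            "expireAfterDays": archive_after_days,
        },
        # Partition on the same date field so date-range queries can prune.
        "partitionFields": [{"fieldName": date_field, "order": 0}],
    }


def create_archives(post: Callable[[str, str], object], group_id: str,
                    cluster: str, db: str, collections: Iterable[str],
                    date_field: str, archive_after_days: int) -> List[object]:
    """Send one create request per collection, all with the same rule.

    `post(path, json_body)` is a hypothetical helper supplied by the caller;
    whatever it returns is collected and handed back in order.
    """
    path = ARCHIVE_PATH.format(group_id=group_id, cluster=cluster)
    results = []
    for coll in collections:
        body = build_archive_payload(db, coll, date_field, archive_after_days)
        results.append(post(path, json.dumps(body)))
    return results


def dry_run_post(path: str, body: str) -> str:
    """Stand-in for a real HTTP call: print the request instead of sending it."""
    print("POST", path)
    print(body)
    return "dry-run"


if __name__ == "__main__":
    # Placeholder project ID, cluster, database, and collection names.
    create_archives(dry_run_post, "<project-id>", "<cluster-name>", "shop",
                    ["orders", "events"], "createdAt", 60)
```

Running it with `dry_run_post` first lets you inspect the requests before sending anything. Because only 20 archives can be active per cluster, a startup script like this should also check how many archives already exist so it does not recreate them on every deploy.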

Thanks for clarifying the queries.
