My team is exploring solutions for MongoDB data archival, and Online Archive seems like a great option. Before making a final decision I have some questions, and I would love to get a response or feedback from an expert.
- The limitations say that at most 50 Online Archives (OAs) can be created per cluster, and that only 20 of those can be active at a time. Can you explain this in a bit more detail? Does that mean that if someone retrieves data from the 30 inactive OAs, the query will fail or return inaccurate data? Also, is this limit customisable? Can we request more active OAs, or more OAs overall, depending on our requirements?
- How much latency can we expect in data retrieval? If it is minimal, that is fine.
- Can we configure Online Archive for a collection while defining the schema at the codebase level? I mean, is there any way to provide a field that says "create an archive for this collection with the specified rules"?
- Can you please provide more detailed documentation on working with OAs through the CLI and APIs? Also, how can I set up OAs for multiple collections within a cluster with exactly the same rules, using a script or something similar, at application startup?
- The notes mention multiple times that data will only be archived if it is more than 5 MiB for the last 7 days. What exactly does this mean? For example, suppose my rule is set for 60 days and we don't have more than 5 MiB of data in this cycle. Will this data be archived in the next cycle?
- What are the best practices for managing archived data? It can eventually grow very large, and deleting the archived data is not an option. Is there something to solve this, such as partitioning one OA into multiple OAs?
Lots of questions, but the answers will help us be clear and accurate about a solution that we can use in the long term.