I have a large M80 database and I’m planning to split its data into HOT / COLD storage tiers. The plan is to use S3 buckets to store the majority of the data, which is not frequently accessed, and Data Federation to query Atlas and S3 at once.
My current question is how to define partitions, indexes, or whatever else is needed to avoid scanning all of the S3 data. All the examples I found are simple, and I haven’t yet figured out how the S3 reads work.
My name is Ben Flast, and I’m the product manager for Atlas Data Federation. We do have guidance about how to use paths and what to keep in mind when partitioning to optimize performance at the links below. That said, partitioning can definitely be a challenging task, and the right approach differs based on your underlying data and the query patterns you plan to use. If you’d like, please put some time on my calendar and we can discuss in detail: Calendly - Benjamin Flast
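To make the partitioning idea a bit more concrete, here is a rough Python sketch of why the S3 key layout matters. It is only a local simulation, not Data Federation's actual implementation: the `events/<year>/<month>/` layout, the file names, and the `prune_keys` helper are all hypothetical. The assumption is that when field values are encoded in the S3 path (in a storage configuration this would be something along the lines of a path like `/events/{year int}/{month int}/*`), a query that filters on those fields only needs to open the files under matching prefixes.

```python
import re

# Hypothetical S3 key layout: one "folder" per year and month.
KEY_PATTERN = re.compile(r"^events/(?P<year>\d{4})/(?P<month>\d{2})/[^/]+\.json$")


def prune_keys(keys, year, months):
    """Return only the keys whose path-encoded year/month match the filter."""
    selected = []
    for key in keys:
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # key does not follow the assumed layout
        if int(match["year"]) == year and int(match["month"]) in months:
            selected.append(key)
    return selected


# Made-up object keys standing in for the contents of a cold-tier bucket.
keys = [
    "events/2022/11/part-0.json",
    "events/2022/12/part-0.json",
    "events/2023/01/part-0.json",
    "events/2023/01/part-1.json",
    "events/2023/02/part-0.json",
]

# A query filtering on year == 2023 and month == 1 touches two of five files.
to_read = prune_keys(keys, year=2023, months={1})
print(to_read)
```

The takeaway is that the fields you filter on most often are the ones worth encoding in the path. A query that does not filter on a path-encoded field gets no benefit from this kind of pruning and would still have to read every file.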