Is querying Online Archive as fast as normal db?

Zarif_Alimov · December 7, 2021, 6:56pm

If I have a collection with fields A, B and C indexed, will those indexes be preserved when data gets chopped off and moved to Online Archive? In other words, if I later query the data in Online Archive by those fields will the queries be just as fast as if it were in the original db?

steevej · December 7, 2021, 7:19pm

Short answer: absolutely not

Longer answer: think about it. If they could make the archive just as fast, a normal db, with the complexity of tree-based indexes and a working set cached in RAM, would be useless.

MaBeuLux88_xxx · December 8, 2021, 2:16am

Hi @Zarif_Alimov,

I’m seconding @steevej: yes, it will be slower. That’s why it’s called “cold data” and an “archive”. With federated queries, though, it is indeed possible to query both data sources through a single endpoint.

Cold data == only on S3 == Hard Drives.
Hot data == Atlas cluster == RAM + CPU + Hard Drives.

When you set up your Online Archive, though, setting up the partitions is the critical step that you don’t want to miss if you want to keep decent performance.

Because of these partitions, the data in S3 is stored in folders, so if your query contains the partition fields, the query resolver can quickly eliminate many, many “cold” files and find the right documents WAY faster than by scanning everything.
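
To make the partition idea concrete, here is a minimal sketch in plain Python. It is only a toy model of folder-style pruning, not how Online Archive is actually implemented, and the field names (`region`, `year`, `value`) and the helpers `build_partitions` and `query` are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical archived documents: 3 regions x 2 years x 2 values = 12 docs.
docs = [
    {"region": r, "year": y, "value": v}
    for r in ("EU", "US", "APAC")
    for y in (2019, 2020)
    for v in (1, 2)
]

def build_partitions(docs, partition_fields):
    """Group documents into 'folders' keyed by their partition field values."""
    partitions = defaultdict(list)
    for doc in docs:
        key = tuple(doc[f] for f in partition_fields)
        partitions[key].append(doc)
    return dict(partitions)

def query(partitions, partition_fields, filters):
    """Return (matching docs, number of folders that had to be scanned)."""
    scanned = 0
    results = []
    for key, folder in partitions.items():
        key_values = dict(zip(partition_fields, key))
        # Prune: skip the whole folder if a filter on a partition field
        # does not match the folder's value for that field.
        if any(f in filters and filters[f] != v for f, v in key_values.items()):
            continue
        scanned += 1
        results.extend(
            d for d in folder if all(d.get(k) == v for k, v in filters.items())
        )
    return results, scanned

fields = ("region", "year")
partitions = build_partitions(docs, fields)

# Filter on a partition field: only the matching folders are opened.
eu_docs, eu_scanned = query(partitions, fields, {"region": "EU"})
print(len(eu_docs), "docs,", eu_scanned, "of", len(partitions), "folders scanned")

# Filter on a non-partition field: every folder has to be scanned.
v_docs, v_scanned = query(partitions, fields, {"value": 1})
print(len(v_docs), "docs,", v_scanned, "of", len(partitions), "folders scanned")
```

The point of the sketch is the difference in folders scanned: a filter on `region` touches 2 of the 6 folders, while a filter on `value`, which is not a partition field here, touches all 6.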

Cheers,
Maxime.

steevej · December 8, 2021, 1:45pm

Thanks for the links, Maxime. However, the one about Online Archive gives me:

404 Not Found

    Code: NoSuchKey
    Message: The specified key does not exist.
    Key: online-archive
    RequestId: HJDNS7FBJK3BNCV7
    HostId: zdYTTiasTSMSlnCRSsXfHBzijVLRwGyw9iTsb7xpIPE30poBzgNB3U3rb5H52bi7p0V2jzWPVYo=

There are a lot of things that I don’t know B-(. The concept of partitions is very nice.

MaBeuLux88_xxx · December 13, 2021, 4:58pm

Oops.
I fixed the link!