Syncing "big" documents

Hello,

I am having problems syncing documents that contain timeseries data. Using the bucket pattern, I store one hour of data per document. Each document contains about 1MB of data.

The data is parsed from files, so it is common for a new document containing a full hour of data to be added all at once, rather than being filled in timestamp by timestamp over time.
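To make the layout concrete, here is a rough sketch of how I structure each hour bucket (plain Python with made-up field names for illustration, not my actual Realm schema):

```python
from datetime import datetime, timedelta

def make_hour_bucket(sensor_id, hour_start, samples):
    """Build one bucket document holding a full hour of timeseries samples.

    `samples` is a list of (timestamp, value) pairs, all within the hour.
    """
    return {
        "sensor_id": sensor_id,
        "bucket_start": hour_start,
        "bucket_end": hour_start + timedelta(hours=1),
        "measurements": [{"ts": ts, "value": v} for ts, v in samples],
    }

# One document per hour: 3600 samples at 1 Hz comes out to roughly 1MB
# once metadata and per-sample fields are included.
hour = datetime(2021, 6, 1, 12, 0)
samples = [(hour + timedelta(seconds=s), float(s)) for s in range(3600)]
bucket = make_hour_bucket("sensor-1", hour, samples)
```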

When syncing such a document, the cluster seems to be overloaded by translating the Realm objects into Atlas documents. After I add a document to a synced Realm, the primary node starts inserting the data but restarts after a while. The primary node is reassigned multiple times, and during this time I cannot access the “Browse Collections” UI.

The cluster keeps working for hours, but I cannot find the document in the collection. The document count and collection size shown in the “Browse Collections” UI grow, but the documents themselves are not displayed. They are also not retrievable via the mongodb shell or Compass.
The Realm logs don’t show any errors while syncing the data, but afterwards a MaxIntegrationAttemptsError is occasionally logged.

I am using the Free Tier M0 cluster.

Are 1MB documents too big to add to a synced Realm at once? Should I split the data into smaller buckets, or add an empty bucket first and fill it over time? Am I missing something else?

Thank you in advance!

Hi @Timo_Land and welcome to the MongoDB Community :muscle: !

M0 clusters have a number of limitations that you may easily have reached with documents this large. At the top of the list is the 512MB storage limit (which, as far as I know, includes the oplog), but there are more!

Just to mention a few that you should investigate first:

  • Data transfer limits: M0/M2/M5 clusters limit the total data transferred into or out of the cluster in a rolling seven-day period. The limit varies by cluster tier as follows:

    • M0 : 10 GB in and 10 GB out per period
    • M2 : 20 GB in and 20 GB out per period
    • M5 : 50 GB in and 50 GB out per period
  • Throughput: M0/M2/M5 clusters cap the maximum number of operations:

    • M0 : 100 per second
    • M2 : 200 per second
    • M5 : 500 per second
  • M0 free clusters and M2/M5 shared clusters are allowed a maximum of 500 connections.

Maybe you have reached one of these limits, and that is what is causing your issue.

I’d try an M10 cluster, just to make sure this isn’t happening because of the M0 limitations before trying something else.

Cheers,
Maxime.

Hey @MaBeuLux88,

Thank you for your answer. Of the limits you mentioned, the maximum-operations cap was probably the biggest problem.

I upgraded to M10 and tried to add one bucket document again. The node restarts no longer occur, but the cluster has now been busy for an hour and I still cannot find the document in the collection.

I am quite surprised that adding a single 1MB document requires so much work. Would shrinking the time window of each bucket, or even creating one document per timestamp, improve write performance?

After some time I received a MaxIntegrationAttemptsError (TransactionExceededLifetimeLimitSeconds) in the Realm log. So something is still not right. I have no idea what.

I ran some experiments, and here is what I found:
The size of the transaction seems to be the cause of the problem.
Shrinking each bucket (for example, one bucket per minute) and adding all buckets in one big transaction does not work. However, adding each bucket in a separate transaction, with a short delay between transactions, results in the documents being written to the Atlas cluster within a short period.
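In outline (plain Python standing in for the actual Realm SDK write calls, with illustrative names), the approach that works looks like this:

```python
import time
from datetime import datetime, timedelta

def split_into_minute_buckets(samples):
    """Group (timestamp, value) samples into one bucket per minute."""
    buckets = {}
    for ts, value in samples:
        minute = ts.replace(second=0, microsecond=0)
        buckets.setdefault(minute, []).append({"ts": ts, "value": value})
    return [
        {"bucket_start": minute, "measurements": ms}
        for minute, ms in sorted(buckets.items())
    ]

def write_buckets(buckets, write_one, delay_s=0.5):
    """Write each bucket in its own transaction, pausing between writes
    so the sync translator is never handed one huge changeset.

    `write_one` stands in for the SDK-specific write-transaction call.
    """
    for bucket in buckets:
        write_one(bucket)      # one small transaction per bucket
        time.sleep(delay_s)    # short delay between transactions

# One hour of 1 Hz samples split into 60 minute-sized buckets.
hour = datetime(2021, 6, 1, 12, 0)
samples = [(hour + timedelta(seconds=s), float(s)) for s in range(3600)]
minute_buckets = split_into_minute_buckets(samples)
```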

I have yet to try whether keeping one big bucket and filling it with data over multiple transactions would work as well.


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.