@Christian_Huck Thank you for your post. We realize that sync performance currently leaves much to be desired, which is why this quarter's product plan is almost exclusively dedicated to improving it; it's also why we still flag Realm Sync as Beta. Our initial implementation focused on correctness and bug fixing, and now that sync stability has improved we are shifting gears to performance. The underlying architecture of the new Realm Sync is designed not only to meet, but to far exceed, the performance of the legacy Realm Cloud; the legacy Realm Sync architecture had backed us into a corner with some of our decisions and would have necessitated a large refactor anyway.
That being said, even with the planned performance improvements, there are always going to be best practices that users should follow in order to increase throughput:

- **Batch writes as much as possible on the client.** If your client is writing a lot of data, a good rule of thumb is around 10k objects per transaction.
- **Avoid storing large blob or binary data in a synced realm.** Sync works using an operations log, which is essentially a second copy of the state: one copy for the actual state and one for each operation plus its payload. You can imagine that an implementation that simply inserted and deleted a 1MB photo over and over again would blow up the sync history log and reduce performance.
- **If your app shares writable data between groups of users, design your partitioning strategy to maximize throughput.** OT, our conflict resolution algorithm, is generally the largest factor in throughput, and it is applied per partition. The number of users writing to a partition, how long those users have been offline, and the amount of changes they have accrued locally before syncing to the server are all variables that decrease performance. If you are running into this, look at creating more partitions and reducing the number of users per partition.
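To make the batching rule of thumb concrete, here is a minimal sketch in plain Python (not the Realm SDK; the `realm.write` call in the comment is purely illustrative) of chunking a large collection of objects into transaction-sized batches:

```python
def batches(objects, size=10_000):
    """Split a large sequence of objects into transaction-sized batches.

    Writing each batch in a single transaction keeps the number of
    sync operations small relative to the amount of data written.
    """
    for i in range(0, len(objects), size):
        yield objects[i:i + size]

# Hypothetical usage with a Realm-like API (illustrative only):
#
# for chunk in batches(readings):
#     realm.write(lambda: realm.add(chunk))   # one transaction per ~10k objects
```

The exact batch size worth using depends on object size and device memory, so treat 10k as a starting point rather than a hard limit.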
I’m not sure of your specific use case, but feel free to open a support ticket and we can look at your particular implementation and either make suggestions on how to improve performance now, or make sure that your use case will be helped by the future work we are doing on the sync server.
As an example, we recently had a support case where a developer was collecting sensor data every 100ms and wanted that data replicated to another sync user in real time. However, the app was performing a write transaction every 100ms (one per sensor reading), which caused the receiving user's sync to lag further and further behind, with the lag growing for as long as the writing user kept writing. Instead, we batched those sensor readings on the client so that it performed a single sync write transaction every 2-3 seconds. This enabled the receiving user to get the data with constant latency; there was no longer any steadily increasing lag. I hope this helps
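The fix in that support case can be sketched roughly as follows, again in plain Python rather than the actual client code; `flush_fn` here is a stand-in for whatever performs the single Realm write transaction:

```python
import time

class ReadingBuffer:
    """Accumulate sensor readings and flush them in one write every few seconds.

    Instead of one write transaction per reading (e.g. every 100ms),
    readings are buffered and committed together, drastically reducing
    the number of sync operations.
    """

    def __init__(self, flush_fn, flush_interval=2.0):
        self.flush_fn = flush_fn          # performs one write transaction for a batch
        self.flush_interval = flush_interval
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, reading):
        self.buffer.append(reading)
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)    # one transaction for the whole batch
            self.buffer = []
        self.last_flush = time.monotonic()
```

In a real app you would also flush on a timer or on app background, so a lull in readings does not leave the last batch stranded in memory.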