Inserting into multiple tables with multithreading is inefficient

There are 500 million records on my server, and I need to download them and insert them into my Realm database.
The following code takes 26 minutes:

// One transaction wraps all 5,000,000 inserts
realmDB.beginTransaction();
for (int i = 0; i < 5000000; i++) {
    QRBaseData qrBaseData = realmDB.createObject(QRBaseData.class, "https://www.baidu.com?uii=" + UUID.randomUUID());
    qrBaseData.setGroupId(232324);
    qrBaseData.setSkuId(23523);
    qrBaseData.setOutletId(33523);
}
realmDB.commitTransaction();

The following multithreaded version takes even longer:

ExecutorService fixedThreadPool = Executors.newFixedThreadPool(3);
fixedThreadPool.execute(new Runnable() {
    @Override
    public void run() {
        // Each thread opens its own Realm instance; instances cannot be shared across threads
        DynamicRealm dynamicRealm = DynamicRealm.getInstance(configuration);
        dynamicRealm.beginTransaction();
        for (int i = 0; i < 2500000; i++) {
            DynamicRealmObject dynamicRealmObject = dynamicRealm.createObject("Goup_Table_0", "https://www.baidu.com?uii=" + UUID.randomUUID());
            dynamicRealmObject.setInt("outletId", 33523);
            dynamicRealmObject.setInt("groupId", 232324);
            dynamicRealmObject.setInt("skuId", 23523);
        }
        dynamicRealm.commitTransaction();
        trasitionCheck();
        dynamicRealm.close(); // release this background thread's Realm instance
    }
});

fixedThreadPool.execute(new Runnable() {
    @Override
    public void run() {
        DynamicRealm dynamicRealm = DynamicRealm.getInstance(configuration);
        dynamicRealm.beginTransaction();
        for (int i = 0; i < 2500000; i++) {
            DynamicRealmObject dynamicRealmObject = dynamicRealm.createObject("Goup_Table_1", "https://www.baidu.com?uii=" + UUID.randomUUID());
            dynamicRealmObject.setInt("outletId", 33523);
            dynamicRealmObject.setInt("groupId", 232324);
            dynamicRealmObject.setInt("skuId", 23523);
        }
        dynamicRealm.commitTransaction();
        trasitionCheck();
        dynamicRealm.close(); // release this background thread's Realm instance
    }
});

Can only one process write to the database at the same time?
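Realm allows only one open write transaction at a time on a given Realm file; a second beginTransaction() waits until the first one commits. So two threads that each wrap their whole loop in a single transaction end up running one after the other. The toy model below is a hypothetical sketch (SingleWriterModel and its methods are illustrative names, and a plain ReentrantLock stands in for Realm's internal write lock, which is an assumption about the mechanism, not Realm's implementation). It records when each "transaction" starts and ends:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Toy model (not Realm code): one lock per database file stands in for Realm's write lock.
final class SingleWriterModel {
    private final ReentrantLock writeLock = new ReentrantLock();
    // Thread-safe log of "start:<name>" and "end:<name>" events.
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Stand-in for beginTransaction() ... commitTransaction() around a long insert loop.
    void runTransaction(String name, long workMillis) {
        writeLock.lock(); // like beginTransaction(): waits while another writer holds the lock
        try {
            events.add("start:" + name);
            try {
                Thread.sleep(workMillis); // placeholder for the createObject(...) loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("end:" + name);
        } finally {
            writeLock.unlock(); // like commitTransaction(): lets the next writer in
        }
    }

    // Starts two writers at once, mirroring the two Runnables in the question.
    static List<String> runTwoWriters(long workMillis) {
        SingleWriterModel db = new SingleWriterModel();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> db.runTransaction("A", workMillis));
        pool.execute(() -> db.runTransaction("B", workMillis));
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(db.events);
    }
}
```

The log always shows one writer finishing before the other starts, so wall-clock time is roughly the sum of both transactions. That is consistent with the two-thread version above being no faster than the single loop.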

I've never used Realm, but it seems you are putting a lot of data into a single transaction?

If so, you may want to change that, since such a big transaction is almost never a good idea (see, e.g., this link).

It's hard to know exactly where the bottleneck is; it could be a slow internet connection, other processes running in the app, etc.

However, one point I can share with certainty is this:

Realm can be very efficient when writing large amounts of data by batching together multiple mutations within a single transaction. Transactions can also be performed in the background to avoid blocking the main thread.

If you break that loop up into smaller chunks, you should see a dramatic improvement (we did).

Here's some pseudocode for a general design pattern that works well for large amounts of data:

start a background task {
     for i = 0; i < 10000; i++
         begin write transaction
              for j = 0; j < 5000; j++
                      do stuff
              continue j loop
         commit write
     continue i loop
}
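The chunked pattern in the pseudocode can be sketched in Java as follows. This is a minimal sketch under stated assumptions: BatchWriter, BatchedImport, and CountingWriter are hypothetical names, and in an app the three BatchWriter methods would wrap realm.beginTransaction(), realm.createObject(...), and realm.commitTransaction(). Because Realm instances are confined to the thread that opened them, a real background task would open its own instance with Realm.getInstance(config) on that thread and close() it when done.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical minimal write API standing in for a Realm instance.
interface BatchWriter {
    void begin();
    void create(long index);
    void commit();
}

final class BatchedImport {
    // Writes `total` rows in transactions of at most `batchSize` rows each.
    static void writeInBatches(BatchWriter writer, long total, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        for (long start = 0; start < total; start += batchSize) {
            long end = Math.min(start + batchSize, total);
            writer.begin();
            for (long i = start; i < end; i++) {
                writer.create(i); // e.g. createObject(...) plus the setters
            }
            writer.commit();
        }
    }

    // Runs the whole import on a background thread so the caller (e.g. the UI thread) is not blocked.
    static Future<?> writeInBackground(ExecutorService executor, BatchWriter writer, long total, int batchSize) {
        return executor.submit(() -> writeInBatches(writer, total, batchSize));
    }
}

// Counting stand-in used to check the batching logic; rejects writes outside a transaction.
final class CountingWriter implements BatchWriter {
    long created = 0;
    int commits = 0;
    private boolean inTransaction = false;

    public void begin() { inTransaction = true; }

    public void create(long index) {
        if (!inTransaction) throw new IllegalStateException("write outside a transaction");
        created++;
    }

    public void commit() {
        inTransaction = false;
        commits++;
    }
}
```

With 12,345 rows and a batch size of 5,000, this performs three commits (5,000 + 5,000 + 2,345). The batch size is a tuning knob: each commit carries some fixed overhead, so it is worth measuring a few sizes on the target device rather than assuming smaller is always faster.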

But the big question:

Why do you use one transaction to create nearly identical copies of the same object?

There is no need for a transaction.

It looks like you are trying to benchmark something you really want to do, but you are hiding so many details that we cannot do a real assessment of the issue.

That doesn't seem to work for me. 😭

for (int i = 0; i < 1000; i++) {
    realmDB.beginTransaction();
    for (int j = 0; j < 5000; j++) {
        QRBaseData qrBaseData = realmDB.createObject(QRBaseData.class, "https://www.baidu.com?uii=" + UUID.randomUUID());
        qrBaseData.setGroupId(232324);
        qrBaseData.setSkuId(23523);
        qrBaseData.setOutletId(33523);
    }
    realmDB.commitTransaction();
}

It takes almost 50 minutes. Did I miss anything?

It's a test demo to measure how long it takes to insert 500 million records.
Do you mean there's no need for a transaction, like this?

for (int i = 0; i < 5000000; i++) {
    QRBaseData qrBaseData = realmDB.createObject(QRBaseData.class, "https://www.baidu.com?uii=" + UUID.randomUUID());
    qrBaseData.setGroupId(232324);
    qrBaseData.setSkuId(23523);
    qrBaseData.setOutletId(33523);
}

I essentially copied and pasted your code (the version with the inner loop) and added a timer to record the start and end times.

For clarity, the code writes 5 million objects, not 500 million.

That code wrote 180 MB of data and took 4.318 minutes.

I am running it on macOS with 16 GB of RAM, an SSD, and a 3.6 GHz 8-core Intel Core i9.

Based on those results, I would say the bottleneck lies outside of your code.
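Since the stated goal is 500 million records, it can help to scale the measured 5 million figures. The sketch below (ImportEstimate and its methods are hypothetical names) computes throughput and a naive linear extrapolation, assuming time grows in proportion to the number of objects, which real runs may not follow as the file grows. On those numbers, roughly 19,300 objects/s gives about 100 × 4.318 min ≈ 7.2 hours for 500 million objects, and the original 26-minute run scales to about 100 × 26 min ≈ 43 hours.

```java
// Back-of-the-envelope figures from the timings reported in this thread.
final class ImportEstimate {
    // Objects written per second for a run of `objects` that took `minutes`.
    static double throughput(long objects, double minutes) {
        return objects / (minutes * 60.0);
    }

    // Minutes needed for `targetObjects`, assuming time grows linearly with the
    // number of objects (an assumption; real runs may slow down as the file grows).
    static double extrapolateMinutes(long measuredObjects, double measuredMinutes, long targetObjects) {
        return measuredMinutes * ((double) targetObjects / measuredObjects);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f objects/s%n", throughput(5_000_000L, 4.318));
        System.out.printf("%.1f h for 500M (desktop)%n",
                extrapolateMinutes(5_000_000L, 4.318, 500_000_000L) / 60.0);
        System.out.printf("%.1f h for 500M (original device)%n",
                extrapolateMinutes(5_000_000L, 26.0, 500_000_000L) / 60.0);
    }
}
```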


You do not need a transaction to create N unrelated documents. Period.

Yes, like the code you shared.

A bit of clarification may be needed, or I may be misunderstanding the meaning:

I am not sure of the context of that statement but ALL writes in Realm must be within a transaction.

From the docs

Use realm.createObject() in a transaction to create a persistent instance of a Realm object in a realm.

See

or

In Swift as well, all writes must be within a transaction, and the very structure of the code forces the developer to do it that way:

try realm.write {
     //this closure is a write transaction
}

The other advantage of using a transaction is that the writes in it either all succeed or all fail. That guarantees data integrity, so you'll never end up with only part of the transaction's data written and sent to the server.
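The two rules described here (every write happens inside a transaction, and a transaction's writes land all together or not at all) can be illustrated with a small in-memory toy model. ToyStore and create() are hypothetical names; beginTransaction, commitTransaction, and cancelTransaction mirror Realm Java method names, but this is a sketch of the semantics only, not Realm's storage engine.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory store (not Realm): rejects writes outside a transaction and
// applies a transaction's writes all at once on commit, or discards them on cancel.
final class ToyStore {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending = null; // non-null while a transaction is open

    void beginTransaction() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>();
    }

    void create(String key) {
        if (pending == null) throw new IllegalStateException("writes must happen inside a transaction");
        pending.add(key); // staged, not yet visible
    }

    void commitTransaction() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        committed.addAll(pending); // all staged writes become visible together
        pending = null;
    }

    void cancelTransaction() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        pending = null; // staged writes are discarded; nothing partial is kept
    }

    int size() {
        return committed.size();
    }
}
```

A cancelled transaction leaves the store exactly as it was, and a write attempted with no open transaction is rejected rather than silently stored.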

So the loop without a transaction, repeated below, does not actually write any data to Realm; it just creates an object over and over:

for (int i = 0; i < 5000000; i++) {
    QRBaseData qrBaseData = realmDB.createObject(QRBaseData.class, "https://www.baidu.com?uii=" + UUID.randomUUID());
    qrBaseData.setGroupId(232324);
    qrBaseData.setSkuId(23523);
    qrBaseData.setOutletId(33523);
}

Again, my testing with your original code writes all that data in about 4 minutes, so the code itself is working as intended. If your write is taking much longer, the issue lies somewhere else in your code or environment.


I am the one who needed more clarity.

I was thinking about, and commenting on, the regular MongoDB driver. I was out of my league with Realm.

Thanks for the clarification.

Thanks, Jason.
It took only 8 minutes on another phone, so the code is not the bottleneck; the old device is.

Another question: can I use two or more Realm databases to insert data on different threads? I still need to improve performance on this old device.

Yes, you can have multiple databases on your device and interact with any of them at any time through the configuration, simply by changing the Realm file name (the last component of the path).
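A sketch of the "one Realm file per writer thread" idea: each thread gets its own file name, and records are routed to files round-robin. ShardFiles, fileNames, and shardFor are hypothetical helpers; the Realm calls appear only in comments. Whether this is actually faster depends on the device's storage, and queries and relationships cannot span separate files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: one Realm file name per writer thread, e.g. "import_0.realm", "import_1.realm".
// In an app, each name would go into its own configuration, for example
//   new RealmConfiguration.Builder().name(fileName).build()
// and each thread would open (and later close) its own Realm instance from that configuration.
final class ShardFiles {
    static List<String> fileNames(String prefix, int shards) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            names.add(prefix + "_" + i + ".realm");
        }
        return names;
    }

    // Index of the file that a given record would be written to (round-robin).
    static int shardFor(long recordIndex, int shards) {
        return (int) Math.floorMod(recordIndex, (long) shards);
    }
}
```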

However, those will be treated as completely separate Realms, so queries and forward/inverse relationships will not work across Realms.

On the other hand, for situations where you have a LOT of static data, or where you're using denormalization, it would work. For example, in an inventory system, a "master list" of inventory item names could be stored in one Realm, and a copy of each name plus details about the item could be stored in another.

In this use case, I don't think that applies, or that it would buy any real speed improvement: it sounds like the data is all of the same kind, so it would end up needing to be stored in the same Realm.
