Very slow concurrent operations

I have a MongoDB deployment with a single replica set.

I have increased the poolSize to 300.

Creating collections concurrently from 100 threads reduces performance ridiculously;
each creation takes up to 10 seconds.

An insertMany also takes about 5 seconds when run concurrently, even outside of a transaction.

What am I missing?

What is the hardware configuration? RAM? CPU? Disks?

What is the system architecture? Where is the server, where is the client running the 100 threads?

All of them are on the same machine, running locally, with a single replica set and no sharding.

RAM = 20 GB
CPU = 2400 (Sandy Bridge)

I monitor the hardware resources, and it does not seem to be due to a lack of hardware, at least not for creating collections.

Disks? Local, NAS, SAN, …?

Number of cores?

Shared machine or not?

So you are creating 100 collections in 100 threads? How do you create the collections? When you do insertMany, how many documents are in each batch?

How did you come to the above conclusion? Do you have any measurements? CPU usage, I/O usage, RAM usage?

A local SSD disk. Basically everything is local, and the single replica set is there to support transactions.

4 cores 3.4 GHz

It is not shared and not a VM; it is a normal iMac desktop computer running Ubuntu.

createCollection is a simple one-line function call in the Java driver, no strings attached.

The insertMany is also a simple function call.

There are no other instructions in between that might cause the delay.
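
Roughly, each thread just does something like this (a simplified sketch, assuming the MongoDB Java sync driver; the connection string, database and collection names, and batch size are placeholders, not my real code):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ConcurrentCreateTest {
    public static void main(String[] args) throws InterruptedException {
        // poolSize raised to 300 as mentioned above; names and numbers below
        // are placeholders for illustration only
        MongoClient client = MongoClients.create(
                "mongodb://localhost:27017/?replicaSet=rs0&maxPoolSize=300");
        MongoDatabase db = client.getDatabase("load_test");

        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100; i++) {
            final int n = i;
            pool.submit(() -> {
                long start = System.nanoTime();

                // the single createCollection call (takes up to ~10 s under load)
                db.createCollection("txn_" + n);

                // the single insertMany call (takes ~5 s under load);
                // the real batch size is different
                List<Document> batch = IntStream.range(0, 1000)
                        .mapToObj(j -> new Document("seq", j))
                        .collect(Collectors.toList());
                db.getCollection("txn_" + n).insertMany(batch);

                System.out.printf("thread %d: %.2f s%n",
                        n, (System.nanoTime() - start) / 1e9);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```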

CPU is between 50 and 80 percent, and RAM usage is about 10 GB.

And I don't have any extra indexes that might affect insertions.

The weird thing is creating collections. It seems to be a very costly task, am I right?
What is a normal expectation for creating a collection on a system as described above?

Actually, when I run everything sequentially, each createCollection and insert completes in less than 0.01 seconds.

So the concurrency is definitely the problem.

Quite normal, since you have 20 GB and the WiredTiger engine reserves nearly 50% of RAM for its cache. See https://docs.mongodb.com/manual/core/wiredtiger/

Any numbers for disk I/O?

Please show your code.

Yes, but if your "many" is 100000 documents, it is important to know.

There are at least 2 files per collection.

Too soon to conclude that mongod is the problem. It might be your threading code.

You should expect creating a collection to be very fast. Creating too many collections is a design anti-pattern. See https://www.mongodb.com/article/schema-design-anti-pattern-massive-number-collections/

We need numbers for disk I/O usage.

Having the client and the server on the same machine with 100 threads might cause too much context switching on a 4-core machine.

What do you do with the data directory between your 2 tests?


The blog and the YouTube video describing the 6 anti-patterns of schema design were a great help. I think I am misusing Mongo, because based on my design I create a new collection for each transaction and then immediately drop it, but I have more than 20000 collections active at the same time due to the load on the server. As I understood from the video, WiredTiger creates 2 files for each collection, one for the data and one for each index, and by default WiredTiger opens all of those files at startup until it hits its cache limit, which is half of the RAM. As recommended, I have to keep the number of collections under 10000 in a single replica set, so I have to change my design. I monitored the disk I/O and it was at almost 90 percent nearly all the time, used by the mongod process. So I think you have helped me pinpoint the problem.

Let me change my design to put all of them into a single collection and find them using indexes. I'll let you know about the results. I'll try not to step into the other pitfalls explained in the video, like unused indexes, bloated documents, or massive arrays.
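
For the record, this is the kind of shape I am aiming for (a rough sketch; the collection and field names such as "transaction_work" and "txnId" are placeholders I made up): one shared collection with a discriminator field and an index on it, instead of one collection per transaction.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SingleCollectionDesign {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create(
                "mongodb://localhost:27017/?replicaSet=rs0");
        MongoCollection<Document> work =
                client.getDatabase("app").getCollection("transaction_work");

        // One index on the discriminator field replaces thousands of collections
        work.createIndex(Indexes.ascending("txnId"));

        String txnId = UUID.randomUUID().toString();

        // Instead of createCollection(...) per transaction: tag the documents
        List<Document> batch = IntStream.range(0, 1000)
                .mapToObj(i -> new Document("txnId", txnId).append("seq", i))
                .collect(Collectors.toList());
        work.insertMany(batch);

        // Work against this transaction's documents through the index
        long count = work.countDocuments(Filters.eq("txnId", txnId));
        System.out.println("documents for " + txnId + ": " + count);

        // Instead of dropping a collection at the end: delete by txnId
        work.deleteMany(Filters.eq("txnId", txnId));
    }
}
```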

Thank you man, you helped me a lot. I'll post the new stats to help others.

Sorry for the late response. You were absolutely right; that video helped a lot. We should not have more than 10000 collections per replica set. The moment I changed my schema design to a single collection with a wildcard index, it was like a miracle: everything is supersonic now. I had careful abstractions and anti-corruption layers over my infrastructure, which make it possible for me to switch from Mongo to something else in about an hour for an enterprise application with more than 10 bounded contexts, and which allowed me to change my design from thousands of collections to just one or two. But if you hardcode your schema into your code, you will have a hard time modifying your schema design. Thank you anyway, you helped me a lot.
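
For anyone who lands here later, the final shape is roughly this (a simplified sketch with made-up names like "documents" and "payload", not my real schema; note that wildcard indexes require MongoDB 4.2 or newer):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class WildcardIndexSketch {
    public static void main(String[] args) {
        MongoClient client = MongoClients.create(
                "mongodb://localhost:27017/?replicaSet=rs0");
        MongoCollection<Document> docs =
                client.getDatabase("app").getCollection("documents");

        // Wildcard index over every field nested under "payload"
        docs.createIndex(new Document("payload.$**", 1));

        docs.insertOne(new Document("context", "billing")
                .append("payload", new Document("invoiceNo", 42)
                        .append("customer", "acme")));

        // Queries on any field under "payload" can use the wildcard index
        Document found = docs.find(Filters.eq("payload.invoiceNo", 42)).first();
        System.out.println(found == null ? "not found" : found.toJson());
    }
}
```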
