Heavy insert_one, update_one performance issue(s)


This is my first MongoDB project. I'm using the C driver 1.21.1 against a local mongod 5.0.9-1,
on a RHEL 8.6 x86_64 VM … running in 32-bit mode (for reasons) :frowning:

We have two applications

  1. Generate and populate 6 collections
  2. Posting… Using those 6 collections, create and populate 7 more … the usual CRUD stuff.

The collections are one of two flavours:

  1. One indexed key, which is not in the data portion of the document (the data itself is not indexed)
    For these, the query projection includes both the key and the data
  2. 1 - n indexed keys, which are in the data portion of the document (the rest of the data is not indexed)
    For these, the query projection only includes the data (I can reconstitute a key if needed)

In both cases, all query and options bson_t are stack based (and rarely outgrow that).
The insert and update bson_t are dynamic, pre-allocated via bson_writer_new() with an estimated document size (at least big enough; there may be some unused space). This removes the realloc()/memcpy()/free() overhead associated with growing documents.

I have checked mongodb.log, and there are no 'Slow query' entries other than index creation.
I am using a socket connection to a local mongod on an untuned but unloaded x86 VM
with 8 CPUs, 32 GB of memory, and an unknown backing store (SSD or spinning). Journalling is off because the database is dropped and recreated for every run.

This logic works fine against our internal B-tree (C-ISAM knockoff) filesystem, with very quick turn-around.
The times for the two steps, with 100,000 starting records, are:

real    0m36.484s
user    0m25.404s
sys     0m3.772s

real    0m53.387s
user    0m8.791s
sys     0m11.458s

Now I know there is a huge difference, and I won't get anywhere near those numbers…
Our proof-of-concept team figured we'd have a second step on the order of 5 minutes, and we gain a lot of bang for that time delta (replication, concurrent updates, …).

However, what we’re actually seeing for the same 100000 input records is:

real	4m37.602s
user	1m12.026s
sys	0m26.249s

real	28m34.943s
user	5m36.740s
sys	3m0.743s

I have two SWAGs at the likely slowdown culprits:

  1. 13 concurrently accessed collections, each with 100,000s of documents.
    The collections are written one by one before a new input document is read.
  2. The sheer size of many of the documents being inserted/updated (sizes in bytes):
Document size     119
Document size     574
Document size     596
Document size   2,078
Document size   4,074
Document size   4,575
Document size   6,078
Document size   8,574
Document size   9,633
Document size  16,074
Document size  20,074
Document size  22,076
Document size  22,078

The default (Linux) socket buffer size is 212,992 bytes, but I don't know what the C driver might be expecting.

Watching the thing run, mongod uses about 70-some percent of a CPU, and my program about 36 percent.
My program and one of the mongod collection threads take turns running/sleeping.
The 8-CPU machine very rarely gets above a load of 1.30 :frowning:

The first phase runs with ~610 minor page faults. The second is far worse: ~8,724,377. Yowza.

The higher-level applications expect synchronous I/O … but I'm trying to convince them to let me play with bulk operations … though I can't chain too many at once with documents of those sizes!

I would appreciate any and all questions, complaints about what/how I'm doing this, or suggestions on where to look next!

Thank you
Richard Nelson
cowboy@us.ibm.com (currently stuck in the bowels of many file systems, at once)