I have noticed that the MongoDB Java driver does not scale quite linearly when bulk writing from multiple threads. The sample code used for this experiment is attached at the bottom.
- Description of the experiment
Inserting 60 million documents into the same collection using 1 thread, compared to using 2 and 4 threads.
The document being inserted looks like the following:
{
"_id" : ObjectId("5ed58b1ba7f1ea6927b148b5"),
"key17" : "paorgpaomrgpoapmgmmpagm",
"key12" : "2020-03-16",
"key7" : "0.094",
"key6" : "44923.59",
"key4" : "9",
"key10" : "r",
"key1" : "7",
"key2" : "8395829",
"key5" : "28",
"key13" : "2020-03-16",
"key9" : "e",
"key11" : "2020-03-16",
"key14" : "klajdlfaijdliffna",
"key15" : "933490",
"key3" : "928749",
"key8" : "0.29"
}
- Results of the experiment
– One thread
iteration    total time (in milliseconds)    total time (mm:ss)
1            706029                          11:46
2            692437                          11:32
3            689602                          11:29
Average total time: ~696022.67 milliseconds
– Two threads
iteration    total time (in milliseconds)    total time (mm:ss)
1            386070                          06:26
2            378080                          06:18
3            379040                          06:19
Average total time: ~381063.33 milliseconds
– Four threads
iteration    total time (in milliseconds)    total time (mm:ss)
1            251073                          04:11
2            265991                          04:25
3            257938                          04:17
Average total time: 258334 milliseconds
Here are the specs of the client and server.
- Server Specs
Processors: 2 x Intel Xeon E5-2640 2.50GHz
Memory: 8GB RDIMM, 1333 MHz (Total 32GB RAM)
Network Card Speed: Broadcom 5720 QP 1Gb Network Daughter Card
Operating System: CoreOS
MongoDB Server Version: 3.6.2 (Docker hosted)
- Client Specs
Processors: Intel Core i7-4790 CPU @ 3.60GHz (8 CPUs), ~3.1GHz
Memory: 16GB RAM
Network Card: Intel(R) Ethernet Connection (2) I218-V, 1Gb
Operating System: Windows Server 2012 R2 Standard
The MongoDB Java driver version used from the client is 3.12.0.
As the averages show, going from 1 thread to 2 threads gives a speedup of about 1.83x (696023 ms down to 381063 ms). However, going from 2 threads to 4 threads gives a speedup of only about 1.48x (381063 ms down to 258334 ms).
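For reference, these ratios can be reproduced with a small sketch like the one below. Only the timings come from the result tables; the class and method names (`SpeedupCalc`, `mean`, `speedup`, `efficiency`) are made up for illustration, and "efficiency" here is simply the speedup over one thread divided by the thread count.

```java
import java.util.Locale;

public class SpeedupCalc {

    /** Arithmetic mean of the measured run times, in milliseconds. */
    public static double mean(long... timesMs) {
        long sum = 0;
        for (long t : timesMs) {
            sum += t;
        }
        return (double) sum / timesMs.length;
    }

    /** Speedup of a run relative to a baseline run: baseline time / run time. */
    public static double speedup(double baselineMs, double runMs) {
        return baselineMs / runMs;
    }

    /** Parallel efficiency: speedup over the one-thread run divided by the thread count. */
    public static double efficiency(double oneThreadMs, double runMs, int threads) {
        return speedup(oneThreadMs, runMs) / threads;
    }

    public static void main(String[] args) {
        // Timings copied from the three result tables (milliseconds).
        double t1 = mean(706029, 692437, 689602);
        double t2 = mean(386070, 378080, 379040);
        double t4 = mean(251073, 265991, 257938);

        System.out.printf(Locale.ROOT, "1 -> 2 threads: %.2fx%n", speedup(t1, t2));
        System.out.printf(Locale.ROOT, "2 -> 4 threads: %.2fx%n", speedup(t2, t4));
        System.out.printf(Locale.ROOT, "1 -> 4 threads: %.2fx%n", speedup(t1, t4));
        System.out.printf(Locale.ROOT, "efficiency at 2 threads: %.0f%%%n", 100 * efficiency(t1, t2, 2));
        System.out.printf(Locale.ROOT, "efficiency at 4 threads: %.0f%%%n", 100 * efficiency(t1, t4, 4));
    }
}
```

With the averages above this works out to roughly 1.83x, 1.48x and 2.69x, i.e. about 91% efficiency at 2 threads and about 67% at 4 threads.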
I was wondering why scalability drops as more threads are added. Is this the expected behavior of the MongoDB Java driver? Is it correct to assume that inserting the 60 million documents should be twice as fast with 4 threads as with 2 threads?
P.S.: Network throughput doesn’t seem to be the issue here, as I have verified that I can scp a large file from the client to the server at an average data transfer rate of 200 MB/s.
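To put the insert rates next to that scp figure, here is a rough sketch that estimates the raw document payload per second. It counts the BSON size of the sample document by hand (string elements plus a 12-byte ObjectId `_id`), so it needs no driver on the classpath. The class and method names are hypothetical, and the estimate ignores wire-protocol framing and any server-side cost, so treat it as a lower bound on the bytes actually sent.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class PayloadEstimate {

    /** BSON string element: type byte + key cstring + int32 length + UTF-8 bytes + NUL. */
    public static int stringElementSize(String key, String value) {
        int keyBytes = key.getBytes(StandardCharsets.UTF_8).length + 1;
        int valueBytes = 4 + value.getBytes(StandardCharsets.UTF_8).length + 1;
        return 1 + keyBytes + valueBytes;
    }

    /** BSON ObjectId element named "_id": type byte + "_id" cstring (4 bytes) + 12 bytes. */
    public static final int OBJECT_ID_ELEMENT_SIZE = 1 + 4 + 12;

    /** Size of a document holding an ObjectId _id plus the given string fields. */
    public static int documentSize(Map<String, String> fields) {
        int size = 4 + 1; // int32 total length + trailing NUL
        size += OBJECT_ID_ELEMENT_SIZE;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            size += stringElementSize(e.getKey(), e.getValue());
        }
        return size;
    }

    /** The sample document from the post, without _id. */
    public static Map<String, String> sampleDocument() {
        Map<String, String> d = new LinkedHashMap<>();
        d.put("key17", "paorgpaomrgpoapmgmmpagm");
        d.put("key12", "2020-03-16");
        d.put("key7", "0.094");
        d.put("key6", "44923.59");
        d.put("key4", "9");
        d.put("key10", "r");
        d.put("key1", "7");
        d.put("key2", "8395829");
        d.put("key5", "28");
        d.put("key13", "2020-03-16");
        d.put("key9", "e");
        d.put("key11", "2020-03-16");
        d.put("key14", "klajdlfaijdliffna");
        d.put("key15", "933490");
        d.put("key3", "928749");
        d.put("key8", "0.29");
        return d;
    }

    /** Payload rate in MB/s (1 MB = 10^6 bytes) for a run of the given length. */
    public static double megabytesPerSecond(long documents, int bytesPerDoc, double elapsedMs) {
        return documents * (double) bytesPerDoc / 1e6 / (elapsedMs / 1000.0);
    }

    public static void main(String[] args) {
        int bytesPerDoc = documentSize(sampleDocument());
        long documents = 60_000_000L;
        System.out.println("bytes per document: " + bytesPerDoc);
        // Average total times from the result tables, in milliseconds.
        double[] averagesMs = {696022.67, 381063.33, 258334.0};
        int[] threads = {1, 2, 4};
        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT, "%d thread(s): %.1f MB/s%n",
                    threads[i], megabytesPerSecond(documents, bytesPerDoc, averagesMs[i]));
        }
    }
}
```

By this count the document is 317 bytes, which gives roughly 27, 50 and 74 MB/s of document payload for 1, 2 and 4 threads.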
- Sample code
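import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.InsertOneModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class MongoDBWriterTwoThreads {
    public static void main(String[] args) {
        // Each writer inserts 30 million documents; both run concurrently.
        MongoDBWriterTwoThreads_1 writer1 = new MongoDBWriterTwoThreads_1();
        MongoDBWriterTwoThreads_2 writer2 = new MongoDBWriterTwoThreads_2();
        Thread t1 = new Thread(writer1);
        Thread t2 = new Thread(writer2);
        t1.start();
        t2.start();
    }
}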
class MongoDBWriterTwoThreads_1 implements Runnable {
    private static MongoClient mongoClient;
    private static String databaseName;
    private static String collectionName;
    private static WriteConcern wc;
    private static String connectionString;
    private static int documents;

    static {
        try {
            // Unacknowledged writes (w=0), without waiting for the journal.
            wc = new WriteConcern(0).withJournal(false);
            databaseName = "test";
            collectionName = "testColl";
            connectionString = "mongodb://1.2.3.4:34567";
            documents = 30000000; // 30 million.
            // This class owns its own client, separate from the other writer's.
            mongoClient = new MongoClient(new MongoClientURI(connectionString));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }

    @Override
    public void run() {
        // Each Runnable instance is run by exactly one thread, so this lock is uncontended.
        synchronized (this) {
            System.out.println(Thread.currentThread().getName() + " START " + new Date(System.currentTimeMillis()));
            System.out.println("Database: " + databaseName);
            System.out.println("Collection: " + collectionName);
            System.out.println("Write concern: " + wc);
            MongoDatabase database = mongoClient.getDatabase(databaseName);
            MongoCollection<Document> collection = database.getCollection(collectionName).withWriteConcern(wc);
            List<InsertOneModel<Document>> docs = new ArrayList<>();
            int batchSize = 1000;
            int batch = 0;
            long start = System.currentTimeMillis();
            for (int i = 0; i < documents; ++i) {
                String key1 = "7";
                String key2 = "8395829";
                String key3 = "928749";
                String key4 = "9";
                String key5 = "28";
                String key6 = "44923.59";
                String key7 = "0.094";
                String key8 = "0.29";
                String key9 = "e";
                String key10 = "r";
                String key11 = "2020-03-16";
                String key12 = "2020-03-16";
                String key13 = "2020-03-16";
                String key14 = "klajdlfaijdliffna";
                String key15 = "933490";
                String key17 = "paorgpaomrgpoapmgmmpagm";
                Document doc = new Document("key17", key17).append("key12", key12)
                        .append("key7", key7).append("key6", key6)
                        .append("key4", key4).append("key10", key10)
                        .append("key1", key1).append("key2", key2)
                        .append("key5", key5).append("key13", key13)
                        .append("key9", key9).append("key11", key11)
                        .append("key14", key14).append("key15", key15)
                        .append("key3", key3).append("key8", key8);
                docs.add(new InsertOneModel<>(doc));
                batch++;
                // Send the accumulated documents once a full batch of 1000 is ready.
                if (batch >= batchSize) {
                    collection.bulkWrite(docs);
                    docs.clear();
                    batch = 0;
                }
            }
            // Send whatever is left over from the last, partial batch.
            if (batch > 0) {
                collection.bulkWrite(docs);
                docs.clear();
            }
            mongoClient.close();
            double elapsed = (System.currentTimeMillis() - start) / 1000.0;
            System.out.println(Thread.currentThread().getName() + " took " + elapsed + " seconds.");
            System.out.println(Thread.currentThread().getName() + " END " + new Date(System.currentTimeMillis()));
        }
    }
}
// Identical to MongoDBWriterTwoThreads_1 apart from the class name.
class MongoDBWriterTwoThreads_2 implements Runnable {
    private static MongoClient mongoClient;
    private static String databaseName;
    private static String collectionName;
    private static WriteConcern wc;
    private static String connectionString;
    private static int documents;

    static {
        try {
            // Unacknowledged writes (w=0), without waiting for the journal.
            wc = new WriteConcern(0).withJournal(false);
            databaseName = "test";
            collectionName = "testColl";
            connectionString = "mongodb://1.2.3.4:34567";
            documents = 30000000; // 30 million.
            // This class owns its own client, separate from the other writer's.
            mongoClient = new MongoClient(new MongoClientURI(connectionString));
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }

    @Override
    public void run() {
        // Each Runnable instance is run by exactly one thread, so this lock is uncontended.
        synchronized (this) {
            System.out.println(Thread.currentThread().getName() + " START " + new Date(System.currentTimeMillis()));
            System.out.println("Database: " + databaseName);
            System.out.println("Collection: " + collectionName);
            System.out.println("Write concern: " + wc);
            MongoDatabase database = mongoClient.getDatabase(databaseName);
            MongoCollection<Document> collection = database.getCollection(collectionName).withWriteConcern(wc);
            List<InsertOneModel<Document>> docs = new ArrayList<>();
            int batchSize = 1000;
            int batch = 0;
            long start = System.currentTimeMillis();
            for (int i = 0; i < documents; ++i) {
                String key1 = "7";
                String key2 = "8395829";
                String key3 = "928749";
                String key4 = "9";
                String key5 = "28";
                String key6 = "44923.59";
                String key7 = "0.094";
                String key8 = "0.29";
                String key9 = "e";
                String key10 = "r";
                String key11 = "2020-03-16";
                String key12 = "2020-03-16";
                String key13 = "2020-03-16";
                String key14 = "klajdlfaijdliffna";
                String key15 = "933490";
                String key17 = "paorgpaomrgpoapmgmmpagm";
                Document doc = new Document("key17", key17).append("key12", key12)
                        .append("key7", key7).append("key6", key6)
                        .append("key4", key4).append("key10", key10)
                        .append("key1", key1).append("key2", key2)
                        .append("key5", key5).append("key13", key13)
                        .append("key9", key9).append("key11", key11)
                        .append("key14", key14).append("key15", key15)
                        .append("key3", key3).append("key8", key8);
                docs.add(new InsertOneModel<>(doc));
                batch++;
                // Send the accumulated documents once a full batch of 1000 is ready.
                if (batch >= batchSize) {
                    collection.bulkWrite(docs);
                    docs.clear();
                    batch = 0;
                }
            }
            // Send whatever is left over from the last, partial batch.
            if (batch > 0) {
                collection.bulkWrite(docs);
                docs.clear();
            }
            mongoClient.close();
            double elapsed = (System.currentTimeMillis() - start) / 1000.0;
            System.out.println(Thread.currentThread().getName() + " took " + elapsed + " seconds.");
            System.out.println(Thread.currentThread().getName() + " END " + new Date(System.currentTimeMillis()));
        }
    }
}
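Since the two writer classes differ only in their names, the same experiment could probably be run for any thread count from a single harness. Below is a minimal sketch of that idea, not the code I actually measured. `ParallelBatchWriter` and `BatchSink` are hypothetical names; `BatchSink` stands in for the `collection.bulkWrite(docs)` call so the sketch runs without a MongoDB server, and the demo sink only counts documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

public class ParallelBatchWriter {

    /** Hypothetical stand-in for a bulk write, e.g. docs -> collection.bulkWrite(docs). */
    public interface BatchSink<T> {
        void write(List<T> batch);
    }

    /** Splits totalDocs across the given number of threads and returns the elapsed milliseconds. */
    public static <T> long run(int threads, long totalDocs, int batchSize,
                               LongFunction<T> factory, BatchSink<T> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        long base = totalDocs / threads;
        long extra = totalDocs % threads; // the first 'extra' threads take one more document
        long next = 0;
        for (int t = 0; t < threads; t++) {
            final long from = next;
            final long to = from + base + (t < extra ? 1 : 0);
            next = to;
            futures.add(pool.submit(() -> writeRange(from, to, batchSize, factory, sink)));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait for every writer and surface its failure, if any
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a writer thread failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Builds documents for indexes [from, to) and hands them to the sink in batches. */
    private static <T> void writeRange(long from, long to, int batchSize,
                                       LongFunction<T> factory, BatchSink<T> sink) {
        List<T> batch = new ArrayList<>(batchSize);
        for (long i = from; i < to; i++) {
            batch.add(factory.apply(i));
            if (batch.size() >= batchSize) {
                sink.write(batch);
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            sink.write(batch); // last, partial batch
        }
    }

    public static void main(String[] args) {
        AtomicLong written = new AtomicLong();
        LongFunction<String> factory = i -> "doc-" + i;           // placeholder document
        BatchSink<String> sink = batch -> written.addAndGet(batch.size()); // counting sink
        long elapsedMs = run(4, 1_000_000L, 1000, factory, sink);
        System.out.println("wrote " + written.get() + " documents in " + elapsedMs + " ms");
    }
}
```

With the driver, the sink would presumably wrap `bulkWrite` on a collection obtained from one shared `MongoClient`, which is documented as thread-safe and keeps its own connection pool. I have not measured whether this changes the scaling; it only removes the duplicated classes and makes the total time cover all threads.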