I have a VM hosted in Azure, a Standard D2ds v5 (2 CPU cores, 8 GB of RAM).
I created a console application that reads data from a CSV file and inserts it into a collection.
The CSV file has around 100k rows with 30 columns (14 MB).
The code used to insert the data is shown below:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;
using MongoDB.Bson;
using MongoDB.Driver;

public static void ConnectX(string path, string myQueueItem)
{
    string connectionString = "mongodb://localhost:27017/";
    MongoClient client = new MongoClient(connectionString);
    IMongoDatabase database = client.GetDatabase("csa");

    // Create the collection if it does not exist yet (CollectionExists is a helper defined elsewhere)
    if (!CollectionExists(database, myQueueItem))
    {
        database.CreateCollection(myQueueItem, new CreateCollectionOptions());
    }
    IMongoCollection<BsonDocument> collection = database.GetCollection<BsonDocument>(myQueueItem);

    // Stopwatch to measure the total insert time
    Stopwatch stopwatch = new Stopwatch();
    Console.WriteLine("Inserting data...");
    stopwatch.Start();

    using (var reader = new StreamReader(path))
    using (var csvReader = new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture) { Delimiter = ";" }))
    {
        csvReader.Read();
        csvReader.ReadHeader();

        var data = new List<BsonDocument>();
        var options = new InsertManyOptions() { IsOrdered = false };

        while (csvReader.Read())
        {
            // Build one document per CSV row, using the header names as field names
            var document = new BsonDocument();
            foreach (var header in csvReader.HeaderRecord)
            {
                document.Add(header, csvReader.GetField(header));
            }
            data.Add(document);

            // Flush in batches of 20,000 documents
            if (data.Count >= 20000)
            {
                collection.InsertMany(data, options);
                data = new List<BsonDocument>();
            }
        }

        // Insert any remaining documents
        if (data.Count > 0)
        {
            collection.InsertMany(data, options);
        }
    }

    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds + " ms");
}
The method is triggered by a queue message; the message content is used as the collection name.
I drop 100 messages at a time and run application (1) to import the data (a simplified sketch of the consumer loop is shown below).
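Roughly, each application instance works like this (a simplified sketch only; the real queue client is omitted, and GetNextMessage is an illustrative name, not my actual code):

// Simplified consumer loop (illustrative only, not the real queue code):
// each queue message holds the name of the target collection, and every
// message imports the same ~100k-row CSV file.
string csvPath = "import.csv";               // path to the 14 MB CSV (placeholder)

while (true)
{
    string myQueueItem = GetNextMessage();    // hypothetical helper: dequeue one message
    if (myQueueItem == null)
        break;                                // queue is empty, stop

    ConnectX(csvPath, myQueueItem);           // insert the rows into that collection
}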
With only application (1) running, each insert into MongoDB takes about 3000 ms.
Then I run a second copy, application (2), to import data at the same time.
With the two applications importing in parallel, each insert takes about 5000 ms.
The database ends up with around 1000 collections, all containing the same data.
The MongoDB server information:
MongoDB 6.0.6 Community, deployed on a single machine without sharding.
{
name: 'wiredTiger',
supportsCommittedReads: true,
oldestRequiredTimestampForCrashRecovery: Timestamp({ t: 0, i: 0 }),
supportsPendingDrops: true,
dropPendingIdents: Long("0"),
supportsSnapshotReadConcern: true,
readOnly: false,
persistent: true,
backupCursorOpen: false
}
My question is: how can I reduce the insert time when the imports run in parallel? It looks like application (1) and application (2) wait for each other, as if only one insert command can run at a time, so the time roughly doubles in my case.