How to import JSON (1GB) into MongoDB faster?

I want to import this JSON file into MongoDB in upsert mode.

File: http://bulk.openweathermap.org/sample/daily_16.json.gz

The uncompressed file is almost 1 GB (the compressed download is about 90 MB, as you can see at the link above).

Extracting the archive and importing the 1 GB file takes more than 50 minutes; after 20 minutes only 25% of the import had completed.
This is too time-consuming. Is there a faster way to do this?

.\mongoimport.exe --uri="mongodb://localhost/optmyzr" --collection=openWeatherData --mode=upsert --upsertFields=city.id --file="daily_16.json"

Since the input file is not in BSON format, it has to be converted to BSON during the import, which is where most of the time is spent. In addition, mongoimport is a single-threaded operation. The simplest way to speed it up is to split the file into multiple input chunks and run the import in parallel against each chunk (see the sketch after the note below).

Note: the number of parallel processes should be less than the number of logical CPU cores on the machine, since each process runs on a single core.
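As a minimal PowerShell sketch of that approach, assuming the file is newline-delimited JSON (one document per line); the chunk count of 4 and the chunk_*.json file names are arbitrary choices, not anything mongoimport requires:

# Split the newline-delimited JSON into chunks, then import them in parallel.
$chunks = 4
$dir = Get-Location
$source = Join-Path $dir "daily_16.json"
$writers = 0..($chunks - 1) | ForEach-Object { [System.IO.StreamWriter]::new((Join-Path $dir "chunk_$($_).json")) }
$i = 0
foreach ($line in [System.IO.File]::ReadLines($source)) {
    # Round-robin each document into one of the chunk files.
    $writers[$i % $chunks].WriteLine($line)
    $i++
}
$writers | ForEach-Object { $_.Close() }

# One mongoimport per chunk, each on its own core (keep $chunks below the logical core count).
$procs = 0..($chunks - 1) | ForEach-Object {
    Start-Process -PassThru -NoNewWindow -FilePath ".\mongoimport.exe" -ArgumentList @(
        "--uri=mongodb://localhost/optmyzr",
        "--collection=openWeatherData",
        "--mode=upsert",
        "--upsertFields=city.id",
        "--file=chunk_$($_).json"
    )
}
$procs | Wait-Process

Because the split is line-by-line, each source document ends up in exactly one chunk, so no two processes upsert the same document. The chunk files can be deleted once all processes finish.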

As answered here, creating an index on the upsert field reduced the import time to about 2 minutes.

Replace existing documents in the database with matching documents from the import file. mongoimport will insert all other documents.

A query is performed on the specified upsert fields for every document, so an index should exist on them; without one, each lookup falls back to a collection scan and the import becomes very slow.
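For example, the index can be created from the same shell before running the import. This assumes mongosh (mongosh.exe on Windows) is available alongside mongoimport; adjust the path if yours lives elsewhere:

# Index the upsert field so each upsert lookup uses the index instead of a collection scan.
.\mongosh.exe "mongodb://localhost/optmyzr" --eval "db.openWeatherData.createIndex({ 'city.id': 1 })"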
