Create compound index in an extremely large collection running on standalone mongodb

Buxuan_Li · June 6, 2023, 3:27am

Hi all,

I am new to MongoDB and I created a large collection of ~1 TB (I read the guide, which says there is no limit on the number of documents in a collection but suggests a finite number of collections…). After a month of data acquisition, I started working with it and realized there is a huge problem: querying the data is extremely slow (it basically takes forever). I am hosting the data on a dockerized MongoDB 4.4 running on my NAS with a 4-core CPU, and working with it from MongoDB 6.0 running on my M2 Mac.

Now I have the following questions:

I didn't create an index when I inserted these documents. Now I have tried to create an index on the populated collection from the mongosh command line on my Mac, but it complains that “standalones can’t specify commitquorum”. I didn't find any replica or commitQuorum settings in my mongo.conf. How do I turn it off?
As a workaround, I tried MongoDB Compass, which does seem able to run the index creation, but it is extremely slow. I have streaming data being written into the collection every 4 hours. I am not sure if this is the reason, but after a long time the index creation eventually failed in Compass.
I am still writing new data to the collection: I first accumulate it on my Mac, then dump it and restore it on the NAS. In the future I want to create the index before dumping and restoring. Can I restore an indexed collection into a populated, un-indexed collection?
What is the proper way to create an index on this large collection running in a standalone instance?
If the above is hard, is there any way to efficiently split this large collection into multiple smaller collections by one of its categorical fields?

Many thanks for any comments/suggestions!

Kobe_W · June 6, 2023, 3:54am

What command did you use to create the index?
mongodb - why mongorestore restore indexes? - Stack Overflow
db.collection.createIndex. No other magic.
Shard the collection first. 1 TB can be a candidate for sharding.
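
On the question about splitting the collection by a categorical field, one possible approach (a sketch, not something stated in this thread) is an aggregation with `$match` followed by `$out`, run once per category value. The field name `category`, the source collection `bigCollection`, and the helper `splitPipeline` are all hypothetical, and the sketch assumes the category values are simple strings that are valid in collection names. Each `$match` still scans the whole collection if the field is not indexed, and `$out` replaces the target collection if it already exists.

```javascript
// Build a pipeline that copies the documents with one category value
// into their own collection. All names here are hypothetical.
function splitPipeline(field, value, targetPrefix) {
  return [
    { $match: { [field]: value } },
    { $out: `${targetPrefix}_${value}` },
  ];
}

// Only runs inside mongosh, where the global `db` is defined.
if (typeof db !== "undefined") {
  const source = db.getCollection("bigCollection");
  for (const value of source.distinct("category")) {
    // toArray() forces the pipeline to execute; $out returns no documents.
    source.aggregate(splitPipeline("category", value, "bigCollection")).toArray();
  }
}
```

It is probably worth trying this on a small copy of the data first, since each pass reads the full source collection.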

Buxuan_Li · June 6, 2023, 4:07am

Thanks for the comments!
The command I used was:
/opt/homebrew/bin/mongosh --host $REMOTE_HOST:$PORT -u $USER_NAME -p $PASSWORD --authenticationDatabase "admin" --db $REMOTE_DB --eval "db.$REMOTE_COLLECTION.createIndex({timestamp:1,index:'text'},{ unique: true },{name:'timestampSymbol'})" >> $LOG_FILE 2>&1

I have looked at sharding and am trying to understand how it works and how to deploy it.
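
A note on the createIndex call quoted above, offered as a likely explanation rather than a confirmed diagnosis: `db.collection.createIndex` takes up to three arguments (keys, options, commitQuorum), so a separate third object such as `{name:'timestampSymbol'}` would be read as a commit quorum, which a standalone rejects. Putting `name` in the same options object as `unique` should avoid that. The sketch below assumes an ordinary ascending compound index on the two fields from the command (the original used `'text'` for `index`, which requests a text index instead). The collection name `myCollection` is a hypothetical placeholder, and the guard only lets the call run inside mongosh, where `db` exists.

```javascript
// Index keys: an assumption, a plain ascending compound index on both fields.
const keys = { timestamp: 1, index: 1 };

// All index options go in ONE object, passed as the second argument.
const options = { unique: true, name: "timestampSymbol" };

// `db` is only defined inside mongosh; the call is skipped elsewhere.
// "myCollection" is a hypothetical placeholder for the real collection name.
if (typeof db !== "undefined") {
  db.getCollection("myCollection").createIndex(keys, options);
}
```

With only two arguments, no commit quorum is sent, so the standalone-specific error should not be triggered.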