How to query a large sharded database of 800 million documents

I need to find data by filtering on a specific date range in a sharded database with a total of 800 million documents.

How should I structure or optimize the query to get results in this case?

Hi @Priyanka_Saxena

From what I remember from MongoDB University, you would want to include your shard key in the query so that mongos can route the query to only the shards that store that portion of the data. However, you mentioned querying on a date range. Again from what I remember, any monotonically changing value (which a date typically is) does not make a good shard key, because it will not distribute the data evenly.
I will leave some links to the documentation to help you out as well, since my reply does not give a solution to your problem.

The documentation covers both hashed and ranged sharding.
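
As a rough illustration of the two strategies, the sketch below builds the document for the shardCollection admin command in Python. The namespace "mydb.event" and the key field "name" are assumptions made for the example, not details from this thread. Ranged sharding keeps nearby key values together, so a range query on the key can be targeted, but an ever-increasing key sends all new writes to the same shard. Hashed sharding spreads writes evenly, at the cost that a range query on the hashed field has to go to every shard.

```python
def shard_collection_command(namespace, key_field, hashed=False):
    """Build the document for MongoDB's shardCollection admin command.

    namespace is "<database>.<collection>"; key_field is the shard key field.
    """
    # "hashed" selects hashed sharding; 1 selects ranged sharding.
    strategy = "hashed" if hashed else 1
    return {"shardCollection": namespace, "key": {key_field: strategy}}


# Hypothetical namespace and field name, for illustration only.
ranged_cmd = shard_collection_command("mydb.event", "name")
hashed_cmd = shard_collection_command("mydb.event", "name", hashed=True)

# With PyMongo and a client connected to mongos, the command would be run as:
#   client.admin.command(ranged_cmd)
```

Which strategy fits depends on whether the workload is dominated by inserts or by range reads, so treat this as a starting point rather than a recommendation.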

To query a sharded collection efficiently, the query filter must include the shard key. Without it, the query becomes a scatter-gather operation: mongos sends it to every shard that holds part of the collection and merges the results, which is much slower than a targeted query.

To retrieve the data efficiently, the query needs to be a targeted operation, which means the filter must use the shard key.
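
A minimal sketch of such a filter in Python is shown here. It assumes the collection is sharded on a field called "name" and that the dates live in a top-level field called "createdAt"; both names and the date values are hypothetical and only serve the example.

```python
from datetime import datetime, timezone


def targeted_date_filter(shard_key_field, shard_key_value, date_field, start, end):
    """Build a filter that pins the shard key and restricts a date range."""
    return {
        # Equality on the shard key lets mongos route to the owning shard(s).
        shard_key_field: shard_key_value,
        # Half-open range [start, end) on the date field.
        date_field: {"$gte": start, "$lt": end},
    }


# Hypothetical field names and dates, for illustration only.
query = targeted_date_filter(
    "name", "values", "createdAt",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 2, 1, tzinfo=timezone.utc),
)
# With PyMongo: cursor = collection.find(query)
```

On the shard that receives the query, an index covering the shard key and the date field would normally be needed as well, so that the range is not resolved by scanning the whole collection.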

Is the specified date field part of the shard key? If not, is the shard key part of the query filter?

I have not used a shard key yet. To answer your question: no, the date field is not part of the shard key, and it is not used in the query filter yet.

I am using the intermediate query below:

cursordata = event.aggregate([{"$match": {"name": "values"}}, {"$unwind": "detailArray"}, {"$project": {"detailArray.date": 1, "detailArray.msg": 1}}, {"$group": {"_id": None, "count": {"$sum": 1}}}])

I am not able to get the count of events in the unwound documents.

$unwind requires the field name to be prefixed with $, like this in your aggregation query:
{ "$unwind": "$detailArray" }
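
Putting that fix into the pipeline from the question, a Python sketch could look like the following. It also adds a date-range $match after the $unwind, on the assumption that detailArray.date holds real date values; the $project stage is left out because it is not needed just to count. The date bounds are hypothetical.

```python
from datetime import datetime, timezone


def count_unwound_events(name, start, end):
    """Build a pipeline that counts detailArray elements in a date range."""
    return [
        # Narrow to the matching parent documents first.
        {"$match": {"name": name}},
        # Emit one document per array element; note the "$" prefix.
        {"$unwind": "$detailArray"},
        # Keep only elements whose date falls in [start, end).
        {"$match": {"detailArray.date": {"$gte": start, "$lt": end}}},
        # Collapse everything into a single document carrying the count.
        {"$group": {"_id": None, "count": {"$sum": 1}}},
    ]


# Hypothetical date bounds, for illustration only.
pipeline = count_unwound_events(
    "values",
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 2, 1, tzinfo=timezone.utc),
)
# With PyMongo: result = list(event.aggregate(pipeline))
```

If any elements match, the result is a single document whose count field holds the total; if nothing matches, the $group stage receives no input and the result list is empty.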