I have one query on database My collection have 210 million docs And there is one query I am execute like {a : id, $or : [{b:id,c:stringId},{b: $in_large_array,c:$in_large_arrray,d : $in_large_array}]} Above is my query and when I apply sort it doesnt wor

I have one query on database My collection have 210 million docs And there is one query I am execute like {a : id, $or : [{b:id,c:stringId},{b: $in_large_array,c:$in_large_arrray,d : $in_large_array}]} Above is my query and when I apply sort it doesnt wor

where is your query?

I have a problem too. My car does not work.

Seriously, do you think anyone can help me fix my car when the only piece of information I give is My car does not work.

Seriously, we need to know what you mean by it does not work.

Do you have error messages?

If no error messages, do you mean the sort order is wrong? If the sort is wrong it would be good to see what sort you are applying. It would also be good to see sample documents that are in the wrong order.

2 Likes

db.form_response.aggregate([
{
$match: {
PartnerId: “585938e3d4e9dac9bb2b09c6”,
$or: [
{
BusinessID: 287892,
FormID: “64d9f5732061202e2576df19”
},
{
mlGroupIds: { $in: [NumberLong(3071),
NumberLong(3095),
NumberLong(3096) ]},
BusinessID: { $in: [NumberLong(285325),NumberLong(287890),NumberLong(287891)]},
FormID: { $in: [“64d9f5732061202e2576df19”,“64d9f8b8ec1f6cc819583949”,“64d9fbc5ec1f6cc81958398a”]}
}
]
IsDeleted: false
},
},
{
$sort : {RespDate : -1}
}
],{allowDiskUse : true,hint : “PartnerId_1_IsDeleted_1_RespDate-1_BusinessID_1_FormID_1_mlGroupIds_1”})

{a : id, $or : [{b:id,c:stringId},{b: $in_large_array,c:$in_large_arrray,d : $in_large_array}]}

I mean could you please suggest me a proper index
After filter my result set is around 766K docs and I am use m50 server

Using the principles outlines by the ESR rules I would start to experiment an index on

PartnerId because of the exact match
RespDate because you sort

I would handle the IsDeleted by using a partial index.

Please follow up on your thread.

Thanks

aggregate ([
{ “$match” : { “$or” : [
{ “BusinessID” : NumberLong(315602), “FormID” : “6650a30c831fb9dfd5080e22” },
{ “mlGroupIds” : { “$in” : [NumberLong(3071), NumberLong(3095), NumberLong(3096), NumberLong(3128), NumberLong(3165), NumberLong(3166), NumberLong(3167), NumberLong(3168), NumberLong(3169), NumberLong(3170), NumberLong(3171), NumberLong(3172), NumberLong(3173), NumberLong(3194), NumberLong(3244), NumberLong(3245), NumberLong(3246), NumberLong(3248), NumberLong(3249), NumberLong(3250), NumberLong(3251)] },
“BusinessID” : { “$in” : [NumberLong(285325), NumberLong(287890), NumberLong(287891), NumberLong(287892),…150]}
“FormID” : { “$in” : [“64d9f4d82061202e2576df10”, “64d9f844ec1f6cc819583944”, “64d9fb4d2061202e2576df4a”, “64d9fdebec1f6cc81958399a”, “64da01272061202e2576dfc5”, “64da050aec1f6cc8195839d1”, “64da09407f50e451862654b8”, “64da0b382061202e2576e05c”, “64da0d37ec1f6cc819583a4c”,…,300]}]}},
{
$unwind: {
path: “$Responses”
}
},
// Stage 3: $match for type != graph types
{
$match: {
“Responses.Type”: { $nin: [“5”,“3”,“4”,“6”,“12”] }
}
},
// Stage 4: $sort by _id in descending order
{
$sort: {
_id: -1
}
},
// Stage 5: $group by QID and Type
{
$group: {
_id: { QID: “$Responses.QID” },
TotalCount: { $sum: 1 },
Type: { $first: “$Responses.Type” },
Order: { $first: “$Responses.Order” },
AnsType: { $first: “$Responses.AnsType” },
Question: { $first: “$Responses.Question” },
UserResponses: {
$push: {
BusinessID: “$BusinessID”,
RespBy: “$RespBy”,
RespByName: “$RespByName”,
Type: “$Responses.Type”,
Options: “$Responses.Options”,
Answers: “$Responses.Answers”
}
}
}
},
// Stage 6: $replaceRoot to create a new root object
{
$replaceRoot: {
newRoot: {
QID: “$_id.QID”,
Order: “$Order”,
AnsType: “$AnsType”,
Question: “$Question”,
Options: “$Options”,
Type: “$Type”,
TotalCount: “$TotalCount”,
UserResponses: {
$slice: [
“$UserResponses”,
(0 - 0) * 10,
10
]
}
}
}
}
], { allowDiskUse: true });])

This query return 9 lakhs documents from 21 crore collection
Below is my document in my collection for reference
{
“_id” : ObjectId(“5b7e6d6b2a1f4918e86f9bf0”),
“PartnerId” : “585938e3d4e9dac9bb2b09c6”,
“BusinessID” : NumberLong(36711),
“FormID” : “5b7e6d552a1f4918e86f9bee”,
“RespDate” : ISODate(“2018-08-23T08:16:43.893Z”),
“RespBy” : NumberLong(0),
“RespByName” : null,
“RespBySearch” : null,
“CreatedBy” : NumberLong(0),
“CreatedByName” : null,
“Responses” : [
{
“QID” : 5,
“Order” : 1,
“Question” : “Please sign here”,
“Type” : “12”,
“SignImageURL” : “https://ee7c1b57be1254e85a9b-46bb9485f7496af88d0ba4c0a67d29cd.ssl.cf2.rackcdn.com/Forms/Signature/Original/$2018_08_23_01_16_43_8948.png”,
“IsRequired” : true,
“IsDeleted” : false,
“Options” : [
{
“Order” : 0,
“Option” : “Signed”
},
{
“Order” : 1,
“Option” : “Unsigned”
}
],
“Answers” : [
“Signed”
]
}
],
“HtmlUrl” : “https://ee7c1b57be1254e85a9b-46bb9485f7496af88d0ba4c0a67d29cd.ssl.cf2.rackcdn.com/FormResponses/Resp_36711_5b7e6d552a1f4918e86f9bee__201808230116436168.html”,
“IPAddress” : “182.156.82.94”,
“Device” : “Website”,
“UserAgent” : “Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0”,
“ModifiedDate” : null,
“ModifiedBy” : null,
“ModifiedByName” : “”,
“IsDeleted” : false,
“Location” : {
“Street” : null,
“StreetNo” : null,
“StateID” : null,
“StateName” : “KA”,
“Country” : “IN”,
“City” : “Bengaluru”,
“Zip” : null
},
“FormType” : 1,
“IsProcess” : 1
}

@steevej
I think mongodb cant handle this kind of situation
I already try many index as ESR rule but it is beyond the mongodb
I already upgrade the cluster to m80 but not efficient my cpu utilization is high
So I think I need to go with elastic search for that
Is it a good Bro ?

You do a lot of things that hamper performance.

  1. $sort after $unwind
  2. storing object id (FormID) as string rather than object id
  3. $in with large arrays
  4. using $nin
  5. having a use case that returns 9 lakhs documents
  6. $group-ing all the UserResponses and then $slice

For example for $in, if you are querying a list of BusinessID, then they must have some kind of relation between them. Are they from the same country, same region? The you should have a field that represent that grouping and query using the grouping attribute.

Same thing with FormID:$in, why so many form ids, what do they have in common. May be you should have a FormType in addition to a FormID. This way you query on 1 or 2 FormType rather than a big list of FormId.

Ditto for the $nin of Response.Type, your comment type != graph types, seems to indicate that you could a response.TypeGroup field one of graph type and one for ! graph type.

Yes I create a group of formId as masterId and update documents and also create a group of bussinessId as masterBusinessId after that it works good or not

It is not clear what really happend,

Did it worked good or not?