I want to use indexes on the secondary collection in mongodb

Aditya_Sharma7 · April 26, 2023, 5:19am

I have two collection named as users and user_data, users is a primary collection and the user_data is a secondary collection. I have perform an aggregate query for retriving millions of data but the query is taking too much time for execution. So, i try to use compound indexes on the fields of both the collections for e’g: audience_id: 1, contact.is_bounced: 1, contact.not_valid: 1 but on the time of exection secondary collection indexes are not use. Is there any way to use indexes on joined collection.

Db query :-

db.users.aggregate([{    
     $match: {    
     audience_id: {$in:      [ObjectId('633eaba80cb7a3b1d910e98b'),ObjectId('63b6a5325bf3f04f18170e84')]
  },    
      status: 'Subscribed'    
 }    
 }, {    
 $group: {
  _id: '$contact_id',
  contact_meta: {
   $first: '$ROOT'
 }
 }
 }, {
 $lookup: {
  from: 'user_data',
  'let': {
   id: '$_id'
  },
  pipeline: [
   {
    $match: {
     $and: [
      {
       $expr: {
        $eq: [
         '$_id',
         '$id'
        ]
      }
      },
    {
       email: {
        $exists: true,
        $ne: ''
      }
      },
    {
      $or: [
        {
         is_bounced: {
          $exists: false
         }
        },
        {
         is_bounced: {
          $ne: true
         }
        }
       ]
     },
      {
       $or: [
        {
        not_valid: {
          $exists: false
         }
        },
        {
        not_valid: {
          $ne: true
         }
        }
       ]
      }
    ]
   }
   }
  ],
  as: 'contact' }
 }, 
{
 $unwind: {
  path: '$contact',
  preserveNullAndEmptyArrays: false
 }
},
{
$addFields: {'contact.audience_id': '$contact_meta.audience_id'
}
}, 
{
 $replaceRoot: {newRoot: '$contact'
}
}])

Aasawari · April 28, 2023, 3:22am

Hi @Aditya_Sharma7 and welcome to MongoDB community forums!!

Depending on the version you are on, index being used by the stages of the aggregation pipeline changes.

For instance, before MongoDB version 3.6, the index is only used by the first stage of the aggregation pipeline.
See the Aggregation Pipeline — MongoDB Manual for further reference.

As of latest MongoDB version 6.0, the following are the limitations on using index with $lookup stages.

Multikey indexes are not used.
Indexes are not used for comparisons where the operand is an array or the operand type is undefined.
Indexes are not used for comparisons with more than one field path operand.

However, to understand your case efficiently, could you help me with

sample dataset which would help me reproduce the issue.
explain output for the query tried.
MongoDB version
Output of db.collection.getIndexes() for both collections

Regards
Aasawari

Aditya_Sharma7 · May 2, 2023, 4:19am

Hey @Aasawari i have added the screenshots of the indexes and output, added the sample dataset in the json format and mentioned the steps i have tried and the problem i have faced. Please check it once.
Indexes of users_data collection :-
Screenshot from 2023-05-02 09-04-55
Indexes of users collection :-

Sample dataset of user_data collection :-
contacts.json (2.3 KB)
Sample dataset of users collection :-
contacts_meta.json (3.0 KB)

Basically i have tried a match on audience_id and status of users collection. After that i use the group stage on the basis of $contact_id of users collection and use $ROOT, than i use let lookup with the collection users_data. As explained in the above query. The problem is the query is not time efficient it is taking to much time so i use indexes on users collection which reduce some time and after that for more time efficient i want to use the indexes on the fields of users_data collection as “is_bounced”,“not_valid” but i am not able to do so the indexes are created but they does not use in the query and there is no difference in the time.
The output i needed with the audience_id [ObjectID(“62199c94eb3129e94bdeacb5”)] is as
:-

Mongodb version - 6.0.5
With Regards.

Aditya Sharma

Aasawari · May 8, 2023, 4:25am

Hi @Aditya_Sharma7

Thank you for sharing the sample data and the necessary information.

There are a few points which I would like to mention after we have triaged the issue.

From the sample data posted and the query posted, the query does not seem to return any data as a response. Can you confirm, if the query is expected to perform in the similar way?
The aggregation query mentioned in he post looks complicated. I noted that you used $exists: false , and $ne: true . Note that in general, indexes and databases don’t perform best when you ask it to find things that doesn’t exist or doesn’t match. For example, instead of $ne: true , would it be possible to use $eq: false ? Can you help me understand to breakdown the query?
Based on the sample data and the expected response posted, I tried to create the aggregation pipeline with a new index created on contact_id.
The aggregation query looks lie:

db.users.aggregate([
    {
      '$lookup': {
        'from': 'users_data', 
        'localField': '_id', 
        'foreignField': 'contact_id', 
        'as': 'audience_id'
      }
    }, {
      '$unwind': {
        'path': '$audience_id'
      }
    }, {
      '$addFields': {
        'audience_id': '$audience_id.audience_id'
      }
    }
  ])

and return the response as:

....
{
    _id: ObjectId("63fc6a7ae47bdd0489381512"),
    first_name: 'Last',
    last_name: 'Test',
    full_name: 'Last Test',
    phone_number: '9816428369',
    email: 'adityasharmauna340@gmail.com',
    created_on: 1677486714,
    date: ISODate("2023-02-27T08:31:54.881Z"),
    is_new: 0,
    note: '',
    is_bounced: false,
    not_valid: false,
    is_verified: true,
    audience_id: ObjectId("62199c94eb3129e94bdeacb5")
  },
....

As mentioned in the SERVER-22622: Improve $lookup explain to indicate query plan on the “from” collection, the indexUsed parameter of the explain output defines the index used in the $lookup stage of the pipeline.

{
      '$lookup': {
        from: 'users_data',
        as: 'audience_id',
        localField: '_id',
        foreignField: 'contact_id',
        unwinding: { preserveNullAndEmptyArrays: false }
      },
      totalDocsExamined: Long("8"),
      totalKeysExamined: Long("8"),
      collectionScans: Long("0"),
      indexesUsed: [ 'contact_id_1' ],
      nReturned: Long("9"),
      executionTimeMillisEstimate: Long("1")
    },

Can you help me by confirming if the above query help in improving the performance of the query for your use case. If not, please share more details on what you are trying to achieve

Let us know if you have any further questions.
Regards
Aasawari

Aditya_Sharma7 · May 9, 2023, 4:13am

The previous query execution time for the query is 237 milliSeconds.
As per the changes according to you the new execution time is 112 milliSeconds.
Note - Total docs in primary collection - 19339, secondary collection - 662.
Hey @Aasawari it effect the query, the execution time is decreased by this query.
Thankyou so much for your efforts ma’am. It means aloat.
Regards
Aditya Sharma

system · May 14, 2023, 4:13am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.