Optimizing for lookup operations

We are currently at the final phase of our application development, which is the generation of certain reports. To create those reports we must perform joins between collections that may contain millions of documents. We are currently testing with document counts between 100,000 and 300,000, and the database operations perform pretty poorly (it takes more than 20 seconds to yield results). We are aware that this is because the indexes we have created are not utilized during the lookup operations (a COLLSCAN is used instead).

The structure of the collections that we are using is the following.

1. ServiceEvent document

{
   _id : ObjectId,
   DateStart : DateTime,
   DateEnd : DateTime,
   Service : {
      _id : ObjectId,
      Source : ObjectId, 
      Name : string,
      Color : string
   },
   CheckIns : [
      {
         _id : ObjectId,
         CheckInDate : DateTime,
         CheckOutDate : DateTime,
         Customer : {
            _id : ObjectId,
            Source : ObjectId,
            Username : string,
            FirstName : string,
            LastName : string
         }
      }
   ]
}

2. Customer document

{
   _id : ObjectId,
   Username : string,
   FirstName : string,
   LastName : string,
   Age : int,
   DateCreated : DateTime
}

3. Service document

{
   _id : ObjectId,
   Name : string,
   Color : string,
   DateCreated : DateTime
}

For simplicity’s sake, most of the properties of the service event, customer, and service entities were omitted.

The operation we need to perform is first a flattening of the service events with their embedded check-ins collection, and then a join of the customers and the services onto the flattened result.
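In raw aggregation terms, we believe this corresponds roughly to the following pipeline (a sketch of our understanding; the driver may generate a slightly different shape, and the collection names are as we have them):

```javascript
// Sketch of the pipeline the LINQ query should translate to.
const pipeline = [
  // SelectMany over the embedded array -> $unwind
  { $unwind: "$CheckIns" },
  // join on Service.Source equals service.Id -> $lookup into "services"
  {
    $lookup: {
      from: "Services",
      localField: "Service.Source",
      foreignField: "_id",
      as: "services",
    },
  },
  // join on CheckIns.Customer.Source equals customer.Id -> $lookup into "customers"
  {
    $lookup: {
      from: "Customers",
      localField: "CheckIns.Customer.Source",
      foreignField: "_id",
      as: "customers",
    },
  },
];
```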

We are using the MongoDB driver for C#, and the code is the following.

from serviceEventCheckIn in ServiceEvents.AsQueryable()
   .SelectMany(x => x.CheckIns, (serviceEvent, checkIn) => new AggregatedServiceEventCheckInEntity()
   {
      ServiceEventId = serviceEvent.Id,
      ServiceId = serviceEvent.Service.Source,
      CustomerId = checkIn.Customer.Source,

      ServiceEventCheckInDate = checkIn.CheckInDate,
      ServiceEventCheckOutDate = checkIn.CheckOutDate,
   })
join service in Services.AsQueryable() on serviceEventCheckIn.ServiceId equals service.Id into services
join customer in Customers.AsQueryable() on serviceEventCheckIn.CustomerId equals customer.Id into customers
select new AggregatedServiceEventCheckInOfCustomerOfServiceEntity()
{
   ServiceEventId = serviceEventCheckIn.ServiceEventId,
   CustomerId = serviceEventCheckIn.CustomerId,
   ServiceId = serviceEventCheckIn.ServiceId,

   ServiceEventCheckInDate = serviceEventCheckIn.ServiceEventCheckInDate,
   ServiceEventCheckOutDate = serviceEventCheckIn.ServiceEventCheckOutDate,
                          
   CustomerAge = customers.First().Age,
   CustomerDateCreated = customers.First().DateCreated,
   ServiceDateCreated = services.First().DateCreated
}

The Source property on the service embedded object of the service event entity acts as a “foreign key” pointing to the id property of the service entity.

The Source property on the customer embedded object of the check ins embedded object collection acts as a “foreign key” pointing to the id property of the customer entity.

We have created ascending indexes for those properties on the ServiceEvents collection, and the collection now has the following indexes, as they can be seen in the MongoDB Compass application.
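For completeness, these are the index key specifications we created (written out as plain objects, 1 = ascending, in case the Compass screenshot is not visible; the field names are the "foreign key" fields described above):

```javascript
// Ascending index keys on the ServiceEvents "foreign key" fields.
const indexSpecs = [
  { "Service.Source": 1 },
  { "CheckIns.Customer.Source": 1 },
];
// The equivalent shell commands would be:
//   db.ServiceEvents.createIndex({ "Service.Source": 1 })
//   db.ServiceEvents.createIndex({ "CheckIns.Customer.Source": 1 })
```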

It is clear that the lookup operation is executed using fields that have been indexed, yet the execution strategy that is selected is COLLSCAN, resulting in a 20+ second execution time! Are we doing something wrong, or is this expected for MongoDB since it's not designed as a relational database? Are we doing something wrong when defining the indexes? Any help is greatly appreciated!