Mongorestore in K8s job fails randomly when building indexes

Hi,
I am using mongorestore in a Kubernetes job. The job fails randomly, especially when the db I'm restoring is a big one. If the db I'm restoring is small, restarting the exact same k8s job sometimes works.
It seems to be a shared resource issue, but I am not sure what it could be. I tried increasing the resources for the job, but it is still failing with the same error.

It fails with the following error:

2022-06-12T15:01:01.937+0000    building indexes up to 4 collections in parallel
2022-06-12T15:01:01.937+0000    starting index build routine with id=3
2022-06-12T15:01:01.937+0000    starting index build routine with id=0
2022-06-12T15:01:01.937+0000    no indexes to restore for collection
2022-06-12T15:01:01.937+0000    restoring indexes for collection from metadata
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"Status_EntitySubType", "v":2}, Key:primitive.D{primitive.E{Key:"Status", Value:1}, primitive.E{Key:"EntitySubType", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000        run create Index command for indexes: Status_EntitySubType
2022-06-12T15:01:01.937+0000    starting index build routine with id=1
2022-06-12T15:01:01.937+0000    restoring indexes for collection  from metadata
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"date", "v":2}, Key:primitive.D{primitive.E{Key:"SentAt", Value:-1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"EntityId_NotificationId", "v":2}, Key:primitive.D{primitive.E{Key:"EntityId", Value:1}, primitive.E{Key:"NotificationId", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000        run create Index command for indexes: date, EntityId_NotificationId
2022-06-12T15:01:01.937+0000    restoring indexes for collection notifications.EmailLogs from metadata
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"Subject", "v":2}, Key:primitive.D{primitive.E{Key:"Subject", Value:-1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"SentAt", "v":2}, Key:primitive.D{primitive.E{Key:"SentAt", Value:-1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000        run create Index command for indexes: Subject, SentAt
2022-06-12T15:01:01.937+0000    starting index build routine with id=2
2022-06-12T15:01:01.937+0000    no indexes to restore for collection notifications.EmailTemplates
2022-06-12T15:01:01.937+0000    restoring indexes for collection notifications.SMSLogs from metadata
2022-06-12T15:01:01.937+0000    index: &idx.IndexDocument{Options:primitive.M{"background":true, "name":"Phone_NotificationId", "v":2}, Key:primitive.D{primitive.E{Key:"Phone", Value:1}, primitive.E{Key:"NotificationId", Value:1}}, PartialFilterExpression:primitive.D(nil)}
2022-06-12T15:01:01.937+0000        run create Index command for indexes: Phone_NotificationId
2022-06-12T15:01:01.944+0000    Failed: notifications.SMSLogs: error creating indexes for notifications.SMS: createIndex error: connection() error occured during connection handshake: auth error: unable to authenticate using mechanism "SCRAM-SHA-256": (KeyNotFound) Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1655046061, 8295) } with id: 0
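The cluster time in the KeyNotFound message is a BSON Timestamp: the first field is seconds since the Unix epoch and the second is an increment (ordinal) within that second. A small sketch decoding the values copied from the error shows the server was looking up a signing key for 15:01:01 UTC, the same second as the failing log lines (which use a +0000 offset). It only decodes the number; it does not diagnose why no key was found.

```python
from datetime import datetime, timezone
from typing import Tuple

# Values copied from the KeyNotFound error: Timestamp(1655046061, 8295)
TS_SECONDS = 1655046061
TS_INCREMENT = 8295


def decode_bson_timestamp(seconds: int, increment: int) -> Tuple[datetime, int]:
    """Split a BSON Timestamp into its UTC wall-clock second and its per-second ordinal."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc), increment


if __name__ == "__main__":
    when, inc = decode_bson_timestamp(TS_SECONDS, TS_INCREMENT)
    print(when.isoformat(), inc)  # prints: 2022-06-12T15:01:01+00:00 8295
```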

Hi @Reab_AB, and welcome to the community!!

It would be really helpful if you could share a few details about the issue mentioned above:

  1. How are you restoring the collection? Are you restarting the pod and restoring into a new pod in the same namespace, or are you restoring to an entirely new deployment?
  2. The persistent volume YAML dedicated to the cluster, and the size of the database you are trying to restore.
  3. The deployment type of the cluster (standalone, replica set, or sharded cluster).
  4. The MongoDB version for the deployment.
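Items 3 and 4 can be read from the server itself. A minimal sketch, assuming PyMongo is installed and that a hypothetical `MONGODB_URI` environment variable points at the deployment: the `isMaster` response contains `msg: "isdbgrid"` when the client is connected to a mongos, `setName` when connected to a replica set member, and neither for a standalone, while the `buildInfo` command reports the server version.

```python
import os
from typing import Any, Mapping


def classify_deployment(is_master: Mapping[str, Any]) -> str:
    """Classify a deployment from an isMaster (or hello) response document."""
    if is_master.get("msg") == "isdbgrid":
        return "sharded cluster (connected to mongos)"
    if "setName" in is_master:
        return "replica set"
    return "standalone"


def describe_deployment(uri: str) -> str:
    # Imported lazily so classify_deployment works without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    try:
        response = client.admin.command("isMaster")
        version = client.admin.command("buildInfo")["version"]
        return f"{classify_deployment(response)}, MongoDB {version}"
    finally:
        client.close()


if __name__ == "__main__":
    # MONGODB_URI is a hypothetical environment variable used only in this sketch.
    print(describe_deployment(os.environ["MONGODB_URI"]))
```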

Thanks
Aasawari


Hi @Aasawari

  • I’m using k8s jobs to restore the DB; the job is in the same namespace as the deployment.
  • The PVC is 1000GB and the data size to be restored is 200GB, so I don’t think it’s a PVC size issue.
  • The backup is from a standalone deployment and is being restored into a sharded cluster.
  • MongoDB version is 4.4

I forgot to mention: I am using the MongoDB Database Tools in the k8s job, with a memory request of 8Gi and a memory limit of 24Gi.
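For reference, those memory settings live in the `resources` stanza of the Job's container spec. A minimal sketch that emits such a Job as JSON (`kubectl apply -f` accepts JSON as well as YAML); only the 8Gi/24Gi values come from the post, while the Job name, image, connection URI placeholder, and `/dump` path are hypothetical and not the poster's actual setup.

```python
import json


def build_restore_job(name: str, image: str, uri: str, dump_dir: str) -> dict:
    """Build a batch/v1 Job manifest that runs mongorestore with the memory settings from the post."""
    container = {
        "name": "mongorestore",
        "image": image,  # hypothetical image bundling the MongoDB Database Tools
        "command": ["mongorestore", f"--uri={uri}", dump_dir],
        "resources": {
            "requests": {"memory": "8Gi"},  # memory request from the post
            "limits": {"memory": "24Gi"},  # memory limit from the post
        },
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # Job pods must use Never or OnFailure
                    "containers": [container],
                }
            }
        },
    }


if __name__ == "__main__":
    manifest = build_restore_job(
        name="mongorestore-job",  # hypothetical
        image="example/mongodb-database-tools:latest",  # hypothetical
        uri="mongodb://<user>:<password>@<mongos-host>:27017/?authSource=admin",  # placeholder
        dump_dir="/dump",  # hypothetical mount path
    )
    print(json.dumps(manifest, indent=2))
```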

Hi @Reab_AB and thank you for sharing the above information.

I don’t believe I have enough information to reproduce the issue that you have been seeing. Could you provide more information regarding:

  1. For every large collection, is the observed error the same as above (the index error and the auth error)?
  2. How are the k8s jobs set up?
  3. The configuration files being used.
  4. The size of the mongodump directory on failure and on success.
  5. Are you able to access the mongodump directory outside the standalone k8s pod?
  6. From the k8s pod where you are running the restore, are you able to access the mongodump directory where the dump was created?
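For item 4, one way to measure the dump directory from inside a pod is to sum the sizes of the files under it. A minimal standard-library sketch; the default `/dump` path is a hypothetical mount point.

```python
import os
import sys


def directory_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


if __name__ == "__main__":
    # "/dump" is a hypothetical mount point for the mongodump output.
    root = sys.argv[1] if len(sys.argv) > 1 else "/dump"
    print(root, human_readable(directory_size_bytes(root)))
```

Running it against the dump directory from both a failed and a successful run gives directly comparable numbers.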

Please share the above information so we can help you further.

Thanks
Aasawari
