Parallelizing mongodump to dump large amounts of data from multiple collections

Hi everyone,

Let me explain my use case first:

I built a Node.js app that spawns a bash script to fetch all the collections from my databases, iterate over them, and dump each one.

But after dumping some databases it stops, even though the same bash script was working a few days ago. I'm not getting any error message; my Node.js application (which spawns the script) just receives a null exit code.
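For context, a null exit code from Node's `child_process` typically means the child was terminated by a signal rather than exiting on its own; at the shell level the same event shows up as exit status 128 + signal number. A small demonstration (the `sleep`/`kill` pair here is just a stand-in for a child process being killed externally):

```shell
#!/bin/bash
# Illustration: a child terminated by a signal reports exit status
# 128 + signum to its parent; Node's child_process surfaces the same
# event as { code: null, signal: 'SIGKILL' }.
sleep 30 &
pid=$!
kill -KILL "$pid"             # forcibly terminate the child (signal 9)
wait "$pid"                   # reap it and collect its exit status
status=$?
echo "exit status: $status"   # 137 = 128 + 9 (SIGKILL)
```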

Since mongodump can't dump more than one collection with a specific query in a single command, and I have a large number of collections to dump, I intentionally run the dump commands in parallel in the bash script to speed up the process.

Here is the script:

#!/bin/bash

# set -e

dumpAllClientDB() {

    All_Client_DB=($(jq -r '.AllDemoDB[]' config.json))
    CONNECTION_STRING=$(jq -r '.CONNECTION_STRING' config.json)
    X_VERSION=$(jq -r '.X_VERSION' config.json)

    QUERY="{\"CompanyId\": \"${ChoosenCompanyId}\"}"

    GET_COLLECTION_COMMAND="var collections = db.getCollectionNames(); \
                       for (var i = 0; i < collections.length; i++) { \
                           print(collections[i]); \
                       }"


    bg_process_ids=()
    declare -i max_process=100 process_count=0;

    for db in "${All_Client_DB[@]}";
    do
        echo "Dumping ${db} database for CompanyId ${ChoosenCompanyId}..."

        DB_COLLECTIONS=$(mongosh "$CONNECTION_STRING/$db-Client$X_VERSION" --quiet --eval "$GET_COLLECTION_COMMAND" | tr -d "',[]")

        # Start multiple background processes
        for collection in $DB_COLLECTIONS;
        do

            if (( process_count >= max_process )); then
                # Drain the current batch. wait returns a child's exit
                # code even after the child has exited, so the kill -0
                # guard is unnecessary and actually skips the exit-code
                # check for children that finish early.
                for pid in "${bg_process_ids[@]}"; do
                    wait "$pid"
                    exitcode=$?

                    if [ "$exitcode" -ne 0 ]; then
                        echo "mongodump (pid $pid) failed with exit code $exitcode" >&2
                        exit 1
                    fi
                done
                bg_process_ids=()
                process_count=0
            fi

            # Quote expansions so collection names or IDs with special
            # characters don't get word-split
            mongodump --uri="$CONNECTION_STRING" --db="$db-Client$X_VERSION" --collection="$collection" --query="$QUERY" --out="$ChoosenCompanyId" &
            bg_process_ids+=($!)
            process_count+=1

        done
    done

    # Wait for the remaining background processes to complete and
    # check each one's exit code (again, no kill -0 guard needed)
    for pid in "${bg_process_ids[@]}"; do
        wait "$pid"
        exitcode=$?

        if [ "$exitcode" -ne 0 ]; then
            echo "mongodump (pid $pid) failed with exit code $exitcode" >&2
            exit 1
        fi
    done


    echo "mongodump completed successfully."

    curl -sH "Authorization: Bearer ${TOKEN}" "${DUMP_ACK_ENDPOINT}?demoCompanyId=${ChoosenCompanyId}&isSucceed=true"
}

ChoosenCompanyId=$1

echo "Demo Company ID: $ChoosenCompanyId"

mkdir -p "${ChoosenCompanyId}"

DUMP_ACK_ENDPOINT=$2

TOKEN=$3

dumpAllClientDB
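For what it's worth, the batching above can also be written with `wait -n` (requires bash >= 4.3): instead of draining a whole batch of 100 before starting new work, a new job starts as soon as any running one finishes. A minimal sketch of that pattern, with short `sleep` jobs standing in for the mongodump invocations:

```shell
#!/bin/bash
# Sketch: keep at most max_jobs dumps running, refilling the pool as
# soon as any one job finishes (requires bash >= 4.3 for `wait -n`).
max_jobs=3
tmpfile=$(mktemp)
for i in $(seq 1 10); do
    # Block while the pool is full; wait -n returns when any job ends
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        wait -n || exit 1          # propagate the first failure
    done
    # Stand-in for a mongodump invocation
    ( sleep 0.1; echo "job $i" >> "$tmpfile" ) &
done
wait                               # drain the remaining jobs
finished=$(wc -l < "$tmpfile")
echo "all $finished jobs completed"
```

This also keeps the worker pool consistently full, whereas batch-draining lets one slow dump hold up the next batch.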