Process a mongodump gzip archive in Python

I am using Python to export some collections from the database by running the mongodump command.
An example command looks something like this:

run(f'{to_debug()}mongodump --uri="{uri}" -c "projects" --query=\'{project_query}\' --gzip '
            f'--archive="{file_parts[0]}.projects{file_parts[1]}"')

As you can see, the result is archived and gzipped.

The export works fine, and importing the resulting archives afterwards works as well.

The problem is that I now need to do some processing on the data inside the archive, so I would like to access the archive contents from Python and extract some information from them.

So far I have tried Python's tarfile module, but it doesn't work: I cannot even open the archive. The errors I am receiving are "invalid header" or "not a gzip file" …
Does anyone here have experience with this use case, or has anyone run into it before? Any suggestions are welcome :slight_smile:
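
To narrow down the errors, a small sanity check on the file's leading bytes might help. The sketch below is only a diagnostic, not a parser. It checks whether the file starts with the gzip magic bytes (1f 8b) and, if it does, reads the first four bytes of the decompressed stream. The name sniff_archive is hypothetical. The expected archive magic number (0x8199e26d, little-endian) is an assumption about mongodump's own archive format and should be verified against the mongodump version in use. If that assumption holds, the file is not a tar archive, which would be consistent with tarfile rejecting it.

```python
import gzip

# Standard gzip magic bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"

# ASSUMPTION: a mongodump archive starts with the 32-bit magic number
# 0x8199e26d written little-endian. Verify this for your mongodump version.
ASSUMED_MONGODUMP_MAGIC = (0x8199E26D).to_bytes(4, "little")


def sniff_archive(path):
    """Report what the first bytes of the file look like (hypothetical helper)."""
    # Check the raw file for the gzip signature.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC

    # Read the first four bytes of the payload, decompressing if needed.
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as stream:
        head = stream.read(4)

    return {
        "is_gzip": is_gzip,
        "head": head,
        "looks_like_mongodump_archive": head == ASSUMED_MONGODUMP_MAGIC,
    }
```

Calling sniff_archive on the exported file shows whether the gzip layer opens at all and whether the payload looks like a mongodump archive rather than a tar file.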

Thank you
