mongodump escaping special characters in collection names

Hello,

If you mongodump a collection whose name includes the “:” character, for example db._Join:users:_Role, it writes _Join%3Ausers%3A_Role.metadata.json.gz.

Then, after mongorestore, the database is not the same as the one that was dumped, because the collection is restored with the % characters in its name.

Thanks

mongodump --version
mongodump version: 100.1.1
git version: 8bca136c2a0e0daa3947df31c0624c5615f9aa02
Go version: go1.12.17
os: linux
arch: amd64
compiler: gc

Hi @Christopher_Brookes and welcome to the MongoDB Community :muscle: !

I can’t reproduce your issue.
Here is what I did to try:

  • Start a fresh MongoDB in docker:
docker run --rm -d -p 27017:27017 -h $(hostname) --name mongo mongo:4.4.0 --replSet=test && sleep 4 && docker exec mongo mongo --eval "rs.initiate();"
  • I then created 2 collections in the test db with a few documents:
    • col
    • col:test
test:PRIMARY> show collections 
col
col:test

Note that I had to use getCollection() to insert into this unusual collection, because the usual dot syntax db.col:test.insert() doesn’t work with a “:” in the name.

db.getCollection("col:test").insert({name:"Max"})

Here is the result of mongodump:

polux@hafx:/tmp/mdb$ mongodump 
2020-09-16T20:30:03.862+0200	writing admin.system.version to dump/admin/system.version.bson
2020-09-16T20:30:03.863+0200	done dumping admin.system.version (1 document)
2020-09-16T20:30:03.863+0200	writing test.col to dump/test/col.bson
2020-09-16T20:30:03.864+0200	done dumping test.col (3 documents)
2020-09-16T20:30:03.864+0200	writing test.col:test to dump/test/col%3Atest.bson
2020-09-16T20:30:03.865+0200	done dumping test.col:test (1 document)
polux@hafx:/tmp/mdb$ tree
.
└── dump
    ├── admin
    │   ├── system.version.bson
    │   └── system.version.metadata.json
    └── test
        ├── col%3Atest.bson
        ├── col%3Atest.metadata.json
        ├── col.bson
        └── col.metadata.json

3 directories, 6 files

Indeed, notice the %3A in the file names: it is simply the percent-encoded (URL-encoded) representation of “:”.
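To make the mapping concrete, here is a small Python sketch of percent-encoding a collection name and decoding it back. It only reproduces the file names observed above using the standard library's urllib.parse; it is not mongodump's own escaping code, and the helper names are hypothetical. The tool may escape a different set of characters than quote(..., safe="") does.

```python
from urllib.parse import quote, unquote


def escape_collection_name(name: str) -> str:
    """Percent-encode reserved characters (e.g. ':' -> '%3A').

    Hypothetical helper; mirrors the dump file names seen above,
    not necessarily mongodump's exact rules.
    """
    return quote(name, safe="")


def unescape_collection_name(filename_stem: str) -> str:
    """Reverse the percent-encoding to recover the original collection name."""
    return unquote(filename_stem)


for name in ["col", "col:test", "_Join:users:_Role"]:
    encoded = escape_collection_name(name)
    decoded = unescape_collection_name(encoded)
    # e.g. 'col:test' -> 'col%3Atest' -> 'col:test'
    print(f"{name!r} -> {encoded!r} -> {decoded!r}")
```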

  • Then I dropped the test db in MongoDB.
  • And I reimported it with mongorestore
polux@hafx:/tmp/mdb$ mongorestore 
2020-09-16T20:35:36.908+0200	using default 'dump' directory
2020-09-16T20:35:36.909+0200	preparing collections to restore from
2020-09-16T20:35:36.909+0200	reading metadata for test.col from dump/test/col.metadata.json
2020-09-16T20:35:36.909+0200	reading metadata for test.col:test from dump/test/col%3Atest.metadata.json
2020-09-16T20:35:36.954+0200	restoring test.col:test from dump/test/col%3Atest.bson
2020-09-16T20:35:36.965+0200	restoring test.col from dump/test/col.bson
2020-09-16T20:35:36.967+0200	no indexes to restore
2020-09-16T20:35:36.968+0200	finished restoring test.col:test (1 document, 0 failures)
2020-09-16T20:35:36.970+0200	no indexes to restore
2020-09-16T20:35:36.970+0200	finished restoring test.col (3 documents, 0 failures)
2020-09-16T20:35:36.970+0200	4 document(s) restored successfully. 0 document(s) failed to restore.

Here is the result in my test DB:

test:PRIMARY> show collections
col
col:test

Conclusion

As a good practice, I would avoid these unusual characters in database and collection names; MongoDB’s documentation lists naming restrictions for both. It works for me, but apparently something is going wrong in your case.

My guess is that you have an encoding issue in your shell, or that you used some mongodump or mongorestore options that changed the behavior. I would just stay away from such names to avoid surprises of this kind.

$ mongodump --version
mongodump version: 100.1.1
git version: 8bca136c2a0e0daa3947df31c0624c5615f9aa02
Go version: go1.12.17
   os: linux
   arch: amd64
   compiler: gc

Hello Maxime,

Thanks a lot for your very complete response. It helped me figure out what is happening here.
When dumping the same collection with the previous mongo-tools release (r4.2.8), there is no special encoding of the “:” character in collection names.

mongodump --version
mongodump version: r4.2.8
git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
Go version: go1.12.17
   os: darwin
   arch: amd64
   compiler: gc   


$ mongodump --gzip
$ ls
_Join:users:_Role.bson.gz

The collection-name encoding only happens with the new mongo-tools version (100.1), as you also saw on your side.

In my case, I was dumping the collection with the new version (100.1), so the gzip file names contained “%3A”, and I was restoring on a different machine with the previous mongo-tools version. The previous version apparently does not decode the collection name and restores it exactly as it appears in the dump folder. For now, I downgraded mongo-tools on the dumping machine to get clean gzip file names.
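An alternative to downgrading might be to decode the file names in the dump folder before running the older mongorestore, so they look like what r4.2.8 produced. The Python sketch below is a hypothetical, untested workaround, not a mongo-tools feature. It assumes every “%” in the file names was produced by the escaping, that the decoded names contain no path separators, and that the filesystem allows “:” in file names (Windows, for example, does not).

```python
from __future__ import annotations

import os
from urllib.parse import unquote


def decode_dump_filenames(dump_dir: str) -> list[tuple[str, str]]:
    """Rename percent-encoded dump files back to raw collection names.

    Hypothetical helper: e.g. col%3Atest.bson.gz -> col:test.bson.gz.
    Assumes every '%' in a file name comes from the dump's escaping.
    Returns the (old_path, new_path) pairs that were renamed.
    """
    renamed = []
    for root, _dirs, files in os.walk(dump_dir):
        for fname in files:
            decoded = unquote(fname)
            if decoded == fname:
                continue  # nothing was escaped in this name
            old_path = os.path.join(root, fname)
            new_path = os.path.join(root, decoded)
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

For example, calling decode_dump_filenames("dump") on the dumping machine, then copying the folder and running the older mongorestore, should (under the assumptions above) restore the collection under its original name.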
I hope this thread helps someone in the future.

(I know special characters in collection names are not a good practice, but the framework used in this case doesn’t give me the choice :slight_smile:)

Happy that you found your problem :smiley: !

Cheers,
Max.