Deduplication - Embedded Vs Referenced Data model

Guru_Prashanth_Thana · September 14, 2021, 10:02am

Hi All

I have a document with a list of files, shown below in the embedded model. In the referenced model, also shown below, the content of each file is stored in a separate document and the main document refers to those files. There are around 3000-4000 files. I have to create 200 copies of the same document in both cases. The copies of the embedded document occupied 500MB, whereas the copies of the referenced document occupied 64MB. Is there a way to optimize the storage size of the embedded documents by deduplicating the repeated data?

Embedded document:

{
   "db":"test",
   "files":[
      {
         "absolutePath":"/x/release/data.txt"
      },
      {
         "absolutePath":"/x/release/temp.txt"
      }
   ]
}

Referenced Document:

{
   "db":"test",
   "files":[
      {
         "Id":BSON(Object("axdf56789d"))
      },
      {
         "Id":BSON(Object("axdf5678765"))
      }
   ]
}
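
To make the difference between the two layouts concrete, here is a minimal sketch in plain Python (no database involved) of how the referenced form could be derived from the embedded one. The names `split_embedded` and `file_store` are hypothetical, the in-memory dict only stands in for a separate files collection, and a SHA-256 content hash is assumed as the reference key in place of a real ObjectId.

```python
import hashlib
import json


def split_embedded(doc, file_store):
    """Move each embedded file sub-document into file_store and return a
    copy of doc whose "files" entries hold only a reference key."""
    refs = []
    for f in doc["files"]:
        # Canonical JSON so identical sub-documents hash to the same key.
        payload = json.dumps(f, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        # Stored once, however many main documents refer to it.
        file_store.setdefault(key, f)
        refs.append({"Id": key})
    return {**doc, "files": refs}


embedded = {
    "db": "test",
    "files": [
        {"absolutePath": "/x/release/data.txt"},
        {"absolutePath": "/x/release/temp.txt"},
    ],
}

file_store = {}  # stand-in for a separate "files" collection (assumption)
copies = [split_embedded(embedded, file_store) for _ in range(200)]
print(len(copies), "main documents share", len(file_store), "file documents")
```

In this sketch the 200 copies all point at the same two stored file documents, which is the effect the referenced model relies on: the repeated file content is written once and only the small reference is repeated.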