How to insert documents and have a reference from "data" documents to a "metadata" document

I’m new to MongoDB and have taken the M001 course, but I’m not sure how to do the following:
I have a large number of JSON files that are “metadata” and “data”. The data documents have a many-to-one relationship to the metadata, i.e. one metadata document is shared by many data documents. How do I insert the metadata document, then all of the data documents with a reference to their respective metadata document? I suspect the best reference to use in the data document is the metadata’s _id, but I’m not sure how to insert the metadata document, get its _id and then add that to the data documents as a field prior to the insert. I should also mention that I would put the metadata and data documents in different collections. I have also found a MongoDB tool called mongoimport that may be useful for this, but I suspect I will have to insert all of these files via a program in JavaScript or Python.
Any thoughts on this and the best approach would be appreciated.

Hi @Leon_Werenka ,

Usually we see a structure such as:

// Metadata collection
{
  _id : "abc",
  metadataField1 : ... ,
  ...
}

// Data collection
{
  _id : "111",
  metadataId : "abc",
  dataField1 : ... ,
  ...
}

{
  _id : "222",
  metadataId : "abc",
  dataField1 : ... ,
  ...
}

This forms a one-to-many relationship between the collections. You can still add extra fields to the metadata documents, such as “numberOfDataDocs”, and maintain them over time.

The import itself can be done in many ways. You can write a script or program that first inserts the metadata, gets the ids it needs, and embeds them in the data docs when they are inserted.
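
As a minimal sketch of that script approach, assuming the official `mongodb` Node.js driver: `insertOne` resolves to a result object whose `insertedId` is the `_id` the document received, so no extra find is needed. The function name, the collection names "metadata" and "data", and the `metadataId` field are illustrative choices, not anything fixed by MongoDB.

```javascript
// Sketch: insert one metadata document, then its data documents with a reference.
// Assumes `db` is a connected Db handle from the official `mongodb` Node.js driver.
// Collection names and the `metadataId` field name are illustrative.
async function insertMetadataWithData(db, metadataDoc, dataDocs) {
  // insertOne resolves to a result whose insertedId is the _id the document
  // received (generated by the driver if the document did not carry one).
  const { insertedId } = await db.collection("metadata").insertOne(metadataDoc);

  // Copy each data document and add the reference before inserting.
  const linked = dataDocs.map((doc) => ({ ...doc, metadataId: insertedId }));
  const result = await db.collection("data").insertMany(linked);

  return { metadataId: insertedId, insertedCount: result.insertedCount };
}
```

With a connected client this would be called as something like `await insertMetadataWithData(client.db("mydb"), metadata, dataDocs)`, where the database name is a placeholder. Looping over the JSON files and calling it once per metadata file keeps each group of data documents tied to the right parent.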

If you use an import tool that cannot perform this logic easily, then either have the ids preset in the loaded documents, or load into a staging collection and use a $merge aggregation to populate them.
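
The staging route could look roughly like the pipeline below. This is only a sketch under stated assumptions: the collection names "data_staging", "metadata" and "data" are hypothetical, and it assumes each staged data document carries a natural key (`metadataName` here) that matches a `name` field on its metadata document. `$merge` has to be the last stage and requires MongoDB 4.2 or later.

```javascript
// Sketch of the staging approach; all collection and field names are hypothetical.
function buildMergePipeline() {
  return [
    // Find the metadata document that each staged data document belongs to.
    {
      $lookup: {
        from: "metadata",
        localField: "metadataName",
        foreignField: "name",
        as: "meta",
      },
    },
    // Copy the matched metadata _id into the data document.
    { $set: { metadataId: { $arrayElemAt: ["$meta._id", 0] } } },
    // Drop the temporary lookup array.
    { $unset: "meta" },
    // Write the results into the final collection.
    { $merge: { into: "data", on: "_id", whenMatched: "replace", whenNotMatched: "insert" } },
  ];
}

// In mongosh this would be run as:
//   db.data_staging.aggregate(buildMergePipeline())
```

Staged documents with no matching metadata end up without a `metadataId` field, so a quick check for missing references after the merge is worthwhile.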

Thanks
Pavel

Thanks for the comments. This approach does solve the problem by overriding the default auto-generated _id (an ObjectId) with my own value. I guess that is okay, but one would then have to be careful not to reuse an _id. Or one could generate GUID ids in the scripting language.
When using the database-generated _id, to find the metadata _id one would probably need another field like “new”: true to find the newest metadata document submitted, get the _id, update it to “new”: false, then insert the data documents with the _id.
Is there an update or insert function that could return the _id as part of its return value by specifying something like {_id = 1}, or can this only be done with find() functions?
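
For the option of generating the ids in the script, a minimal Node.js sketch is below. It assumes `crypto.randomUUID()` is available (recent Node.js versions); the function name and the `metadataId` field are illustrative. Since _id always has a unique index, an accidental reuse would be rejected with a duplicate key error rather than silently overwriting a document.

```javascript
const { randomUUID } = require("crypto");

// Sketch: give the metadata document its own _id before anything is inserted,
// then stamp that id onto every data document.
function linkBeforeInsert(metadataDoc, dataDocs) {
  const metadataId = randomUUID(); // UUID string used as the _id value
  const metadata = { ...metadataDoc, _id: metadataId };
  const data = dataDocs.map((doc) => ({ ...doc, metadataId }));
  return { metadata, data };
}
```

The linked documents can then be inserted from the same script, or written back out as JSON and loaded with mongoimport, since the references are already in place.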

Hi @Leon_Werenka ,

You don’t have to use _id as your primary key. You can have metadataId and timestamp together act as the key.

Then you can index {metadataId : 1, timestamp: -1}

This lets you fetch the latest document for a given metadataId efficiently when sorting by timestamp : -1.

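We do have a findOneAndUpdate method that can find a document and also update it:

db.metadata.findOneAndUpdate({metadataId : "abc"}, {$set : {timestamp : new Date()}}, {sort : {timestamp : -1}})

With the sort on timestamp : -1 this always picks the latest version. Note that the sort is passed as an option, because findOneAndUpdate returns a document rather than a cursor, and by default the returned document is the one from before the update.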

Thanks
Pavel
