Freeze and release data from a collection and preserve the _id field

Benjamin_Borngen-Schmidt · August 26, 2021, 2:23pm

Hello everyone,

I have a question regarding data modeling and I hope this is the right place to ask.

The problem I’m having is that I have a collection whose data forms a graph. The graph can be resolved using fields like children: [ObjectId, ...] and parent: ObjectId, where the ObjectIds are the _id values of the documents themselves.
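To make the structure concrete, here is a minimal in-memory sketch of such a graph. Plain dicts stand in for MongoDB documents and short strings stand in for ObjectId values; the names docs, by_id and descendants are hypothetical and not from the original post.

```python
# Hypothetical documents: strings such as "a" stand in for ObjectId values.
docs = [
    {"_id": "a", "parent": None, "children": ["b", "c"]},
    {"_id": "b", "parent": "a", "children": ["d"]},
    {"_id": "c", "parent": "a", "children": []},
    {"_id": "d", "parent": "b", "children": []},
]

# Index documents by _id so that children/parent references can be followed.
by_id = {doc["_id"]: doc for doc in docs}


def descendants(root_id):
    """Return the _id values below root_id, in depth-first order."""
    result = []
    stack = list(reversed(by_id[root_id]["children"]))
    while stack:
        node_id = stack.pop()
        result.append(node_id)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(by_id[node_id]["children"]))
    return result


print(descendants("a"))  # ['b', 'd', 'c']
```

Inside MongoDB the same traversal is usually done server-side with the $graphLookup aggregation stage, connecting children to _id.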

Now I want to create “releases” of this data at a certain point in time, much like using git tag. So far I have come up with the following solutions:

1. Copy the data into a new collection, e.g. “release_2021-08-26”
2. Copy the data into a single collection for releases and add a field indicating the release date
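As a rough sketch of the two options, assuming a hypothetical in-memory "database" (a dict of collection name to list of documents) rather than a real MongoDB connection, the functions below copy the data either into its own collection per release or into one shared collection tagged with the date. The shared variant enforces a unique _id the way MongoDB's default _id index does, which shows the collision on the second release.

```python
import copy

# Hypothetical in-memory "database": collection name -> list of documents.
# Strings stand in for ObjectId values.
db = {
    "nodes": [
        {"_id": "a", "parent": None, "children": ["b"]},
        {"_id": "b", "parent": "a", "children": []},
    ]
}


def release_as_collection(db, source, release_date):
    """Solution 1: deep-copy the source into its own collection per release."""
    name = f"release_{release_date}"
    db[name] = copy.deepcopy(db[source])
    return name


def release_into_shared(db, source, release_date):
    """Solution 2: copy into one shared collection, tagged with the date.

    Like MongoDB's unique index on _id, reject any _id already present
    (MongoDB itself would report an E11000 duplicate key error).
    """
    target = db.setdefault("releases", [])
    clashes = {d["_id"] for d in target} & {d["_id"] for d in db[source]}
    if clashes:
        raise ValueError(f"duplicate _id values: {sorted(clashes)}")
    target.extend(
        {**copy.deepcopy(d), "release": release_date} for d in db[source]
    )


release_as_collection(db, "nodes", "2021-08-26")
release_into_shared(db, "nodes", "2021-08-26")
try:
    release_into_shared(db, "nodes", "2021-09-01")
except ValueError as err:
    print(err)  # duplicate _id values: ['a', 'b']
```

The deep copy matters for a frozen release: without it, later edits to the source documents could leak into the snapshot.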

The problem I have with the first solution is that I would create many collections, and the application would either need to list all collection names and deduce which releases exist, or need another collection that records which collection holds each release. What I like about this solution is that I would not need to touch the _id field, which is used for the relationships between the nodes. I could also reuse all the aggregations I already have and would just need to swap the collection they run on.
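The "another collection" variant could look roughly like this: a small catalog with one entry per release that names the collection holding the frozen copy. This is a sketch with hypothetical names (release_catalog, list_releases, collection_for), held in memory, and it assumes the application wants the most recent release at or before a given date.

```python
from datetime import datetime

# Hypothetical catalog collection: one entry per release, naming the
# collection that holds the frozen copy of the data.
release_catalog = [
    {"release": datetime(2021, 8, 26), "collection": "release_2021-08-26"},
    {"release": datetime(2021, 9, 1), "collection": "release_2021-09-01"},
]


def list_releases(catalog):
    """Release dates in chronological order, without scanning collection names."""
    return sorted(entry["release"] for entry in catalog)


def collection_for(catalog, as_of):
    """Name of the collection for the latest release on or before as_of."""
    candidates = [e for e in catalog if e["release"] <= as_of]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e["release"])["collection"]


print(collection_for(release_catalog, datetime(2021, 8, 30)))  # release_2021-08-26
```

Against MongoDB the same lookup would be a find with a $lte filter on release, sorted descending and limited to one document; the returned name then selects the collection that the existing aggregations run on.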

The second solution has the problem that, as soon as I create a second “release”, I would get duplicate entries on the _id field. But I cannot change the _id, since that would mean updating all the data as well, which is doable, but not something I’d like to do.
I thought about using a compound _id of {ObjectId, Date}, but then the connection fields children and parent would no longer be usable as they are.
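To illustrate the compound-key idea, here is an in-memory sketch that assumes the layout _id: {node: <original ObjectId>, release: <date>} (the field names node and release are hypothetical). children and parent keep their bare node ids, so following a link only works when the lookup also carries the release, which is why the existing references can no longer be used directly.

```python
import copy
from datetime import datetime

# Strings stand in for the original ObjectId values.
nodes = [
    {"_id": "a", "parent": None, "children": ["b"]},
    {"_id": "b", "parent": "a", "children": []},
]


def snapshot(nodes, release):
    """Copy every node under a compound _id {node, release}.

    children and parent are left as bare node ids, unchanged.
    """
    return [
        {**copy.deepcopy(doc), "_id": {"node": doc["_id"], "release": release}}
        for doc in nodes
    ]


releases = snapshot(nodes, datetime(2021, 8, 26)) + snapshot(nodes, datetime(2021, 9, 1))

# The (node, release) pair is unique even though node ids repeat across releases.
index = {(d["_id"]["node"], d["_id"]["release"]): d for d in releases}


def children_of(doc):
    """Resolve children links, staying within the release of doc."""
    release = doc["_id"]["release"]
    return [index[(child, release)] for child in doc["children"]]


root = index[("a", datetime(2021, 8, 26))]
kids = children_of(root)
```

In MongoDB terms, a children lookup would filter on both parts of the key, for example {"_id.node": {"$in": children}, "_id.release": release}, and a $graphLookup would connect on "_id.node" and use restrictSearchWithMatch to stay inside one release. The existing aggregations would still need those changes.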

Also, for the second solution I thought about using views to transform the data, especially the _id field, back to its original form, but there seems to be no way to pass parameters to a view to select just one release’s dataset.
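Since views take no parameters, one workaround is to build the "view" pipeline in the application for each query. The sketch below assumes the same hypothetical compound layout _id: {node, release} and returns a plain list of stages; it uses $set, an alias of $addFields available since MongoDB 4.2.

```python
from datetime import datetime


def release_pipeline(release):
    """Build a per-release pipeline that restores the original _id.

    Assumes the hypothetical compound layout _id: {node: <ObjectId>, release: <date>}.
    """
    return [
        # Select one release first, before any reshaping.
        {"$match": {"_id.release": release}},
        # Expressions in one $set stage read the input document, so both
        # fields are taken from the compound _id before it is replaced.
        {"$set": {"_id": "$_id.node", "release": "$_id.release"}},
    ]


pipeline = release_pipeline(datetime(2021, 8, 26))
```

With pymongo the list would be passed to collection.aggregate(pipeline), and existing aggregation stages could be appended after these two, since from that point on the documents have their original _id again.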

Is there any other way to release and freeze data? It would be nice to preserve the unique identifier of the object.

Thank you

Asya_Kamsky · August 26, 2021, 8:10pm

Why do you need new documents for a new release? Why not just add another date into an existing document (i.e. have an array of dates indicating releases or some other tags)?
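A minimal in-memory sketch of this suggestion, assuming string release tags in a hypothetical releases array on each existing document. In MongoDB a filter such as {"releases": "2021-08-26"} matches any document whose array contains that element; the helper below mimics that.

```python
# Hypothetical documents: each lists the releases it belongs to.
nodes = [
    {"_id": "a", "releases": ["2021-08-26", "2021-09-01"]},
    {"_id": "b", "releases": ["2021-09-01"]},
]


def in_release(docs, tag):
    """Mimic MongoDB's {"releases": tag}, which matches any array element."""
    return [d["_id"] for d in docs if tag in d.get("releases", [])]


print(in_release(nodes, "2021-08-26"))  # ['a']
```

Note that the documents themselves are not copied here, so a later edit to a document changes what every release tagged on it sees.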

Benjamin_Borngen-Schmidt · August 27, 2021, 8:09am

Because the requirement is that the application must have a historically correct dataset as of the time of the release. A date alone does not preserve the original document as it was at that time, which is why I compared it to git tag.

Also, since development time is not infinite, I would rather not change the data structure any more.