Archiving philosophy

I’m new to MongoDB. I’m working with .NET Core 2.2 and I’ve just added a library to my project to manage archives that, imho, are not the kind of data that should be managed with a SQL database.

The problem I would like to solve with MongoDB is (this is just an example) archiving the successive positions of many users, to create something like a history of their positions.

Users are identified by an ID, while the position is just 8 bytes to be archived together with date information (an int).

So the structure of the UserPositions collection I want to create is basically this:

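    public class Position {
        public byte[] PositionInfo = new byte[8];
    }
    // Maps a date (stored as an int) to the position recorded on that date
    public class DatePositions : Dictionary<int, Position> {}
    public class UserPositions {
        public int UserId;
        // Named Positions because a C# member cannot have the same name as its enclosing type
        public DatePositions Positions;
    }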

I wonder if the creation of collections of UserPositions is the best way to use Mongo.

Consider that this information is updated once a day for all the users and it’s retrieved only for a single user per query.

A brilliant idea I had is to create a separate collection for each user, named Position_User_Id, so that I can add a date/position to the collection without having to retrieve the entire UserPositions object, add a new position, and update it.
In this case I would have many position collections, and the Position class would be modified by adding a date to it:

    public class Position {
        public int Date;
        public byte[] PositionInfo = new byte[8];
    }

I apologize if this request is very basic, but after some reading (I’m not a very skilled programmer) I still wonder whether I have grasped the philosophy of using MongoDB.

Hi Leonardo,

Generally speaking you wouldn’t create a new document for each position when using Mongo, instead opting to update the existing document with the new position and a date value of when it occurred. Much like you would if you were just recording it on paper.

I’d suggest changing your model slightly, if possible, to work around the idea of one document per user and many positions in a document:

public class User
{
    // The unique user identity - Mongo can provide a unique Id, but if you're linking back to SQL stick to an int
    public int UserId;

    // The users position data as a nested object, see class below for details
    public IEnumerable<PositionHistory> UserPositions;
}

// Supporting/Partial Class for position data
public class PositionHistory
{
     // The date time for this position
     public DateTime PositionDate;

     // Doesn't have to be a string, but basically whatever data identifies the position
     public string PositionName;
}

I think you’re still editing your question so this may not be relevant shortly, but the last update appears to have you getting on to the same idea, a single collection with the data inside it :wink:

MongoDB (like most NoSQL databases) tends to repeat data rather than create relations between records, on the view that storage is cheap compared to the memory and CPU cost of assembling the data at query time the way SQL does.

Hope this helps, let me know if you want me to clarify anything or if you have further questions

What @Will_Blackburn wrote is in line with what most will do in this situation. I found the following link very useful.

I do not think it is a good idea to multiply the number of collections.


In fact, “brilliant” in this case was ironic. I’m trying to create an example based on what @Will_Blackburn wrote, to be posted here; maybe it could help beginners like me :slight_smile:

Starting from what @Will_Blackburn wrote and (I hope) what @steevej tried to suggest through the link in his latest post, I wrote some example code and put it in the GitHub repository LeonardoDaga/MongoDbSample (my first example using MongoDB to create an archive containing a collection of documents, each document containing a list of information).

In this example, I’ve created a collection of User objects, each one identified by an ID and holding a List of positions, following Will’s indications. The idea is that each item of the collection (a User) has a list of positions that can be updated with new positions using the collection methods.

In this specific case, I wonder whether using a List (it should be equivalent to using an Enumerable, I suppose), adding an item to the list, and replacing the previous document with the code that follows is the correct approach. I expected to use the “UpdateOne” method, but it’s not clear to me how I should use it.

The main code of the sample is the following:

    // get (and create if doesn't exist) a database from the mongoclient
    var db = mongo.GetDatabase("UserPositionsDb");

    // get a collection of User (and create if it doesn't exist)
    var collection = db.GetCollection<User>("UserCollection");

    var user = collection.AsQueryable()
        .SingleOrDefault(p => p.UserId == userID);

    bool newUser = false;
    if (user == null)
    {
        user = new User
        {
            UserId = userID,
            UserPositions = new List<PositionItem>()
        };

        newUser = true;
    }

    user.UserPositions.Add(positionItem);

    // Add the entered item to the collection
    if (newUser)
        collection.InsertOne(user);
    else
        collection.ReplaceOne(u => u.UserId == userID, user);

I know nothing about .net driver.

As a general comment, you should try to avoid 2 round trips to the database, both to reduce latency and to avoid concurrency issues.

MongoDB has the concept of upsert, where the document is updated if it exists and inserted otherwise. This is exactly what you are trying to do. You can find more information by searching for “mongodb upsert”.

For updating the array of positions I think that https://docs.mongodb.com/manual/reference/operator/update/push/ will do the trick. In the shell and JavaScript I would do something like:

use test ;
user = "steevej@gmail.com" ;
query = { _id : user } ;

location = { when : "a date time" , where : "home" } ;
update = 
{
	"$set" : { "_id" : user } ,
	"$push" :	{ "history" : location	}
} ;
upsert = { "upsert" : true } ;
db.position.updateOne( query , update , upsert ) ;

location = { when : "later" , where : "office" } ;
update = 
{
	"$set" : { "_id" : user } ,
	"$push" :	{ "history" : location	}
} ;
upsert = { "upsert" : true } ;
db.position.updateOne( query , update , upsert ) ;

Try the above and then run db.position.find().pretty() to see the result.
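
The `$set` on `_id` in the snippet is probably not needed: on an upsert, MongoDB seeds the new document with the equality fields from the query, so `$push` alone should be enough. A minimal sketch of a helper that builds the three arguments for `updateOne` (the name `buildPositionUpsert` is made up for illustration, and the `when`/`where` fields simply follow the example above):

```javascript
// Hypothetical helper: builds the arguments for updateOne() so that one
// round trip either appends to an existing history or creates the document.
function buildPositionUpsert(userId, when, where) {
  return {
    filter: { _id: userId },
    update: { $push: { history: { when: when, where: where } } },
    options: { upsert: true },
  };
}

const args = buildPositionUpsert("steevej@gmail.com", "a date time", "home");
// In the mongo shell this would be used as:
//   db.position.updateOne(args.filter, args.update, args.options)
```

Keeping the update to a single `$push` also means the same update document works whether or not the user already exists.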

In the C# API, the ReplaceOne() method can do the upsert, but you’ll need to pass in an extra parameter to make it work:

collection.ReplaceOne(u => u.UserId == userId, user, new UpdateOptions {IsUpsert = true});

That single line can then replace everything in the bottom if (newUser) section and allows you to remove the query for looking up if the user exists…

// IEnumerable is the base of nearly all collections, so a List<T> is perfect here
var positions = new List<PositionItem>();

// ... Generate the list of positions ...
positions.Add(positionItem);

user = new User
{
    UserId = userId,
    UserPositions = positions,
};

// Now upsert to the document
collection.ReplaceOne(u => u.UserId == userId, user, new UpdateOptions {IsUpsert = true});

It’s been a while since I updated a nested item so I’ll need to check if this handles it, but from memory it should give you what you need while adhering to the good advice from @steevej regarding single trips to the database

Thank you Steve for the suggestion, very helpful. It opened up the world of the operators to me :slight_smile:
I’ve translated the code to the C# equivalent and it works pretty well. In the efficiency test I added to the repository cited above, I found that the upsert is 3 to 5 times faster (depending on the size of the array) than the two-round-trip approach.
Just as a help for beginners, I report the core of the instructions used here:

    var positionItem = new PositionItem()
    {
        PositionDate = positionDay,
        Position = dataPosStr
    };

    var updatePositionFilter = Builders<User>.Update.Push(u => u.UserPositions, positionItem);

    collection.UpdateOne(u => u.UserId == userID, 
        updatePositionFilter,
        new UpdateOptions { IsUpsert = true });

Thank you Will for your reply.
I suppose @steevej’s answer is more viable, because that way I don’t need to retrieve the user’s whole list of positions before updating it.
This is my typical operating condition: when I add a new position I don’t know, and I don’t care, whether other positions are already available for the user.
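
To make the difference concrete, here is a toy in-memory model of the two upsert styles discussed in this thread. It is plain JavaScript, not the MongoDB driver, and the function names are made up for illustration. Replacing with a freshly built document drops the earlier positions, while a push-style upsert appends to whatever is already there.

```javascript
// Toy model only: a Map stands in for the collection, keyed by user id.
const store = new Map();

// Models a replace-style upsert: the stored document is overwritten as a whole.
function replaceUpsert(store, userId, doc) {
  store.set(userId, doc);
}

// Models an upsert with $push: create the document if missing, then append.
function pushUpsert(store, userId, position) {
  if (!store.has(userId)) {
    store.set(userId, { userId: userId, positions: [] });
  }
  store.get(userId).positions.push(position);
}

replaceUpsert(store, 1, { userId: 1, positions: [{ date: 20200101 }] });
replaceUpsert(store, 1, { userId: 1, positions: [{ date: 20200102 }] });
const afterReplace = store.get(1).positions.length; // only the latest list survives

pushUpsert(store, 2, { date: 20200101 });
pushUpsert(store, 2, { date: 20200102 });
const afterPush = store.get(2).positions.length; // both positions are kept
```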


Just to leave a note for the next beginners like me: I added another example to the same GitHub repository (LeonardoDaga/MongoDbSample, project MongoDbConsoleBulkWrite) to clarify how to use the bulk write operation (BulkWriteAsync) with the upsert flag set to true.
It is 10 to 20 times faster than the previous attempt, and really the best way to insert many items at the same time.
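
For reference, a rough sketch of the same idea in the shell’s JavaScript: one `updateOne` operation with `upsert: true` per user, all sent in a single `bulkWrite` call. The helper name `buildBulkOps` and the sample field names are assumptions for illustration, not code from the repository.

```javascript
// Hypothetical helper: turns a day's worth of positions into bulkWrite operations.
function buildBulkOps(dailyPositions) {
  return dailyPositions.map(function (p) {
    return {
      updateOne: {
        filter: { _id: p.userId },
        update: { $push: { history: { when: p.when, where: p.where } } },
        upsert: true,
      },
    };
  });
}

const ops = buildBulkOps([
  { userId: 1, when: "2020-01-01", where: "home" },
  { userId: 2, when: "2020-01-01", where: "office" },
]);
// In the mongo shell: db.position.bulkWrite(ops, { ordered: false })
```

Since each operation touches a different user, unordered execution should be safe here and lets the server process the batch without stopping at the first failure.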