Best practice for modeling collections for the blog application

Hi MongoDB Community,

I am posting this topic to hear your opinions and suggestions for improving the models I already have. I have a lot of experience working with MongoDB, and I tried to apply all the best practices I know of, but since the technology keeps evolving (MongoDB especially), maybe I can improve my models even more.

I am building a blog application (where I will also write about MongoDB, of course), and I have the following models so far:


Users:
Here I am using the extended reference pattern for followed_tags so I don’t have to fetch all the tags on the User Profile page. I am also creating indexes on email_lowercase and google_id so I can quickly authenticate users, and an index on author_url so I can quickly find information for the Authors page.

const { Schema } = require('mongoose');

const userSchema = new Schema(
  {
    name: { type: String, required: [true, 'Name is required'] },
    email: { type: String, required: [true, 'Email is required'] },
    email_lowercase: { type: String, required: true, index: true },
    linkedin: { type: String },
    stackoverflow: { type: String },
    github: { type: String },
    twitter: { type: String },
    website: { type: String },
    company: { type: String },
    company_website: { type: String },
    author_url: {
      type: String,
      required: [true, 'Display URL is required.'],
      unique: [true, 'Display URL has to be unique.'],
      index: true,
    },
    image_url: { type: String },
    author_description: { type: String },
    google_id: { type: String, index: true },
    role: { type: String, default: 'USER' },
    // Pass the function itself so the default is evaluated for each new document.
    last_visited: { type: Date, default: Date.now },
    reset_password_token: { type: String, default: '' },
    salt: { type: String, select: false },
    hash: { type: String, select: false },
    number_of_articles: { type: Number },
    newsletter: { type: Boolean, default: false },
    followed_tags: {
      default: [],
      type: [
        {
          tag: { type: Schema.Types.ObjectId, ref: 'Tags' },
          name: { type: String },
          description: { type: String },
          display_url: { type: String },
        },
      ],
    },
  },
  {
    timestamps: {
      createdAt: 'created_at',
      updatedAt: 'updated_at',
    },
  }
);

Articles:
Here I am using the extended reference pattern for the author and tags so I don’t have to fetch all of the author’s and tags’ data on the Article page. I am also creating an index on display_url so I can quickly find information for the Article page. Finally, I am keeping number_of_likes and number_of_comments in the Article document because of the page where I display all of the articles as small cards, so that info is right there in the document.

const articleSchema = new Schema(
  {
    author: {
      required: [true, 'Author is required.'],
      type: {
        _id: { type: Schema.Types.ObjectId, ref: 'Users', required: [true, 'Author _id is required'] },
        name: { type: String, required: [true, 'Author name is required.'] },
        image_url: { type: String },
        author_url: { type: String, required: [true, 'Author URL is required.'] },
      },
    },
    title: { type: String, required: [true, 'Title is required.'] },
    description: { type: String, required: [true, 'Description is required.'] },
    content: { type: String, required: [true, 'Content is required.'] },
    display_url: {
      type: String,
      required: [true, 'Display URL is required.'],
      unique: [true, 'Display URL has to be unique.'],
      index: true,
    },
    image_url: { type: String },
    tags: {
      default: [],
      type: [
        {
          _id: { type: Schema.Types.ObjectId, ref: 'Tags' },
          name: { type: String },
          description: { type: String },
          display_url: { type: String },
        },
      ],
    },
    number_of_likes: { type: Number, default: 0 },
    number_of_comments: { type: Number, default: 0 },
  },
  {
    timestamps: {
      createdAt: 'created_at',
      updatedAt: 'updated_at',
    },
  }
);
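
To keep number_of_likes and number_of_comments accurate, the counters have to be adjusted whenever a like or comment is created or removed. Below is a minimal sketch of how those updates might be built, assuming an atomic `$inc`. The helper name `counterUpdate` and the `'ARTICLE_ID'` placeholder are hypothetical, and the resulting objects would be passed to something like `Article.updateOne(filter, update)`:

```javascript
// Hypothetical helper: builds the filter/update pair that adjusts one of the
// denormalized counters on an article. Nothing here talks to the database.
const COUNTER_FIELDS = ['number_of_likes', 'number_of_comments'];

function counterUpdate(articleId, field, delta) {
  if (!COUNTER_FIELDS.includes(field)) {
    throw new Error(`Unknown counter field: ${field}`);
  }
  return {
    filter: { _id: articleId },
    // $inc is applied atomically by the server, so concurrent likes or
    // comments do not overwrite each other's increments.
    update: { $inc: { [field]: delta } },
  };
}

// After inserting a comment: +1; after deleting one: -1.
// 'ARTICLE_ID' is a placeholder for a real ObjectId.
const onCommentAdded = counterUpdate('ARTICLE_ID', 'number_of_comments', 1);
const onCommentRemoved = counterUpdate('ARTICLE_ID', 'number_of_comments', -1);

// With Mongoose these would be sent as, for example:
//   await Article.updateOne(onCommentAdded.filter, onCommentAdded.update);
console.log(JSON.stringify(onCommentAdded.update));
```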

Comments:
Here I am using the extended reference pattern for the user so I can display all comments without fetching user data from the Users collection. I also created an index on article so I can quickly fetch all comments related to an article.

const commentSchema = new Schema(
  {
    article: { type: Schema.Types.ObjectId, ref: 'Articles', required: [true, 'Article is required'], index: true },
    user: {
      required: [true, 'User is required.'],
      type: {
        _id: { type: Schema.Types.ObjectId, ref: 'Users', required: [true, 'User _id is required'] },
        name: { type: String, required: [true, 'User name is required'] },
        image_url: { type: String },
      },
    },
    content: { type: String, required: [true, 'Content is required'] },
  },
  {
    timestamps: {
      createdAt: 'created_at',
      updatedAt: 'updated_at',
    },
  }
);
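
The embedded user copy in each comment has to be produced when the comment is written, and refreshed if the user later changes their name or image. Below is a rough sketch of both steps. The helper names (`toUserRef`, `userRefRefresh`) and the sample data are hypothetical, and the update object would be passed to something like `Comment.updateMany(filter, update)`:

```javascript
// Hypothetical helper: picks only the fields that comments embed,
// matching the `user` sub-document in commentSchema.
function toUserRef(user) {
  return { _id: user._id, name: user.name, image_url: user.image_url };
}

// Hypothetical helper: builds the filter/update pair that refreshes every
// embedded copy after the source user document changes.
function userRefRefresh(user) {
  return {
    filter: { 'user._id': user._id },
    update: { $set: { 'user.name': user.name, 'user.image_url': user.image_url } },
  };
}

// Sample data for illustration only.
const fullUser = {
  _id: 'USER_ID',
  name: 'Jane Doe',
  email: 'jane@example.com',
  image_url: 'https://example.com/jane.png',
  role: 'USER',
};

const newComment = { article: 'ARTICLE_ID', user: toUserRef(fullUser), content: 'Nice post!' };
const refresh = userRefRefresh({ ...fullUser, name: 'Jane D.' });

// With Mongoose: await Comment.updateMany(refresh.filter, refresh.update);
console.log(newComment.user, refresh.update);
```

The refresh touches every comment by that user, which is the usual cost of this pattern; it seems acceptable here on the assumption that profile changes are much rarer than comment reads.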

Likes:
Here I am using the extended reference pattern for the user so I can display all likes without fetching user data from the Users collection. I also created an index on article so I can quickly fetch all likes related to an article.

const likeSchema = new Schema(
  {
    article: { type: Schema.Types.ObjectId, ref: 'Articles', required: [true, 'Article is required'], index: true },
    user: {
      required: [true, 'User is required.'],
      type: {
        _id: { type: Schema.Types.ObjectId, ref: 'Users', required: [true, 'User _id is required'] },
        name: { type: String, required: [true, 'User name is required'] },
        image_url: { type: String },
      },
    },
  },
  {
    timestamps: {
      createdAt: 'created_at',
      updatedAt: 'updated_at',
    },
  }
);

Tags:
I created an index on display_url so I can quickly find information for the Tag page.

const tagSchema = new Schema(
  {
    name: {
      type: String,
      required: [true, 'Name is required'],
      unique: [true, 'Name has to be unique'],
    },
    description: { type: String, required: [true, 'Description is required'] },
    display_url: { type: String, required: [true, 'Display URL is required'], index: true },
    image_url: { type: String },
    documentation_url: { type: String },
    number_of_articles: { type: Number, default: 0 },
  },
  {
    timestamps: true,
  }
);

All suggestions are welcome!

Kind regards,
Nenad

Hello @NeNaD, there are a lot of details to browse through, and I can comment on a couple of things.

In the userSchema there are two fields, email and email_lowercase. You can use a collation to search in a case-insensitive way and avoid keeping two fields. The second point is that you can group related data: for example, github, stackoverflow, twitter, linkedin, etc. can go in a single grouping field, a sub-document. This helps readability and maintenance, and groups fields by functionality.

You already know about relationships between entities, embedding and referencing, and that you model the data based on the application requirements. The most important queries and functionality should be a key criterion in the data design.


Hi @Prasad_Saya,

Thanks for the feedback! :smile:

When it comes to the email and email_lowercase fields, I didn’t consider collation because I thought it’s used only for multi-language features, isn’t it? I thought of doing it with a case-insensitive regex search, but then I would need to worry about special characters, and I also thought that a regex is more time-consuming than plain string matching from the index.

When it comes to grouping the github, stackoverflow, twitter, and linkedin fields, I think it’s a great idea and I will definitely group them. Thanks!
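
A minimal sketch of what that grouping could look like, assuming a sub-document named `social` (the name and the `groupSocialLinks` migration helper are hypothetical, not something from the thread):

```javascript
// Sketch only: `social` is an assumed name for the grouping sub-document.
// In a Mongoose schema definition, a nested plain object declares nested paths.
const socialFields = {
  linkedin: { type: String },
  stackoverflow: { type: String },
  github: { type: String },
  twitter: { type: String },
};

// This would replace the four top-level fields in the userSchema definition:
//   new Schema({ name: ..., email: ..., social: socialFields, ... })

// Hypothetical one-off migration helper: moves the flat fields of an existing
// user document into the `social` sub-document.
function groupSocialLinks(flatUser) {
  const grouped = { ...flatUser, social: {} };
  for (const key of Object.keys(socialFields)) {
    if (flatUser[key] !== undefined) {
      grouped.social[key] = flatUser[key];
    }
    delete grouped[key];
  }
  return grouped;
}

// Sample data for illustration only.
console.log(groupSocialLinks({ name: 'Jane Doe', github: 'https://github.com/example' }));
```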

Did you have time to check other collections? Do you have any other feedback? :smile:

The collation’s strength parameter has a default value of 3; when it is set to 2, searches are case-insensitive. You can also take advantage of the fact that a collation can be specified on the index defined on the search field, in which case a query has to specify the same collation for that index to be used; see Collation and Index Use.
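
As a rough sketch of how this could look for the email field: the `'en'` locale, the variable names, and the sample addresses are assumptions, and the Mongoose calls are shown only in comments. The `Intl.Collator` part is just a local illustration of what a strength 2 comparison treats as equal, not how the server implements it:

```javascript
// Collation settings: strength 2 compares base letters and diacritics
// but ignores case. The 'en' locale is an assumption for this sketch.
const emailCollation = { locale: 'en', strength: 2 };

// Index definition that could stand in for the email_lowercase field.
// With Mongoose this would be declared as:
//   userSchema.index(emailIndex.fields, emailIndex.options);
const emailIndex = {
  fields: { email: 1 },
  options: { collation: emailCollation },
};

// A query must ask for the same collation in order to use that index:
//   await User.findOne({ email }).collation(emailCollation);

// Local illustration of the comparison semantics using the built-in
// Intl.Collator; sensitivity 'accent' is the closest analogue of strength 2.
const collator = new Intl.Collator('en', { sensitivity: 'accent' });

function sameEmail(a, b) {
  return collator.compare(a, b) === 0;
}

console.log(sameEmail('Jane@Example.com', 'jane@example.com')); // differs only by case
console.log(sameEmail('jane@example.com', 'jane@example.org')); // different address
```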