How do I aggregate by substring (email address domains)

Hey guys,

I am having some trouble building a script that gets the count of domain addresses of my user base, how would I do that?

{id:1, email:‘test@gmail.com’}
{id:2, email:‘test2@gmail.com’}
{id:3, email:‘test3@example.com’}

I kinda need to aggregate down enough so that it shows
{gmail.com:2, example.com:1}

Been googling a lot, find one for mapreduce but it’s already been deprecated.

You can do it like this:

  • $split - to split email with “@” as delimiter. This will return an array of two items.
  • $arrayElemAt and $set - to fetch second item from above array and put its value to domain property.
  • $group with $sum - to group and count based on new property domain.
db.collection.aggregate([
  {
    "$set": {
      "domain": {
        "$arrayElemAt": [
          {
            "$split": [
              "$email",
              "@"
            ]
          },
          1
        ]
      }
    }
  },
  {
    "$group": {
      "_id": "$domain",
      "count": {
        "$sum": 1
      }
    }
  }
])

Working example

1 Like