Data Model
Consider a hypothetical sports club with a database that contains a
users collection that tracks the user's join dates, sport preferences,
and stores these data in documents that resemble the following:
{ _id : "jane", joined : ISODate("2011-03-02"), likes : ["golf", "racquetball"] } { _id : "joe", joined : ISODate("2012-07-02"), likes : ["tennis", "golf", "swimming"] }
Normalize and Sort Documents
The following operation returns user names in upper case and in
alphabetical order. The aggregation includes user names for all documents in
the users collection. You might do this to normalize user names for
processing.
db.users.aggregate( [ { $project : { name:{$toUpper:"$_id"} , _id:0 } }, { $sort : { name : 1 } } ] )
All documents from the users collection pass through the
pipeline, which consists of the following operations:
The results of the aggregation would resemble the following:
{ "name" : "JANE" }, { "name" : "JILL" }, { "name" : "JOE" }
The following aggregation operation returns user names sorted by the month they joined. This kind of aggregation could help generate membership renewal notices.
db.users.aggregate( [ { $project : { month_joined : { $month : "$joined" }, name : "$_id", _id : 0 } }, { $sort : { month_joined : 1 } } ] )
The pipeline passes all documents in the users collection through
the following operations:
The
$projectoperator:Creates two new fields:
month_joinedandname.Suppresses the
idfrom the results. Theaggregate()method includes the_id, unless explicitly suppressed.
The
$monthoperator converts the values of thejoinedfield to integer representations of the month. Then the$projectoperator assigns those values to themonth_joinedfield.The
$sortoperator sorts the results by themonth_joinedfield.
The operation returns results that resemble the following:
{ "month_joined" : 1, "name" : "ruth" }, { "month_joined" : 1, "name" : "harold" }, { "month_joined" : 1, "name" : "kate" } { "month_joined" : 2, "name" : "jill" }
The following operation shows how many people joined each month of the year. You might use this aggregated data for recruiting and marketing strategies.
db.users.aggregate( [ { $project : { month_joined : { $month : "$joined" } } } , { $group : { _id : {month_joined:"$month_joined"} , number : { $sum : 1 } } }, { $sort : { "_id.month_joined" : 1 } } ] )
The pipeline passes all documents in the users collection through
the following operations:
The
$projectoperator creates a new field calledmonth_joined.The
$monthoperator converts the values of thejoinedfield to integer representations of the month. Then the$projectoperator assigns the values to themonth_joinedfield.The
$groupoperator collects all documents with a givenmonth_joinedvalue and counts how many documents there are for that value. Specifically, for each unique value,$groupcreates a new "per-month" document with two fields:_id, which contains a nested document with themonth_joinedfield and its value.number, which is a generated field. The$sumoperator increments this field by 1 for every document containing the givenmonth_joinedvalue.
The
$sortoperator sorts the documents created by$groupaccording to the contents of themonth_joinedfield.
The result of this aggregation operation would resemble the following:
{ "_id" : { "month_joined" : 1 }, "number" : 3 }, { "_id" : { "month_joined" : 2 }, "number" : 9 }, { "_id" : { "month_joined" : 3 }, "number" : 5 }
The following aggregation collects top five most "liked" activities in the data set. This type of analysis could help inform planning and future development.
db.users.aggregate( [ { $unwind : "$likes" }, { $group : { _id : "$likes" , number : { $sum : 1 } } }, { $sort : { number : -1 } }, { $limit : 5 } ] )
The pipeline begins with all documents in the users collection,
and passes these documents through the following operations:
The
$unwindoperator separates each value in thelikesarray, and creates a new version of the source document for every element in the array.Example
Given the following document from the
userscollection:{ _id : "jane", joined : ISODate("2011-03-02"), likes : ["golf", "racquetball"] } The
$unwindoperator would create the following documents:{ _id : "jane", joined : ISODate("2011-03-02"), likes : "golf" } { _id : "jane", joined : ISODate("2011-03-02"), likes : "racquetball" } The
$groupoperator collects all documents with the same value for thelikesfield and counts each grouping. With this information,$groupcreates a new document with two fields:_id, which contains thelikesvalue.number, which is a generated field. The$sumoperator increments this field by 1 for every document containing the givenlikesvalue.
The
$sortoperator sorts these documents by thenumberfield in reverse order.The
$limitoperator only includes the first 5 result documents.
The results of aggregation would resemble the following:
{ "_id" : "golf", "number" : 33 }, { "_id" : "racquetball", "number" : 31 }, { "_id" : "swimming", "number" : 24 }, { "_id" : "handball", "number" : 19 }, { "_id" : "tennis", "number" : 18 }