Modify MongoDB/python Pipeline to Aggregate Documents by Field?

redapplesonly · June 15, 2022, 2:21pm

Hi everyone,

I’m slowly learning MongoDB with python with help from this site and other tutorials I’ve found online. I need help aggregating and counting my documents.

To explain: In my instance of MongoDB, I already have 1000s of documents, with each document tracking a car available from my (fictional) rental company. All the car documents have this format:

{
	"model": "Honda Civic",
	"license_plate": "ABC-1234",
	"attributes":
		{
            "rented": "YES",
			...lots more data here...
		}
}

I’ve learned enough MongoDB/python to build simple pipelines that search the data. Here’s a pipeline that searches all documents, plucks out a car’s model, license plate, and “rented” status:

    mydatabase = client.CARS_DB
    mycollection = mydatabase.RENTAL_LOT_A

    listOfRules = mycollection.distinct("model")

    for rule in listOfRules:

    	match_variable = {
          "$match": { 'model': rule }
    	}
    	project_variable = {
          "$project": {
        	'_id': 0,
        	'model': 1,
		'license_plate': 1,
        	'attributes.rented': 1
          }
    	}
    	pipeline = [
          match_variable,
          project_variable
    	]
    	results = mycollection.aggregate(pipeline)
    	for r in results:
          print(r)
          print("- - - - - - - - - - - - - - - - -")

The output is:

{'model': 'Honda Civic', 'license_plate': 'ABC-1234', 'attributes': {'rented': 'YES'}}
- - - - - - - - - - - - - - - - -
{'model': 'Toyota Camry', 'license_plate': 'ABC-5678', 'attributes': {'rented': 'YES'}}
- - - - - - - - - - - - - - - - -
{'model': 'Honda Civic', 'license_plate': 'DEF-1001',  'attributes': {'rented': 'no'}}
- - - - - - - - - - - - - - - - -

So far, so good.

But here’s what’s vexing me: The above is great if I want all the cars listed individually. But say I want to see the bigger, aggregated picture. I don’t care about the license plate because what I want to see is the equivalent of this:

MODEL              TOTAL
========================
Honda Civic         134
Toyota Camry        432
Ford Mustang         93
Honda Accord        738
Chevorlet Corvette    3

…where the value in the “TOTAL” column is the number of documents where “model” equaled “Honda Civic,” and so on. Better yet would be this:

MODEL                       TOTAL
=================================
Honda Civic, rented            76
Honda Civic, available         58
Toyota Camry, rented          245
Toyota Camry, available       187
Ford Mustang, rented           60
Ford Mustang, available        33
Honda Accord, rented          137
Honda Accord, available       601
Chevorlet Corvette, rented      3
Chevorlet Corvette, available   0

Now I’m aggregating on “model” and “attributes.rented”.

I don’t really care about the SQL-like table format, I just want to be able to pull this data out of MongoDB. There’s got to be a way to modify my pipeline, or create something new from scratch. I’ve tried python dictionaries, db.collection.countDocuments(), and a number of other posts from this website; no luck. Can anyone suggest an approach? Thank you.

steevej · June 16, 2022, 4:58pm

You need to use the $sum accumulator inside a $group stage.

Your _id will be an object with $model and $attributes.rented.

system · June 21, 2022, 11:07pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.