Hi everyone,
I’m slowly learning MongoDB with python with help from this site and other tutorials I’ve found online. I need help aggregating and counting my documents.
To explain: In my instance of MongoDB, I already have 1000s of documents, with each document tracking a car available from my (fictional) rental company. All the car documents have this format:
{
"model": "Honda Civic",
"license_plate": "ABC-1234",
"attributes":
{
"rented": "YES",
...lots more data here...
}
}
I’ve learned enough MongoDB/python to build simple pipelines that search the data. Here’s a pipeline that searches all documents, plucks out a car’s model, license plate, and “rented” status:
mydatabase = client.CARS_DB
mycollection = mydatabase.RENTAL_LOT_A
listOfRules = mycollection.distinct("model")
for rule in listOfRules:
match_variable = {
"$match": { 'model': rule }
}
project_variable = {
"$project": {
'_id': 0,
'model': 1,
'license_plate': 1,
'attributes.rented': 1
}
}
pipeline = [
match_variable,
project_variable
]
results = mycollection.aggregate(pipeline)
for r in results:
print(r)
print("- - - - - - - - - - - - - - - - -")
The output is:
{'model': 'Honda Civic', 'license_plate': 'ABC-1234', 'attributes': {'rented': 'YES'}}
- - - - - - - - - - - - - - - - -
{'model': 'Toyota Camry', 'license_plate': 'ABC-5678', 'attributes': {'rented': 'YES'}}
- - - - - - - - - - - - - - - - -
{'model': 'Honda Civic', 'license_plate': 'DEF-1001', 'attributes': {'rented': 'no'}}
- - - - - - - - - - - - - - - - -
So far, so good.
But here’s what’s vexing me: The above is great if I want all the cars listed individually. But say I want to see the bigger, aggregated picture. I don’t care about the license plate because what I want to see is the equivalent of this:
MODEL TOTAL
========================
Honda Civic 134
Toyota Camry 432
Ford Mustang 93
Honda Accord 738
Chevorlet Corvette 3
…where the value in the “TOTAL” column is the number of documents where “model” equaled “Honda Civic,” and so on. Better yet would be this:
MODEL TOTAL
=================================
Honda Civic, rented 76
Honda Civic, available 58
Toyota Camry, rented 245
Toyota Camry, available 187
Ford Mustang, rented 60
Ford Mustang, available 33
Honda Accord, rented 137
Honda Accord, available 601
Chevorlet Corvette, rented 3
Chevorlet Corvette, available 0
Now I’m aggregating on “model” and “attributes.rented”.
I don’t really care about the SQL-like table format, I just want to be able to pull this data out of MongoDB. There’s got to be a way to modify my pipeline, or create something new from scratch. I’ve tried python dictionaries, db.collection.countDocuments()
, and a number of other posts from this website; no luck. Can anyone suggest an approach? Thank you.