I have been using MongoDB for quite some time now and was exploring many powerful operators available in the aggregation pipelines.
With the increasing capabilities of aggregation piplelines, a dilemma is created about whether to,
- Use complex aggregation pipelines so as to get the final data structure
- Query the data from MongoDB with minimal steps and then process the data in the application logic
For example, consider the below document,
{
"_id": 1024,
"data": [
{"foo": "one", "bar": "lorem"},
{"foo": "two", "bar": "ipsum"},
{"foo": "three", "bar": "asdf"},
{"foo": "four", "bar": "ljkls"},
{"foo": "five", "bar": "afsdfk"},
{"foo": "six", "bar": "lksflkj"},
...
]
}
After finding the document, I want to filter the sub-documents based on if the value of foo
is in an array of values, and then finally get an object with the value of foo
as key and bar
as value.
We can achieve this using the below aggregation query,
db.collection.aggregate([
{"$match": {"_id": 1024}},
{"$project": {
"data": {
"$map": {
"input": {
"$filter": {
"input": "$data",
"as": "sub",
"cond": {"$in": ["$$sub.foo", ["one", "three", "six", ...]]}
}
},
"as": "sub",
"in": {
"k": "$$sub.foo",
"v": "$$sub.baar"
}
}
}
}},
{'$project': {
"final_data": {"$arrayToObject": "$data"},
"_id": 0
}}
])
OR I can just get the document and process it in python (I am using pyMongo driver),
data = db.collection.find_one({"_id": 1024})['data']
final_data = {}
for sub_doc in data:
if sub_doc['foo'] in ['one', 'three', 'six', ...]:
final_data[sub_doc['foo']] = sub_doc['bar']
In the first approach, I need to know the mongo-db specific operators and syntax. But once the query is ready I can run it in shell or other languages with minimal changes.
But with the second, I am already comfortable with python or other programming languages which are designed to do such stuff. Going ahead, if I change the data source which doesn’t support such operators, I need not worry about migrating the logic back to the programming language.
How to decide between the both? Does the approach change based on the number of sub-documents, may be from 100 to 100 thousand ?