The collection has the following structure:
[
{
"id": "173420",
"dataset_name": "173420 - gene expression in treatment"
"file_details": [
{
"size": 31983,
"data_type": "Methylation"
},
{
"size": 110193,
"data_type": "Methylation"
},
{
"size": 254763,
"data_type": "Methylation"
},
{
"size": 1632726,
"data_type": "Clinical",
}
]
},
{
"id": "GSE88",
"dataset_name": "GSE88 tumour recurrence prediction",
"file_details": [
{
"size": 7402964,
"data_type": "Expression"
},
{
"size": 7368643,
"data_type": "Expression"
},
{
"size": 7540211,
"data_type": "Clinical"
},
{
"size": 7426688,
"data_type": "Clinical"
}
]
}
]
The requirement is to count the distinct file_details.data_type and count the number of documents that have the data_type in it.
I have used the following unwind-group-count query for it:
db.dataset.aggregate([
{ $unwind: "$file_details" },
{"$group" : {_id:"$file_details.data_type", count:{$sum:1}}},
{$sort : {count:-1}}
])
It returns the following result:
_id: “Methylation”
count: “3”
-id:“Expression”
count: “2”
_id: “Clinical”
count: “3”
But the expected result needs to count the occurrence of the data_type per document:
expected output:
_id: “Methylation”
count: “1”
-id:“Expression”
count: “1”
_id: “Clinical”
count: “2”
could you please help me improve the query or any new approch.
Thank you in advance.