GridFS - Can we use aggregation queries on documents stored with GridFS?

We are storing JSON documents that are greater than 16MB, so we store them using GridFS in its collections (fs.chunks, fs.files). Below is a sample of my JSON. I want to filter, sort, and limit based on the fields in the address array. Can you please point me to an example? Thank you in advance.

[
    {
        "name": "John",
        "age": 25,
        "address": [
            {
                "street": "123 Main St",
                "city": "Anytown",
                "state": "CA"
            },
            {
                "street": "456 Oak Ave",
                "city": "Someville",
                "state": "NY"
            }
        ]
    },
    {
        "name": "Doe",
        "age": 26,
        "address": [
            {
                "street": "sdfs sdf St",
                "city": "sdf",
                "state": "sdf"
            },
            {
                "street": "sdfsd sdf sdf",
                "city": "dfs",
                "state": "dsfs"
            }
        ]
    },
    {
        "name": "abc",
        "age": 29,
        "address": [
            {
                "street": "abc Main St1",
                "city": "xyz",
                "state": "CA12"
            },
            {
                "street": "abcsdsd",
                "city": "addfsd",
                "state": "sdfsd"
            }
        ]
    }
]

You simply cannot do that if you store your JSON documents as files in GridFS.

Why do you do that?

The 16MB limit applies to a single document. In what you shared, you have 3 very small documents.

The object with "name": "John" is one document.

The object with "name": "Doe" is another document.

Thank you very much. This is just an example; my real JSON documents are >16MB, and MongoDB throws an error saying the 16MB size limit is exceeded. Unfortunately, the same collection has a mix of documents: most are less than 16MB, a very few are greater, and the structure has embedded documents with arrays. We have to use aggregation queries for $unwind, filter, sort, limit, etc. So the question is: if the >16MB documents are stored using GridFS, is there a way to query them with aggregation, as we do for normal collections, or is there another, better solution?
Thank you in advance.

No, you cannot use aggregation on the content of files stored using GridFS.

Maybe you can split the oversized documents into smaller related documents using something like the Extended Reference Pattern.

Something like jq may help you split an oversized JSON into a set of smaller related documents.
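
For illustration, here is a minimal Python sketch of that split, as an alternative to jq; the file name "people.json" and the generated _id values are assumptions, not part of your setup:

    import json

    # Load the oversized array of person objects.
    with open("people.json") as f:
        people = json.load(f)

    core_docs = []
    address_docs = []
    for i, person in enumerate(people):
        # Core fields go in one collection...
        core_docs.append({"_id": i, "name": person["name"], "age": person["age"]})
        # ...and the address array goes in another, linked back via core_id.
        address_docs.append({"core_id": i, "address": person["address"]})

You could then load core_docs and address_docs into their collections with insert_many.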

Using your sample documents, you could split a document like the "John" one above into 2 smaller documents like:

core_collection :
[
    {    "_id" : 369 ,
         "name" : "John" ,
         "age" : 25 }
]

address_collection :
[
    {   "core_id" : 369 ,
        "address" : [
            {    "street" : "123 Main St" ,
                 "city" : "Anytown" ,
                 "state" : "CA" } ,
            {    "street" : "456 Oak Ave" ,
                 "city" : "Someville" ,
                 "state" : "NY" }
        ]
    }
]
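
Once split like this, the normal aggregation pipeline works again. Here is a minimal sketch with pymongo, assuming a database named "mydb" and the two collection names above (all names are illustrative):

    from pymongo import MongoClient

    db = MongoClient()["mydb"]

    pipeline = [
        {"$unwind": "$address"},              # one result per address entry
        {"$match": {"address.state": "CA"}},  # filter on an address field
        {"$sort": {"address.city": 1}},       # sort on an address field
        {"$limit": 10},
        {"$lookup": {                         # join the core fields back in
            "from": "core_collection",
            "localField": "core_id",
            "foreignField": "_id",
            "as": "core",
        }},
    ]

    for doc in db.address_collection.aggregate(pipeline):
        print(doc)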

If it is still too big, you may make each entry of address into a top-level document, which looks a lot like a stored $unwind; for example:
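
address_collection (one document per address, with the same illustrative ids) :
[
    {   "core_id" : 369 ,
        "street" : "123 Main St" ,
        "city" : "Anytown" ,
        "state" : "CA" } ,
    {   "core_id" : 369 ,
        "street" : "456 Oak Ave" ,
        "city" : "Someville" ,
        "state" : "NY" }
]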

In some cases, plain old normalization is still a valid solution.

While it is best to keep together the things that are accessed together, sometimes you have no choice.