The problem: We have documents stored in our collection which have nested objects, nested arrays of objects etc, and what we need to be able to do is within an aggregation pipeline
convert the values of the json
and json[]
fields into a sting . Ideally what we would like is for the $convert
or the $toString
operators to support input types of type object
and array
(of objects as well as primitives).
We can’t use the $accumulator
operator as we don’t have or want JavaScript turned on in our servers.
What we have done so far is to use a combination of $reduce
along with $concat
and $toString
to “manually” do a JSON.stringify
and have gotten quite far. The problem we are running into with this approach is we end up with trailing commas in the string. I tried to use $cond
to check if we are in the last item in the array but $reduce
does not seem to give us an iteration index we can use to determine if we are in the last item or not. Our test data looks as follows:
{
"_id": {
"$oid": "64c3934020a49e88d4b17f84"
},
"jsonArrayValues": [
{
"int": 1,
"double": 2.5,
"boolean": true,
"string": "Testing",
"objectId": {
"$oid": "61b0fdcbdee485f7c0582db6"
},
"date": {
"$date": "2022-04-08T00:00:00.000Z"
},
"arrayInt": [
1,
2,
3
],
"nested": {
"int": 3,
"string": "Testing 2"
},
"arrayObj": [
{
"int": 4,
"string": "Testing 3"
},
{
"int": 5,
"string": "Testing 4"
}
]
},
{
"int": 6,
"double": 3.5,
"boolean": false,
"string": "Testing 5",
"objectId": {
"$oid": "62b0fdcbdee485f7c0582db6"
},
"date": {
"$date": "2023-04-08T00:00:00.000Z"
},
"arrayInt": [
4,
5,
6
],
"nested": {
"int": 7,
"string": "Testing 6"
},
"arrayObj": [
{
"int": 8,
"string": "Testing 7"
},
{
"int": 9,
"string": "Testing 8"
}
]
}
],
"testId": "bugfix.schema-aware-queries.cast-json-array-to-varchar.case1"
}
The real problem with this approach is the trailing commas within the objects themselves. We are building up the pipeline itself programmatically and it will support multiple types of documents so the complexity goes up quickly if we try to $substr
all the instances where a trailing comma would be.
Our $project
step looks as follows:
We could have a mapping function server side that would transform the data after it comes out of the db, but this is library code which puts the onus on the consumer of the library to remember to call the map so not ideal.
Any advice or assistance would be appreciated!