Hi
I have documents with uint128
fields
And I need to filter them in aggregation pipelines with $gte
/$lte
/$eq
operators.
What are the options since MongoDB doesn’t support such big integers?
What option is the most efficient for collections with 100k+ records?
Example document
{
_id: ObjectId,
receipt: {
type: "string"
},
details: {
amount: 340282366920938463463374607431768211455
}
}
So far I came up with several options:
Option 1
Store uint128
as string and use operator $toInt
[
{
$match: {
"receipt.type": { $eq: "some type" },
$expr: {
$gte: [{ $toInt: "$details.amount" }, 123456]
}
}
}
]
It doesn’t work because operator $toInt
throws exception Overflow for such big ints.
Option 2
Split uint128
field into several smaller uint fields.
Smth like this: 2 bytes in int32, 7 bytes in Long, 7 bytes in Long
(Unfortunately MongoDB’ Long doesn’t support unsigned integers, so we can’t split uint128 to two uint64).
So in the doc we have three fields for different parts of the bytes: amount_high, amount_medium, amount_low
In requests we have complex logic which:
- splits filter value into 3 parts
- $match first compares amount_high, then amount_medium, then amount_low
Cons:
Looks error-prone.
Requests are hard to build
If $match
in aggregate pipeline contains many conditions - hard to build suitable multi-key indexes.
Option 3
Store uint128 as zero padded string.
And use regular operators $gte $lte.
uint128 max length is 39 digits
So in the doc we store uint128 as strings with 39 symbols:
340282366920938463463374607431768211455
becomes"340282366920938463463374607431768211455"
366920938463463374607431768211455
becomes"000000366920938463463374607431768211455"
In the requests we also convert uint to str and make zero-padded
[
{
$match: {
"receipt.type": { $eq: "some type" },
"details.amount": { $gte: "000000000000000000000000000000000123456" }
}
}
]
Surprisingly this works.
Mongo Compass displays in Explain that IXSCAN is used - previously I created multi-key index { "receipt.type": 1, "details.amount": 1}
And I don’t see any errors in result set.
Is this the right way?
Any better option?