Storing uint128 values and using them in $match aggregation stage

Victor_Mezrin · May 13, 2024, 7:09am

Hi

I have documents with uint128 fields
And I need to filter them in aggregation pipelines with $gte/$lte/$eq operators.

What are the options since MongoDB doesn’t support such big integers?
What option is the most efficient for collections with 100k+ records?

Example document

{
    _id: ObjectId,
    receipt: {
        type: "string"
    },
    details: {
        amount: 340282366920938463463374607431768211455
    }
}

So far I came up with several options:

Option 1

Store uint128 as string and use operator $toInt

[
  {
    $match: {
      "receipt.type": { $eq: "some type"  },
      $expr: {
        $gte: [{ $toInt: "$details.amount" }, 123456]
      }
    }
  }
]

It doesn’t work because operator $toInt throws exception Overflow for such big ints.

Option 2

Split uint128 field into several smaller uint fields.
Smth like this: 2 bytes in int32, 7 bytes in Long, 7 bytes in Long
(Unfortunately MongoDB’ Long doesn’t support unsigned integers, so we can’t split uint128 to two uint64).

So in the doc we have three fields for different parts of the bytes: amount_high, amount_medium, amount_low

In requests we have complex logic which:

splits filter value into 3 parts
$match first compares amount_high, then amount_medium, then amount_low

Cons:
Looks error-prone.
Requests are hard to build
If $match in aggregate pipeline contains many conditions - hard to build suitable multi-key indexes.

Option 3

Store uint128 as zero padded string.
And use regular operators $gte $lte.

uint128 max length is 39 digits
So in the doc we store uint128 as strings with 39 symbols:

340282366920938463463374607431768211455 becomes "340282366920938463463374607431768211455"
366920938463463374607431768211455 becomes "000000366920938463463374607431768211455"

In the requests we also convert uint to str and make zero-padded

[
  {
    $match: {
      "receipt.type": { $eq: "some type"  },
      "details.amount": { $gte: "000000000000000000000000000000000123456" }
    }
  }
]

Surprisingly this works.
Mongo Compass displays in Explain that IXSCAN is used - previously I created multi-key index { "receipt.type": 1, "details.amount": 1}
And I don’t see any errors in result set.

Is this the right way?
Any better option?