Scaling a data model using bloom filters

Lukas_Smilek · February 16, 2025, 2:37pm

Hi John,

I ended up using RoaringBitmap, and I send only the RoaringBitmap data, in base64 format, to MongoDB (through Parse Server). I do not hash the data afterwards. Swift code on the client:

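func roaringBitmapData(from bitmap: RoaringBitmap) -> Data {
    // Allocate a buffer large enough for the portable serialization format.
    let size = bitmap.portableSizeInBytes()
    var buffer = [Int8](repeating: 0, count: size)
    _ = bitmap.portableSerialize(buffer: &buffer)
    // Reinterpret the signed bytes as unsigned so they can be wrapped in Data.
    let uintBuffer = buffer.map { UInt8(bitPattern: $0) }
    return Data(uintBuffer)
}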

Then I use the Parse Swift SDK for the upload:

ParseBytes(data: bloomData)
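
To illustrate the byte handling above, here is a minimal sketch of the Int8 to UInt8 conversion followed by a base64 round trip using only Foundation. The sample bytes are made up for illustration and are not a real serialized bitmap; in practice the SDK takes care of the encoding when the ParseBytes value is saved, so this is only meant to show what ends up being stored.

```swift
import Foundation

// Hypothetical sample bytes standing in for a serialized bitmap buffer.
let signedBuffer: [Int8] = [58, 48, 0, 0, -1]

// Same conversion as in roaringBitmapData: reinterpret each signed byte as unsigned.
let unsignedBuffer = signedBuffer.map { UInt8(bitPattern: $0) }
let sampleData = Data(unsignedBuffer)

// Base64 text form of the bytes, and the reverse decoding.
let encoded = sampleData.base64EncodedString()
let decoded = Data(base64Encoded: encoded)
```

Note that -1 becomes 255 after the bitPattern conversion, so no information is lost on the way into Data.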

For querying profiles only once, I later ended up using a skip list plugin in Elasticsearch that I run in parallel with Parse Server. Nevertheless, I am currently revisiting that decision and trying to build a pure Parse Server solution. Implementing the skip list is relatively straightforward, but I believe it is expensive:

Generate the full Bloom filter on server startup and cache it → array of all available bits/numbers [1, 2, …, 262144]
Then, when creating a query, copy this array and remove from the copy all “used” bits/numbers in the saved RoaringBitmap → this is faster than regenerating the array every time
The reduced array of “available bits” is then used for the ContainedIn indexes query
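
The steps above can be sketched roughly as follows. This is only an illustration of the idea, written in Swift to match the client code; the actual server side would live in Parse Server. The names filterSize, allBits and availableBits are hypothetical, and a plain Set<Int> stands in for the decoded RoaringBitmap.

```swift
// Number of bit positions, taken from the range mentioned above.
let filterSize = 262_144

// Step 1: build the full array once (for example at startup) and keep it cached.
let allBits: [Int] = Array(1...filterSize)

// Step 2: derive the reduced array from the cached one.
// `usedBits` is an assumption: a Set<Int> standing in for the saved RoaringBitmap.
func availableBits(excluding usedBits: Set<Int>, from cached: [Int]) -> [Int] {
    // Set lookups are O(1) on average, so this is one linear pass over the cached array.
    return cached.filter { !usedBits.contains($0) }
}

// Step 3: the result is what would be passed to the ContainedIn query.
let used: Set<Int> = [2, 5, 262_144]
let available = availableBits(excluding: used, from: allBits)
```

Even with the cached array, every query still walks all 262144 entries and sends a very large value list to the database, which is where I suspect the cost comes from.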