Hi John,
I ended up using RoaringBitmap and send only the serialized RoaringBitmap data, in base64 format, to MongoDB (through Parse Server). I do not hash it afterwards. Swift code on the client:
func roaringBitmapData(from bitmap: RoaringBitmap) -> Data {
    // Size of the portable serialization, in bytes
    let size = bitmap.portableSizeInBytes()
    var buffer = [Int8](repeating: 0, count: size)
    _ = bitmap.portableSerialize(buffer: &buffer)
    // Reinterpret the signed bytes as UInt8 so they can be wrapped in Data
    let uintBuffer = buffer.map { UInt8(bitPattern: $0) }
    return Data(uintBuffer)
}
Then I use the Parse Swift SDK for the upload:
ParseBytes(data: bloomData)
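
For reference, a minimal sketch of what the base64 round trip looks like with plain Foundation. It assumes the bytes have already been produced by a function like the one above; the sample bytes below are placeholders, not real bitmap data, and the sketch says nothing about how ParseBytes works internally.

```swift
import Foundation

// Placeholder bytes standing in for a serialized RoaringBitmap (assumption, not real data).
let serialized = Data([0x3A, 0x30, 0x00, 0x00])

// Encode to a base64 string for transport.
let encoded = serialized.base64EncodedString()

// Decode again and confirm that nothing was lost on the way.
if let decoded = Data(base64Encoded: encoded) {
    print(decoded == serialized) // true
}
```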
To query profiles only once, I later ended up using a skip list plugin in Elasticsearch, which I run in parallel with Parse Server. However, I am currently revisiting that decision and trying to build a pure Parse Server solution. Implementing the skip list is relatively straightforward, but I believe it is expensive:
- Generate the full bloom filter on server startup and cache it → array of all available bits/numbers [1, 2, …, 262144]
- When creating a query, copy this array and remove from the copy all “used” bits/numbers stored in the saved RoaringBitmap → this produces the reduced array faster than regenerating a new one every time
- The reduced array of “available bits” is then used in the ContainedIn query on the indexes
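
The three steps above can be sketched roughly as follows. This is only an illustration under assumptions: on an actual Parse Server the logic would live in Cloud Code, Swift is used here just to match the client code above, and `filterSize`, `allBits`, `availableBits(from:removing:)`, and the `Set<Int>` standing in for the decoded RoaringBitmap are all hypothetical names.

```swift
// Filter size taken from the range mentioned above.
let filterSize = 262_144

// Step 1: built once (e.g. at server startup) and cached: every bit number 1...filterSize.
let allBits: [Int] = Array(1...filterSize)

/// Step 2: returns the cached bit numbers that are NOT in `usedBits`.
/// `usedBits` stands in for the values decoded from the saved RoaringBitmap.
func availableBits(from allBits: [Int], removing usedBits: Set<Int>) -> [Int] {
    // filter builds the reduced copy in a single pass; the cached array is left untouched.
    return allBits.filter { !usedBits.contains($0) }
}

// Step 3: the reduced array is what would be passed to the ContainedIn query.
let used: Set<Int> = [2, 3, 262_144]
let available = availableBits(from: allBits, removing: used)
print(available.count) // 262141
```

Using a set for the used bits keeps each membership check constant time on average, so one query costs a single pass over the cached array. The resulting array can still be very large when few bits are used, which is where the expense comes from.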