Efficient implementation of bit screening or a bloom filter

I have 1 million documents each with 256 bytes of bit flags (so 2k bits) used as a custom search filter in a binary field. Running a search for all records with certain bits set (essentially a bitwise operation on all records) is slow, taking tens of seconds compared to a fraction of a second with a relational database. I am wondering if there is a good way of implemeting this kind of data and query in Mongo?

The key operation is in algebraic form is as follows - a check that the field in the document has at least all the bits set that are in the query (and it may also have many more set).

query & = query

I tried adding an index on the bitwise field, as that should allow this to be an index scan but mongodb doesn’t use the index for bitwise operations - at least it didn’t last time I tried, I haven’t rerun my tests with the latest versions.

Some background for those wondering why one would implement something like this. The documents contain complex scientific graph data which needs to be sub-graph matched, which is an NP problem. The bits each represent complex composite properties of the nodes and edges within the graph data, addressing common cases of the underlying NP problem giving a speed up of several orders of magnitude. Filtering on the bits is used as a pre-filter for fetching documents to avoid fetching the entire database.
Fetching everything will also be a not uncommon use case which I assume would be better addressed by adding caching but that is a separate issue. The result is similar to a bloom filter.

Hi @Matthew_Towler,

Bitwise operations cannot utelize indexes well but other parts of the query can.

Maybe consider implement a document sagregation in a way that you can fast filter the needed bitmaps to be queried based on other indicate fields and run bitwise operators on the filter data:

For example store the bits in array rather than single tiny document s

Best
Pavel

Thanks of the suggestions. In this case the use of the bitscreens is the entire query, so pre-filtering additional fields won’t really help in this case.

The bits are already in an array of ten integers, the bits being checked using a condition of the following form (apologies if the quotes are not exactly correct, this is a partial copy from some python client code).

"$and": [
    { "screensWords.0": { "$bitsAllSet":  [14, 15, 16, 29, 30] } },
    { "screensWords.1": { "$bitsAllSet":  [0, 3, 4, 7, 11, 12, 13, 26, 27, 28, 29, 30, 31] } },
    { "screensWords.2": { "$bitsAllSet":  [0, 1, 19, 20, 21, 23, 24, 25, 29, 30, 31] } },
    { "screensWords.3": { "$bitsAllSet":  [2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 18] } },
    { "screensWords.4": { "$bitsAllSet":  [5, 8, 14, 16, 17, 25, 26, 27, 28, 29, 31] } },
    { "screensWords.5": { "$bitsAllSet":  [0, 17, 20, 22, 23, 25, 26, 27] } },
    { "screensWords.6": { "$bitsAllSet":  [15, 18, 20, 21, 28] } },
    { "screensWords.7": { "$bitsAllSet":  [1, 3, 9] } },
    { "screensWords.9": { "$bitsAllSet":  [0, 3, 4, 5, 6, 7, 11, 18, 19, 20, 21, 22, 23, 30, 31] } } ]