I have a huge heterogeneous collection (different subclasses of data in the same collection; think of an e-commerce catalog, where all products live in one collection with largely disjoint attribute sets, plus some common attributes inherited from a parent type).
There can be a few thousand such types (subclasses), each with about 50-60 attributes, and they are quite dynamic: new subtypes are added all the time. The documents can also be nested. The single collection contains a few hundred million documents.
Given this, the number of “distinct” attributes I need to index on the collection is fairly high, perhaps a few hundred, which means roughly that many indexes on the same collection. With MongoDB’s limit of 64 indexes per collection, I cannot keep adding indexes (and even without the limit, it’s not an approach that warms the cockles of my heart). Any of these attributes can be searched on, and those searches need to be performant.
I have the following options:
- Bite the bullet and split each class into its own collection. This would solve the primary problem but introduce an extensibility issue: new types are added all the time, and the last thing I want is to keep creating collections when things are this dynamic. So this is ruled out. Btw, this is one of the reasons I went with a document store in the first place.
- Secondary indexes - Ingest the data into Elasticsearch, query ES first to retrieve document IDs, and then retrieve the actual data from Mongo. This would work, but keeping ES and Mongo in sync is another problem. Is there another way to build out a large set of secondary indexes to supplement the Mongo indexes?
- Cache - Run the queries against a cache that “papers over” Mongo. Too much to build and to support long term.
- Other workarounds, such as “mapping” attributes: create indexes on 64 attributes. This set would be static, but each attribute of the incoming data would map to one of these 64 fields, and we’d use this mapping information to interpret the data when we read it. This ensures that we never have to go beyond a fixed set of static “surrogate” attributes to index.
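To make the “mapping” option concrete, here is a minimal sketch in plain Python with no database involved. Everything in it is hypothetical: the `SlotRegistry` class, the `_type` discriminator field, the surrogate field names `s0`, `s1`, and so on, and the default of 60 slots are illustrative assumptions, and it only handles flat (non-nested) documents. Attributes common to every type (passed in as `shared`) keep their real names so they can be indexed directly; each type-specific attribute is assigned the next free slot for its type, so different types reuse the same slot fields.

```python
from typing import Any, Dict, Iterable


class SlotRegistry:
    """Hypothetical registry mapping each type's attributes onto a fixed set of
    indexed surrogate fields named s0, s1, ... (names are illustrative only)."""

    def __init__(self, max_slots: int = 60, shared: Iterable[str] = ()) -> None:
        # Assumed slot budget; must stay under the collection's index limit.
        self.max_slots = max_slots
        # Attributes inherited from the parent type keep their real field names.
        self.shared = set(shared)
        # {type_name: {attribute_name: slot_field}}
        self._maps: Dict[str, Dict[str, str]] = {}

    def slot_for(self, type_name: str, attr: str) -> str:
        """Return attr's slot for this type, assigning the next free one if needed."""
        mapping = self._maps.setdefault(type_name, {})
        if attr not in mapping:
            if len(mapping) >= self.max_slots:
                raise ValueError(
                    f"{type_name!r} has more than {self.max_slots} type-specific attributes"
                )
            mapping[attr] = f"s{len(mapping)}"
        return mapping[attr]

    def _field(self, type_name: str, attr: str) -> str:
        # Shared attributes are stored as-is; everything else goes to a slot.
        return attr if attr in self.shared else self.slot_for(type_name, attr)

    def encode(self, doc: Dict[str, Any]) -> Dict[str, Any]:
        """Rewrite a flat incoming document into its stored (surrogate) shape."""
        type_name = doc["_type"]
        stored: Dict[str, Any] = {"_type": type_name}
        for attr, value in doc.items():
            if attr != "_type":
                stored[self._field(type_name, attr)] = value
        return stored

    def decode(self, stored: Dict[str, Any]) -> Dict[str, Any]:
        """Restore the real attribute names when reading a stored document."""
        type_name = stored["_type"]
        reverse = {slot: attr for attr, slot in self._maps.get(type_name, {}).items()}
        # _type and shared attributes are not in `reverse`, so they map to themselves.
        return {reverse.get(field, field): value for field, value in stored.items()}

    def translate_query(self, type_name: str, criteria: Dict[str, Any]) -> Dict[str, Any]:
        """Turn {real_attr: value} criteria into a filter on surrogate fields.
        The _type condition is mandatory: a slot means different things per type."""
        mapping = self._maps.get(type_name, {})
        query: Dict[str, Any] = {"_type": type_name}
        for attr, value in criteria.items():
            if attr in self.shared:
                query[attr] = value
            elif attr in mapping:
                query[mapping[attr]] = value
            else:
                raise KeyError(f"{attr!r} is not a known attribute of {type_name!r}")
        return query


if __name__ == "__main__":
    registry = SlotRegistry(shared=["sku", "price"])
    stored = registry.encode({"_type": "shoe", "sku": "A1", "price": 50, "size": 42})
    print(stored)
    print(registry.translate_query("shoe", {"size": {"$gte": 40}}))
```

One consequence the sketch makes visible: because `s0` can mean “size” for one type and “screen_size” for another, every query has to carry the type filter, and the mapping itself would have to be persisted somewhere (for example, in a small metadata collection) and loaded by every reader and writer. The dictionary returned by `translate_query` has the shape of an ordinary MongoDB filter document, so it could be passed to something like pymongo's `Collection.find`; operator values such as `{"$gte": 40}` pass through unchanged.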
Thoughts? How have you solved this problem?