Wildcard Indexes and MultiTenant Deployments
Wildcard indexes are a welcome addition to anyone who has to manage data that's dynamic and isn't all known ahead of time. Imagine that you allow your end-user to add arbitrary fields, and also want to allow them to query on them. You cannot create a regular index without knowing the field names that will be queried, but wildcard indexes allow you to specify the part of a document that should have all of its fields indexed.
A simple example is a product catalog where you have attributes unknown in advance:
{ 
 name: "Name of Product",
 attr: {
 sku: 'SKU12345',
 size: "XL",
 color: "pink",
 price: NumberDecimal("35.99"),
 cat: ["Clothes", "Sale"],
 countryOfOrigin: "Canada"
 }
 }

If I want to allow indexed queries by any attribute, I can add a wildcard index {"attr.$**":1}
and now any query for {"attr.color":"pink"}
(or any other "attr"
subfield) will be able to use this index.
But what if you are hosting data for many different tenants, and every single document has a tenant id in it (tid
) and when queries arrive you must search only that tenant's documents? Maybe you just add "tid":27"
to every query where 27 is the id
of the tenant the query came from?
{ 
 tid: 27,
 name: "Name of Product",
 attr: {
 sku: 'SKU12345',
 size: "XL",
 color: "pink",
 price: NumberDecimal("35.99"),
 cat: ["Clothes", "Sale"],
 countryOfOrigin: "Canada"
 }
 }

The full revised query would then be {"tid":27, "attr.color":"pink"}
. My first thought was that unfortunately you wouldn't be able to make use of the wildcard index because in 4.2 there is no support for compound wildcard indexes -- you cannot create an index {"tid":1, "attr.$**":1}
and that means your queries would have to choose whether to use an index for "tid"
or "attr.$**"
index but not both, and chances are each of them alone wouldn't be very selective.
But then I had an idea - what if you make the tenant id part of the subdocument that stores the fields you don't know up front but want to index? The sample document above could become:
{ 
 tid: 27,
 name: "Name of Product",
 attr: {
 tid27: {
 sku: 'SKU12345',
 size: "XL",
 color: "pink",
 price: NumberDecimal("35.99"),
 cat: ["Clothes", "Sale"],
 countryOfOrigin: "Canada"
 }
 }
 }

The revised query would then be {"attr.tid27.color":"pink"}
- but will it use the index "attr.$**"
? As it turns out, yes it will! Not only that, but it can use it in a much more powerful manner than if we had structured our attributes as an array of key-value pairs.
Consider this query:
db.products.find({"attr.tid27.price":{$lte:50}}).sort({"attr.tid27.price":-1}).limit(10)

This type of query which sorts by the attribute you're filtering on requires an in-memory sort with the key-value array of attributes but with wildcard index, it is able to read the data from the index already ordered. In fact, if you only wanted to return the price and not the rest of the document, it would be a covered index query (meaning the document would not need to be fetched at all).
The same applies to count queries which are able to use the super-fast special "count scan" plan when used with wildcard index, but with array of key-value objects require an index scan and a fetch of the document to get the count.
Give wildcard indexes a try and see how they make your application more efficient!