Suggestions for better schema design for searching names

yannick · March 18, 2022, 11:58am

Hi,

I’m making an application for therapists, where each therapist has its own patients.

The therapist should be able to search for the firstName and/or lastName.

I store and search the names “normalized” and “inverted”.
“Jérôme Müller” becomes “jerome muller” and “muller jerome”

therapistSchema = {
  id: "abc123",
  // more...
}

patientScheam = {
  firstName: "Jérôme",
  lastName: "Müller",
  owner: "abc123",
  search: [
    // (the search array is indexed)
    "abc123_jerome muller",
    "abc123_muller jerome"
  ],
  // more...
}

I query it with Patien.find( { search: /^abc123_SEARCH/ } ) where SEARCH is the string the therapists is searing for. It queries efficient using the search_1 index.

I’m very happy with it but I would love to know if someone else has an ever better solution or if there is a “native MongoDB” way to get the same result?

Thanks!

Prasad_Saya · March 18, 2022, 2:24pm

Hello @yannick, Welcome to the MongoDB Community forum!

To start with I have couple of suggestions.

1. A Variation:

patientSchema =  {
  firstName: "Jérôme",
  lastName: "Müller",
  owner: "abc123",
  search: [
    "jerome muller",
    "muller jerome"
  ],
}

Define a Compound Index as: { owner: 1, search: 1 }
The query can be: find( { owner: 'abc-123', search: /^SEARCH/ } )

2. Using Collation

Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.

You can specify collation for a collection or a view, an index, or specific operations that support collation.

For example, the following query will find the document you had posted: { "firstName" : "Jérôme", "lastName" : "Müller", ... }:

find( { lastName: "muller" }).collation( { locale: 'en', strength: 1 } )

Collation can be specified in a Collation Document based upon your use case. In the above query the collation document is: { locale: 'en', strength: 1 } . Collation document has various fields, e.g., locale, strength, etc. The locale field is mandatory and rest are optional. In the example, strength field allowed - "Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case".

It is possible that Collation can be applied in your use case - but, I dont know exactly how to implement right now. This can be explored further.

Also, refer Collation Concepts at:

yannick · March 18, 2022, 3:13pm

Thanks @Prasad_Saya,

find({ owner: 'abc123', search: /^SEARCH/ }) works great with the index { owner: 1, search: 1 }, but only if I explicitly use .hint({ owner: 1, search: 1 }).
Do you think it will automatically use “owner_1_search_1” once I have more data? Or should I just use “hint()”?

With collation it works as well, but the result is not exactly what I need.
“Müller” gets only found with “müller” and “muller” but not with “mueller”.
And it doesn’t allow searching for /^mull/. (I believe?)

Prasad_Saya · March 18, 2022, 3:26pm

You probably has multiple indexes, and the query optimizer is not able to select the one you want. What indexes are there you can find using the db.collection.getIndexes(). And, you can generate a Query Plan and see how indexes are being applied using the explain() method on the query.

In general, I think using of hint is not encouraged. Having more data doesn’t allow selection of a particular index, I think. How index is selected for a query depends upon:

yannick · March 18, 2022, 4:00pm

It was always only using search_1, but…

I added another therapist with another patient called Müller. Now it uses owner_1_search_1 by default.

I guess, since there was only one document in therapists, the query optimizer did not see the necessity to use the compound index and just used search_1 which gave the same results.

system · March 23, 2022, 4:01pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.