$sort & Atlas Search

Hello! I have been building an application centered around Atlas Search (trying to avoid using Elastic). Everything is great until sorting. It seems the recommended approach is to use stored source fields (the docs have an example on sorting), but this is extremely slow on large datasets. Sorting with near on date and number fields is lightning fast, but if you need to sort on text it is incredibly slow. I hope I am missing something. Any help is appreciated.

Hey Kyle! Stored Source is specifically optimized to make sorting on text faster for larger datasets. Do you mind sharing:

  1. a sample document
  2. your index definition
  3. your query

How many fields are you hoping to sort on at a given time?

Hi @Elle_Shwer, thank you for responding to me 🙂 I unfortunately can't share the document or index definition, as the client is EXTREMELY protective of their data. What I can share is that the documents have a lot of fields, all of which are indexed (not ideal, I know), but their app needs to be able to search on any of these fields.

Once the data is loaded into Atlas Search, no more writes occur on that collection, just reads. The query consists of a compound filter / must stage and returns one stored source field per query, and we just want to sort on that one field.
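For readers following along, here is a minimal sketch of the kind of pipeline described above: a compound `$search` with `filter` and `must` clauses that returns stored source, followed by a `$sort` on the one stored field and then pagination. The actual query was not shared, so the index name (`default`), the field names (`status`, `name`), and the query values are all hypothetical placeholders.

```python
# Sketch only: the index name, field names, and query values below are
# hypothetical placeholders, not the original poster's actual query.

def build_sorted_search_pipeline(term, sort_field, page, page_size):
    """Build a $search + $sort pipeline that sorts on one stored source field."""
    return [
        {
            "$search": {
                "index": "default",  # assumed index name
                "compound": {
                    # filter clauses restrict matches without affecting the score
                    "filter": [{"text": {"query": "active", "path": "status"}}],
                    # must clauses have to match and do contribute to the score
                    "must": [{"text": {"query": term, "path": "name"}}],
                },
                # return only the fields listed under storedSource in the index
                "returnStoredSource": True,
            }
        },
        # the sort has to run over the whole recall set, before pagination
        {"$sort": {sort_field: 1}},
        {"$skip": page * page_size},
        {"$limit": page_size},
    ]


pipeline = build_sorted_search_pipeline("ab", "name", page=2, page_size=50)
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. Because `$sort` comes before `$skip` and `$limit`, every matched document has to be sorted before a page can be returned, which is why the size of the recall set matters here.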

When using the near operator on any of the other numeric or date fields for sorting, it's almost instant, and all the other search queries return instantly as well. However, the sort on the M10 cluster takes 40 seconds or times out. I figured it was the weak M10 cluster, so I upgraded to an M40 to run some tests. It got a lot faster but still takes 10 - 15 seconds. The collection is around 286k documents and the index size is around 600 MB.
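For comparison, this is a rough sketch of the near-based approach mentioned above, assuming it works by letting `near` drive the relevance score so that results come back already ordered and no separate `$sort` stage is needed. The field names (`name`, `createdAt`), the index name, and the pivot value are hypothetical.

```python
from datetime import datetime, timezone

# Sketch only: field names, index name, and pivot are hypothetical placeholders.

ONE_DAY_MS = 24 * 60 * 60 * 1000  # for date fields, near's pivot is in milliseconds


def build_near_sorted_pipeline(term, origin, limit):
    """Order results by closeness of a date field to `origin` via the score."""
    return [
        {
            "$search": {
                "index": "default",  # assumed index name
                "compound": {
                    # filter clauses match documents but do not affect the score
                    "filter": [{"text": {"query": term, "path": "name"}}],
                    # near scores higher the closer createdAt is to origin
                    "should": [
                        {
                            "near": {
                                "path": "createdAt",
                                "origin": origin,
                                "pivot": ONE_DAY_MS,
                            }
                        }
                    ],
                },
            }
        },
        # $search returns results in descending score order, so limit directly
        {"$limit": limit},
    ]


pipeline = build_near_sorted_pipeline("ab", datetime.now(timezone.utc), limit=50)
```

Since `near` only accepts numeric, date, and geo values, this trick has no direct equivalent for string fields, which is consistent with the gap in behaviour described above.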

Hi Kyle, no problem. Based on some research, what you're experiencing seems possible (hard to say without details). Some options to improve performance:

  • You likely have a large recall set. Any chance you can use a more precise query to match fewer documents?
  • Any chance you can reduce the size of the stored source field(s)?
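On the second suggestion, a minimal sketch of an index definition that stores only the field needed for sorting might look like the following. The real index definition was not shared, so the dynamic mapping and the field name are assumptions for illustration.

```python
import json

# Sketch only: the mapping and field name are hypothetical placeholders.


def build_index_definition(sort_field):
    """Build a search index definition that stores a single field for sorting."""
    return {
        # assumes every field is indexed dynamically, as described in the thread
        "mappings": {"dynamic": True},
        # keep stored source small by including only the field used for sorting
        "storedSource": {"include": [sort_field]},
    }


definition = build_index_definition("name")
print(json.dumps(definition, indent=2))
```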

Please note that we are working on significant improvements to minimize the effect of recall set and stored source size, so hopefully this will get faster for you soon without any changes on your side!

Hi @Elle_Shwer, the recall set is large, but there is no way to reduce it given the application requirements: the feature is a table, and the search can be just a few characters that still match a large portion of the dataset. The sort has to occur before the limit stage for pagination.

Currently I am storing just one source field, which adds around 50 MB to the index from what I have seen. The field is usually a max of about 10 characters, so it's not big. So it must just be the number of documents that's causing the issue. The hard part is that Elastic seems to handle things fine. And as I said before, sorting with near on non-string fields is really fast. So it's a weird experience for the user when they sort on a string and everything slows to a crawl.

That's great to hear that improvements are being made. I really appreciate your input and help.

edit: sorry just realized I was logged in with my work mongo account

@Elle_Shwer any other ideas on this? I really want to use mongo search instead of elastic.

Would it make sense to discuss this further on a call? (If you click on my name and send me a DM with your email, I'll send over my Calendly.)

@Elle_Shwer sure, for some reason I am not seeing the DM option though.