What's the most performant way to get multiple documents by an indexed field?

Hi, I’m using the .NET driver and I’m curious about how to make the best use of indexed fields. If I want to get a single document by its index, it’s a piece of cake and super fast:

Builders<[MY CLASS]>.Filter.Eq(x => x.Id, [ID]);

But what if I have a collection of potentially many IDs and want the filter to get all those documents? I’ve tried using in, but the slow performance makes me think that Mongo is just searching all documents rather than using the index:

Builders<[MY CLASS]>.Filter.In(x => x.Id, [MY COLLECTION]);

I’ve also tried oring multiple eqs. This seems like it might be faster, at least when I try it with a reasonably small number of IDs. But when I put in hundreds or thousands of IDs, the code crashes when I try to iterate the IAsyncEnumerable that is produced:

var builder = Builders<[MY CLASS]>.Filter;
builder.Or(builder.Eq(x => x.Id, [ID 1]), builder.Eq(x => x.Id, [ID 2]), builder.Eq(x => x.Id, [ID 3]));

The last option that comes to my mind would be to make multiple calls to the database for each ID and handle everything in code. I can’t imagine this would be a very performant option, especially as I’m applying other filtering rules as well as sorting / skipping / limiting. Is there another option I’m missing to tackle this in a Mongo filter?

That’s a good question. I will look into and see what I can find.

I would expect the most performant way would be to use In to query for a number of documents by _id at the same time. In should take advantage of the index, but it would still have to fetch the contents of the documents one at a time (from the location pointed to by the index). Depending on the number of _id’s relative to the size of the collection it could even decide that a full collection scan is faster.

There would be a limit to how many id’s you could query for at the same time. There is a finite limit on how big a query document can be, and the more _id’s you query for at the same time the larger the query document becomes.

You say that “the code crashes” when you try to enumerate the results. Can you provide more details, such as what exception message you are seeing and how to reproduce?

In the meantime I will begin my own attempts to reproduce this and measure performance.

Thanks for the response. Assuming “ids” is a collection of strings, this is how I built my filter in C#:

Builders<[MY CLASS]>.Filter.Or(ids.Select(id => filterBuilder.Eq(r => r.Id, id)))

When I run ToString on my IFindFluent, I get what I believe to be the final Mongo query:

find({ \"$or\" : [{ \"_id\" : ObjectId(\"123\") }, { \"_id\" : ObjectId(\"456\") }, { \"_id\" : ObjectId(\"789\") }] }).sort({ \"_id\" : 1 })

(But with real object IDs and a whole lot more of them.)

I then call ToAsyncEnumerable and try to iterate the enumerable:

var enumerable = _collection.Find(...).ToAsyncEnumerable(cancellationToken);
await foreach (T item in enumerable.WithCancellation(cancellationToken).ConfigureAwait(false))
{
	// do work...
}

Everything works when I use 500 IDs, but 1000 IDs results in this exception being throw (simplified IDs like before):

{"Command find failed: Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Message:   errors  \u0005\u0001 severity Error location \u0010 start  \u0004 end  \u0004 code SC1030 message~ Tt\u0019\u0014   yPz\u000e  ߠ \rg    \u001b$.     <   .\u0010\u0015    sP =    u  \a  e\u0010 \r\n  ey\u001eD.   < ]   d\u0010 -\a  m8  O  x  <   nP \r  ˠx ,ϻ@÷{ &     \u001c^  g\u0010\u001d]\u0006  ey\u001e v ߠv MO  eм\u001df  rP ^   s\u0016 -\a  m8;mΧ g\u0010\u001d]\u0006  m8  \a  pyy>O  .\r\nActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a, Request URI: /apps/DocDbApp/services/DocDbServer10/partitions/a4cb4956-38c8-11e6-8106-8cdcd42c33be/replicas/1p/
, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.22000 cosmos-netstandard-sdk/3.18.0);););."}
    Code: 2
    CodeName: "BadValue"
    Command: {{ "find" : "items", "filter" : { "$or" : [{ "_id" : ObjectId("123") }, { "_id" : ObjectId("456") }, { "_id" : ObjectId("789") }] }, "sort" : { "_id" : 1 }, "skip" : 0, "limit" : 500 }}
    ConnectionId: {{ ServerId : { ClusterId : 1, EndPoint : "Unspecified/localhost:10255" }, LocalValue : 3, ServerValue : "663943947" }}
    Data: {System.Collections.ListDictionaryInternal}
    ErrorLabels: Count = 0
    ErrorMessage: "Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Message:   errors  \u0005\u0001 severity Error location \u0010 start  \u0004 end  \u0004 code SC1030 message~ Tt\u0019\u0014   yPz\u000e  ߠ \rg    \u001b$.     <   .\u0010\u0015    sP =    u  \a  e\u0010 \r\n  ey\u001eD.   < ]   d\u0010 -\a  m8  O  x  <   nP \r  ˠx ,ϻ@÷{ &     \u001c^  g\u0010\u001d]\u0006  ey\u001e v ߠv MO  eм\u001df  rP ^   s\u0016 -\a  m8;mΧ g\u0010\u001d]\u0006  m8  \a  pyy>O  .\r\nActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a, Request URI: /apps/DocDbApp/services/DocDbServer10/partitions/a4cb4956-38c8-11e6-8106-8cdcd42c33be/replicas/1p/, Re
questStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.22000 cosmos-netstandard-sdk/3.18.0);););"
    HResult: -2146233088
    HelpLink: null
    InnerException: null
    Message: "Command find failed: Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Message:   errors  \u0005\u0001 severity Error location \u0010 start  \u0004 end  \u0004 code SC1030 message~ Tt\u0019\u0014   yPz\u000e  ߠ \rg    \u001b$.     <   .\u0010\u0015    sP =    u  \a  e\u0010 \r\n  ey\u001eD.   < ]   d\u0010 -\a  m8  O  x  <   nP \r  ˠx ,ϻ@÷{ &     \u001c^  g\u0010\u001d]\u0006  ey\u001e v ߠv MO  eм\u001df  rP ^   s\u0016 -\a  m8;mΧ g\u0010\u001d]\u0006  m8  \a  pyy>O  .\r\nActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a, Request URI: /apps/DocDbApp/services/DocDbServer10/partitions/a4cb4956-38c8-11e6-8106-8cdcd42c33be/
replicas/1p/, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.22000 cosmos-netstandard-sdk/3.18.0);););."
    Result: {{ "ok" : 0.0, "errmsg" : "Error=2, Details='Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Response status code does not indicate success: BadRequest (400); Substatus: 0; ActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a; Reason: (Message:   errors  \u0005\u0001 severity Error location \u0010 start  \u0004 end  \u0004 code SC1030 message~ Tt\u0019\u0014   yPz\u000e  ߠ \rg    \u001b$.     <   .\u0010\u0015    sP =    u  \u0007  e\u0010 \r\n  ey\u001eD.   < ]   d\u0010 -\u0007  m8  O  x  <   nP \r  \u02e0x ,ϻ@÷{ &     \u001c^  g\u0010\u001d]\u0006  ey\u001e v ߠv MO  eм\u001df  rP ^   s\u0016 -\u0007  m8;mΧ g\u0010\u001d]\u0006  m8  \u0007  pyy>O  .\r\nActivityId: 86a080ef-3cd2-426c-bdbc-a674979c674a, Request URI: /apps/DocDbApp/services/DocDbServer10/partitions/a4cb4956-38c
8-11e6-8106-8cdcd42c33be/replicas/1p/, RequestStats: Microsoft.Azure.Cosmos.Tracing.TraceData.ClientSideRequestStatisticsTraceDatum, SDK: Windows/10.0.22000 cosmos-netstandard-sdk/3.18.0);););", "code" : 2, "codeName" : "BadValue" }}
    Source: "MongoDB.Driver.Core"
    StackTrace: "   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)\r\n   at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.<ExecuteAsync>d__20.MoveNext()\r\n   at MongoDB.Driver.Core.Servers.Server.ServerChannel.<ExecuteProtocolAsync>d__20`1.MoveNext()\r\n   at MongoDB.Driver.Core.Operations.RetryableReadOperationExecutor.<ExecuteAsync>d__3`1.MoveNext()\r\n   at MongoDB.Driver.Core.Operations.ReadCommandOperation`1.<ExecuteAsync>d__8.MoveNext()\r\n   at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__129.MoveNext()\r\n   at MongoDB.Driver.Core.Operations.FindOperation`1.<ExecuteAsync>d__128.MoveNext()\r\n   at MongoDB.Driver.OperationExecutor.<ExecuteReadOperationAsync>d__3`1.MoveNext()\r\n   at MongoDB.Driver.MongoCollectionImpl`1.<ExecuteReadOperationAsync>d__99`1.MoveNext()\r\n   at MongoDB.Driver.MongoCollectionImpl`1.<UsingImplicitSessionAsync>d__107`1.MoveNex
t()\r\n   at DataCollection.Data.Extensions.IAsyncCursorSourceExtensions.<ToAsyncEnumerable>d__0`1.MoveNext() in C:\\Source\\sawtooth\\DataCollection\\src\\DataCollection.Data\\Extensions\\IAsyncCursorSourceExtensions.cs:line 12\r\n   at DataCollection.Data.Extensions.IAsyncCursorSourceExtensions.<ToAsyncEnumerable>d__0`1.System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult(Int16 token)\r\n   at System.Threading.Tasks.ValueTask`1.get_Result()\r\n   at System.Runtime.CompilerServices.ConfiguredValueTaskAwaitable`1.ConfiguredValueTaskAwaiter.GetResult()\r\n   at DataCollection.Business.Helpers.GetRespondentsHelper.<ToListAsync>d__20`1.MoveNext() in C:\\Source\\sawtooth\\DataCollection\\src\\DataCollection.Business\\Helpers\\GetRespondentsHelper.cs:line 213\r\n   at DataCollection.Business.Helpers.GetRespondentsHelper.<ToListAsync>d__20`1.MoveNext() in C:\\Source\\sawtooth\\DataCollection\\src\\DataCollection.Business\\Helpers\\GetRespondentsHelper.cs:line 213"
    TargetSite: {TCommandResult ProcessResponse(MongoDB.Driver.Core.Connections.ConnectionId, MongoDB.Driver.Core.WireProtocol.Messages.CommandMessage)}

I assumed such a cryptic error message wouldn’t be of any help, but perhaps it means something to you.

Welcome to the MongoDB Community @Zachary_Anderson1 !

Cosmos DB’s emulation of MongoDB is an entirely independent server implementation of a subset of MongoDB features for the claimed MongoDB API version. The error message you are encountering is specific to Cosmos DB.

I would try running explain("executionStats") to see if there is any insight on the plan execution for your query, but since the backend isn’t MongoDB you will need different expertise to understand how to improve performance.

For more insight on Cosmos’ indexing behaviour, I suggest looking into Stack Overflow (Newest 'azure-cosmosdb+indexing' Questions - Stack Overflow) and the Cosmos DB Indexing documentaiton.

Regards,
Stennie

I tested your scenario against MongoDB Version 6.0.1 and using In is performant with any number of _ids. In fact it is more performant (in terms of documents/second) the more_ids are involved.

I got the following results:

1 _id : 28 ms (35/sec)
10 _ids : 0 ms (17,633/sec)
100 _ids : 1 ms (84,160/sec)
500 _ids : 3 ms (139,481/sec)
1000 _ids : 9 ms (110,989/sec)
100,000 _ids : 724 ms (138,013/sec)

I attribute the 28 ms for 1 _id to some sort of warming up of the server (e.g. loading the index into memory).

Reading 10 _ids didn’t actually take 0 seconds. It’s a rounding error (it took close to 0).

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.