pymongo.errors.DocumentTooLarge: BSON document too large (50853059 bytes) - the connected server supports BSON document sizes up to 16777216 bytes

Monika_Shah · March 13, 2023, 1:38pm

I have test database of documents with average document size is 32942 Bytes only.

I am getting exceed memory error on below mentioned very simple query and small database (@ 600MB).
Code:
resultSet=collection.find({“cs.c_id”:{“$in”: list}})
reccnt=0
for rec in resultSet:
reccnt=reccnt+1
Error: pymongo.errors.DocumentTooLarge: BSON document too large (50853059 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

Can anyone help me to resolve it?

steevej · March 13, 2023, 3:00pm

In principal, only small enough documents are stored.

But a query is also a JSON document. May be your query is simply too big. What is the size of your variable list? How many ids do you have in it?

Monika_Shah · March 14, 2023, 3:24am

Yes, you are right. Query parameter list may be high (approximate 10^6) .

Thank you to help in cause identification.
Actually, it was result to avoid expensive Join operation using $lookup.

It is in two part:
part 1) identified all references from documents satisfying critieria,
These criteria are merged, then
part 2) Identify documents of ID return by part 1

What is solution for such case?

Monika_Shah · March 14, 2023, 3:31am

You are right. Query document may become large by large list.
But, It shows error for line ‘for rec in resultSet’ not for collection.find

steevej · March 14, 2023, 2:11pm

So to avoid a $lookup entirely done with one access to the server by

1 - doing a find that downloads the list of ids to lookup, join or find
2 - doing a second access to the server by uploading the list of ids you got in step 1 to find the documents

So you basically implement your own $lookup in a less efficient way using more access to the server, using more I/O between the client and the server and more CPU on the client which in principal is less powerful than the server.

$lookup

I do not know how accurate your python environment is terms of showing where is the error line but I suspect it is wrong. As far as I know, resultSet is a cursor so I am pretty confident that pymongo will return a valid cursor object. And each record is a document stored in the server, so I don’t see how any single rec from resultSet could be too big.

Monika_Shah · March 14, 2023, 6:04pm

I think $lookup would be less efficient. You may correct it.

Query here is to filter record from first collection. Collect references from Array from these documents. For this reference ID’s , search documents from second collection.

Two collections are used here, which may be have different distribution on shards.
So, there will be much network I/O between shards to perform $lookup stage.

On other side, two query are used to perform this task. First query is to find reference IDs from first collection by querying it. This process will be done in parallel to all applicable shards. Almost balanced workload . Only IDs are returned from first query. It will be used to filter second collection in balanced workload . No document transfer among cluster nodes will be required except result.

Shane · March 14, 2023, 8:18pm

PyMongo’s find() cursor is executed lazily on the first iteration, this is why the exception is raised on the “for rec in resultSet” line.

steevej · March 15, 2023, 12:35am

I could by I will not because I do not personally have the resources to test it and I will not use the resources of my customers to test it.

To test if the BSON document too large error is the size of the query, like I think, versus the processing of the result set what you can do is to try insert the query (using the same list) into a temporary collection rather than calling find.

So try

insert_one_result = temp_collection.insert_one( { "cs.c_id" : { "$in" : list } } )

If you get a DocumentTooLarge error then you will know that the query is too big. If it works with exactly the same list that generated the error with find then I have no clue.

Monika_Shah · March 15, 2023, 3:46am

Yes, query document is large. Thank you.