Atlas Search with nextPageToken returns duplicated documents

Ron_Tsabary · September 16, 2024, 3:52pm

Hello, I have an issue implementing pagination with the nextPageToken feature of Atlas Search.

As part of querying our data using the search capabilities, I’ve implemented an ‘enrichment’ that attaches the nextPageToken of the last item returned from the search query.

The issue is the following:
Say I have 6 documents ([d1, d2, d3, d4, d5, d6]) indexed and my pageSize is 2.

Assume that the returned nextPageToken values are [n1, n2, n3, n4, n5, n6] respectively.

The first query, for the first 2 documents ([d1, d2]), returns the next page token n2.
In some cases, running the same query again with that token passed to the searchAfter operator returns [d2, d3], meaning that d2 is returned twice.

We had a similar issue with BigQuery, which was caused by using an invalid nextPageToken (in BigQuery they actually invalidate the token and require you to start from the beginning).

I wonder if that is the case.

Could it be that, in between the queries, another document is added to the Atlas Search index and a new order is defined, say [d7, d1, d2, d3, d4, d5, d6]? That would shift the document positions and cause an already retrieved document to be returned again.

If that is the case, is there an open issue for it, and are you planning to work on it?

If this is not the case, I would love some assistance in understanding what the issue is and whether we can fix it on our side.

Best regards and thank you in advance for your assistance

amyjian · September 16, 2024, 4:27pm

Hi @Ron_Tsabary, welcome to the MongoDB Community! Can you share the exact query you are using? As mentioned in the docs, it’s important to use a unique field to sort your query results.
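
For illustration, here is a minimal sketch of a paginated `$search` pipeline that sorts on a unique tie-breaker. It only builds the pipeline as plain Python dicts, so it can be passed to `collection.aggregate(...)` in PyMongo. The index name `default`, the `title` and `createdAt` fields, and the helper name `build_search_pipeline` are hypothetical, and it assumes the sort fields (including `_id`) are indexed so that they are sortable. The token called nextPageToken in this thread is read here via `$meta: "searchSequenceToken"`.

```python
from typing import Any, Optional


def build_search_pipeline(
    query: str,
    page_size: int,
    after_token: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline for one page of Atlas Search results."""
    search_stage: dict[str, Any] = {
        "index": "default",  # hypothetical index name
        "text": {"query": query, "path": "title"},  # hypothetical field
        # Sort on a (possibly non-unique) field first, then on _id so that
        # ties are always broken the same way. Field names are assumptions.
        "sort": {"createdAt": -1, "_id": 1},
    }
    if after_token is not None:
        # Resume right after the document that produced this token.
        search_stage["searchAfter"] = after_token

    return [
        {"$search": search_stage},
        {"$limit": page_size},
        {
            "$project": {
                "title": 1,
                # Expose the per-document token used for the next page.
                "paginationToken": {"$meta": "searchSequenceToken"},
            }
        },
    ]


first_page = build_search_pipeline("matrix", page_size=2)
next_page = build_search_pipeline("matrix", page_size=2, after_token="n2")
```

The query and sort must stay identical between pages; only `searchAfter` changes, using the token of the last document of the previous page.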

Also, do you happen to be using Search Nodes?

Ron_Tsabary · September 19, 2024, 10:43pm

Ahh, thanks for the link to the docs, I missed it when I built the system.

I am using Search Nodes (2) deployed in our system…
So as I understand it, there is always a chance of getting a duplicated document, because the IDs differ between the 2 Lucene instances on the search nodes, which makes searchAfter with the nextPageToken inconsistent…

Is there an open issue for that, or is it going to be left as is?
Excluding IDs is what we are currently doing… (we do it on the client side, so we don’t need to send thousands of _ids to exclude in the MongoDB aggregation when we have paginated far into the results)
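
A minimal sketch of that kind of client-side exclusion, assuming each page arrives as a list of dicts with an `_id` key. The helper name `dedupe_pages` is hypothetical and the sample pages are made-up stand-ins for real query results:

```python
from typing import Any, Iterable, Iterator


def dedupe_pages(pages: Iterable[list[dict[str, Any]]]) -> Iterator[dict[str, Any]]:
    """Yield documents page by page, skipping any _id already seen."""
    seen_ids: set[Any] = set()
    for page in pages:
        for doc in page:
            doc_id = doc["_id"]
            if doc_id in seen_ids:
                # Already returned on an earlier page, so drop the repeat.
                continue
            seen_ids.add(doc_id)
            yield doc


# The second page repeats d2, as in the scenario described above.
pages = [
    [{"_id": "d1"}, {"_id": "d2"}],
    [{"_id": "d2"}, {"_id": "d3"}],
]
unique_ids = [doc["_id"] for doc in dedupe_pages(pages)]
```

The set of seen IDs stays in client memory, so nothing extra is sent to the aggregation, at the cost of a page sometimes holding fewer than pageSize new documents.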

amyjian · September 20, 2024, 1:15pm

This duplicated document issue on Search Nodes will be resolved in our next release, which should be available in all clusters in the upcoming weeks.

Ron_Tsabary · September 29, 2024, 6:56am

Just returned from some personal time off

Thanks for your assistance @amyjian
I will follow the releases over the coming months.