Help with "cursor id not found" in Ruby

Hi,

I have a Ruby script that looks up IP addresses from a collection (50m documents) and queries an API for data about each address. It then takes the API response, which is native JSON, and inserts it into another collection.

I use Ruby queues to thread these two operations. The producer thread takes each IP from the source collection, checks whether it already exists in the target collection, and, if not, adds the address to the queue.

In Ruby, it looks like this:

    queue = Queue.new
    producer = []
    producer << Thread.new do
      unique_ipaddr.find.each do |ipaddr|
        if geoip_infoDB.count("ip" => ipaddr.to_h["_id"]) == 0
          queue.push(ipaddr.to_h["_id"])
        end # end if test
      end # end unique_ipaddr find
    end

The consumer threads wait until there is something in the queue, pull an ipaddr off, look it up via the API, and insert the result into the destination collection. The API can handle 50 concurrent connections, but we limit ourselves to 48 to be conservative. Like this:

    consumer = []
    48.times do
      consumer << Thread.new do
        # Queue#pop blocks, so each thread runs until it pops a nil sentinel
        while (qipaddr = queue.pop)
          insert_to_db(get_geoip_results(qipaddr), geoip_infoDB, geoip_errorsDB)
        end # end while
      end # end new thread
    end # end 48 times
    producer.each { |t| t.join }  # wait for the producer to finish queueing
    48.times { queue.push(nil) }  # one sentinel per consumer so each loop can end
    consumer.each { |t| t.join }

The insert_to_db function takes the response from the API, checks that it's valid, and inserts the result:

    def insert_to_db(response, geoip_infoDB, geoip_errorsDB)
      # HTTP 200 with a body is a good lookup; anything else gets logged as an error
      if response.code == 200 && !response.body.nil?
        geoip_infoDB.insert_one(response.parsed_response)
      else
        geoip_errorsDB.insert_one(response.parsed_response)
      end
    end

Consistently, at around 45,000 documents through the queue, we hit a cursor id not found (43) error:

    /var/lib/gems/2.7.0/gems/mongo-2.14.0/lib/mongo/operation/result.rb:343:in raise_operation_failure: cursor id 2074666401145332617 not found (43) (on databaseserver) (Mongo::Error::OperationFailure)

What am I doing wrong?

Thank you for reading this far!

Hi @Sam_Pope,

With default options on the mongod, an idle cursor will time out after 10 minutes (see cursorTimeoutMillis).

When performing a read operation, if the cursor is open but not iterated for some time, the server will time it out, and a subsequent read attempt (via getMore) will result in the error you're seeing.

Three options to consider:

  1. Lower the batch_size to force getMore operations to be sent to the server more frequently (see Ruby Driver Query Options; sketched below)
  2. Set the no_cursor_timeout option on the operation (see Ruby Driver Query Options; also sketched below)
  3. Redesign the logic
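
For the first two options, a minimal sketch against the collection from your snippet could look like this (untested, so adjust to your driver version):

    # Option 1: a smaller batch_size makes getMore round trips more frequent,
    # so the cursor is less likely to sit idle past the server timeout
    unique_ipaddr.find.batch_size(100).each do |ipaddr|
      # ... existing producer logic ...
    end

    # Option 2: ask the server not to reap this cursor while it is idle;
    # note the option goes in the second (options) argument
    unique_ipaddr.find({}, :no_cursor_timeout => true).each do |ipaddr|
      # ... existing producer logic ...
    end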

Option 3 may not seem the most helpful, but if there is a chance the producer logic can idle for long periods, a new strategy would eliminate the risk of idle cursors being reaped.


Thank you for the great response, @alexbevi. I spent some time this weekend debugging the issue. It appears the producer thread is the culprit. It reads from a large aggregation (50m docs), and during this read it does time out. The consumer side is just a quick insert_one, up to 50m times.

I need to debug why a simple find.each is taking more than 10 minutes between reads.
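
One suspect: the loop isn't just a find.each, it also issues a count() against geoip_infoDB for every document, so if the "ip" field isn't indexed, each iteration is a collection scan, and working through a full batch of those could easily take more than 10 minutes. A sketch of the index I'd add to rule that out (field name taken from the code above):

    # index "ip" so each existence check is an index lookup, not a scan
    geoip_infoDB.indexes.create_one({ "ip" => 1 })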

I ended up contacting the API provider and asking for a data export, so I can avoid doing 50m lookups every day. This follows your third suggestion. Now I have a new problem, importing 5 crazy CSVs of 50m rows each, but that's unrelated to this question. :wink:

Here’s how I changed the producer thread:

    queue = Queue.new
    producer = []
    producer << Thread.new do
      # the option belongs in the second (options) argument, not the filter
      unique_ipaddr.find({}, :no_cursor_timeout => true).each do |ipaddr|
        if geoip_infoDB.count("ip" => ipaddr.to_h["_id"]) == 0
          queue.push(ipaddr.to_h["_id"])
        end
      end
    end

It seems that setting :no_cursor_timeout has greatly slowed down the find, though.

It might actually be faster to create two Sets, one of the addresses in unique_ipaddr and one of the addresses already in geoip_infoDB, compute unique_ipaddr - geoip_infoDB = to_do_set in Ruby, and only look up the addresses in to_do_set, something like the sketch below.
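
A rough sketch of that set-difference idea (field names taken from the code above; note both sets have to fit in memory):

    require 'set'

    # pull only the fields we need from each collection
    all_ips  = unique_ipaddr.find.projection("_id" => 1).map { |d| d["_id"] }.to_set
    done_ips = geoip_infoDB.find.projection("ip" => 1).map { |d| d["ip"] }.to_set

    # whatever is not already in geoip_infoDB still needs an API lookup
    to_do_set = all_ips - done_ips
    to_do_set.each { |ip| queue.push(ip) }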
