MongoDB Timeseries BulkWriteError code=11000

Khusro_Siddiqui · February 2, 2023, 9:32pm

Hi All,

I am using MongoDB 6 Time Series collections and using Spring to write data in to the collection. Doing a Bulk Write using the saveAll() method sometimes throws this error:

Error: Bulk write operation error on server ssssss.mongodb.net:27017. Write errors: [BulkWriteError{index=8990, code=11000, message='E11000 duplicate key error collection: usmobile_analytics_test.system.buckets.prr_data_records dup key: { _id: ObjectId('63d0b750abbdf51035943f6c') }', details={}}]. ; nested exception is com.mongodb.MongoBulkWriteException: Bulk write operation error on server cluster0-shard-00-01.iyvqq.mongodb.net:27017. Write errors: [BulkWriteError{index=8990, code=11000, message='E11000 duplicate key error collection: usmobile_analytics_test.system.buckets.prr_data_records dup key: { _id: ObjectId('63d0b750abbdf51035943f6c') }', details={}}].

From what I have read, time series collections does not create an index on the _id field and we cannot do updates. The why would this error happen?
Sample code below:

private void writeToDataRecordsCollection(List<DataRecord> dataRecords, String filename) {

        log.info("Writing Data Records to collection for file {}.", filename);
        dataRecordsRepository.saveAll(dataRecords);
    }

Kushagra_Kesav · February 3, 2023, 4:27am

Hi @Khusro_Siddiqui,

Welcome to the MongoDB Community forums

Yes, in a time-series collection, the uniqueness of the _id field is not enforced, and you shouldn’t be seeing this duplicate key error.

As a quick first check, timeseries collection was introduced in MongoDB 5.0 and requires a createCollection command to create one. Could you provide your MongoDB version and how did you create the timeseries collection?

Also, could you please share the following:

The sample document of your collections you are trying to insert.
A small snippet of your code that can help us reproduce the issue.

Furthermore, can you confirm the following:

That the collection you are writing in is a time series collection.
Do you have the same error when you try using mongosh to do the inserts and not your code?

Best,
Kushagra

Khusro_Siddiqui · February 3, 2023, 4:46am

Hello @Kushagra_Kesav. I am using MongoDB Version 6.0.4 hosted in Atlas.

The collection is a time series collection and this is how I have created it

db.createCollection(
    "prr_data_records",
    {
       timeseries: {
          timeField: "recordStartDate",
          metaField: "mdn",
          granularity: "hours"
       },
       expireAfterSeconds: 7776000
    }
);

I am seeing the error when I do bulk inserts around 50000-60000 records. But the error is very random and not consistent at all. Very difficult to reproduce it again.

If you look at the error, it says error while inserting into the system.buckets.prr_data_records. Isn’t this the internal collection that mongo uses to store TS data?

Sample record:

{
    "recordStartDate" : ISODate("2022-12-08T05:00:00.000+0000"),
    "mdn" : "**********",
    "imsi" : "**********",
    "secondInEst" : NumberInt(37),
    "lastModifiedDate" : ISODate("2023-02-03T03:01:48.059+0000"),
    "hourInEst" : NumberInt(20),
    "totalKbUnits" : "2375.7148437500",
    "endDateTime" : ISODate("2022-12-09T03:42:41.000+0000"),
    "monthInEst" : NumberInt(12),
    "minuteInEst" : NumberInt(1),
    "idempotentKey" : "1516771233047_2022342",
    "yearInEst" : NumberInt(2022),
    "_id" : ObjectId("63dc7926a8ba1b3326799703"),
    "dayInEst" : NumberInt(8),
    "countryCode" : "USA"
}

Khusro_Siddiqui · February 3, 2023, 3:28pm

Encountered the issue again today

Error: Bulk write operation error on server sssssss.mongodb.net:27017. Write errors: [BulkWriteError{index=1528, code=11000, message='E11000 duplicate key error collection: usmobile_analytics_test.system.buckets.prr_data_records dup key: { _id: ObjectId('63d1c280aadc99d3ff3d8189') }', details={}}]. ; nested exception is com.mongodb.MongoBulkWriteException: Bulk write operation error on server ssssss.mongodb.net:27017. Write errors: [BulkWriteError{index=1528, code=11000, message='E11000 duplicate key error collection: usmobile_analytics_test.system.buckets.prr_data_records dup key: { _id: ObjectId('63d1c280aadc99d3ff3d8189') }', details={}}].

Does mongodb not support bulk write operations well enough when using Spring?

Kushagra_Kesav · February 8, 2023, 7:22am

Hi @Khusro_Siddiqui,

Thanks for sharing the requested details.

I tried with .saveAll(batch), bulkOps.insert(batch) and mongoTemplate.insertAll(). After running it multiple times, unfortunately, I cannot reproduce what you’re seeing.

Sharing all three code snippets for your reference.

public class Prr_data_records {
...
	private Date recordStartDate;
        ...
	        private static final int BATCH_SIZE = 60000;
			List<Prr_data_records> data = new ArrayList<>();
			Random random = new Random();
			for (int i = 0; i < 100_0000; i++) {
				data.add(new Prr_data_records("John", "Doe", new Date(System.currentTimeMillis() + random.nextInt(1000 * 60 * 60 * 24))));
			}

			for (int i = 0; i < data.size(); i += BATCH_SIZE) {
				int endIndex = Math.min(i + BATCH_SIZE, data.size());
				List<Prr_data_records> batch = data.subList(i, endIndex);

				Prr_data_recordsRepository.saveAll(batch);
			}
            ...

public class MongodbexampleApplication {
	private static final int BATCH_SIZE = 60000;

	public static void main(String[] args) {
            ...
			BulkOperations bulkOps = mongoTemplate.bulkOps(BulkMode.UNORDERED, Prr_data_records.class);
			for (int i = 0; i < data.size(); i += BATCH_SIZE) {
				int endIndex = Math.min(i + BATCH_SIZE, data.size());
				List<Prr_data_records> batch = data.subList(i, endIndex);

				bulkOps.insert(batch);
				bulkOps.execute();
			}
	         	...

public class MongodbexampleApplication {
	private static final int BATCH_SIZE = 60000;

	public static void main(String[] args) {
          ...
			for (int i = 0; i < data.size(); i += BATCH_SIZE) {
				int endIndex = Math.min(i + BATCH_SIZE, data.size());
				List<Prr_data_records> batch = data.subList(i, endIndex);
				mongoTemplate.insertAll(batch);
			}
	         	...

Also, we tried with a different driver (Pymongo) to bulk insert in the time series collection and it worked for us without throwing any duplicate errors.

from pymongo import MongoClient, InsertOne
from datetime import datetime, timedelta

client = MongoClient("mongodb://localhost:27017/")
db = client["test"]
collection_name = "prr_data_records"

db.create_collection(name=collection_name, timeseries={"timeField": "recordStartDate", "metaField": "mdn", "granularity": "hours"})

collection = db[collection_name]
total_docs = 1000000
ops = []
d = datetime.now()
for i in range(total_docs):
    record_start_date = d + timedelta(seconds=i+1)
    ops.append(InsertOne({
        "recordStartDate": record_start_date,
        "lastModifiedDate" : d,
         ...
    }))

collection.bulk_write(ops)
print("Data Inserted Successfully")

Can you please execute this code and see if it still throws the error?

Can you confirm if you notice the pattern - might be related to the load on the database at certain times, like during the day or week? Have you checked your server logs? That might give us some hints as to the exact cause.

Best,
Kushagra

Khusro_Siddiqui · February 10, 2023, 3:07pm

@Kushagra_Kesav running all the operations only once does not encounter the issue. We process incoming files continuously. Lets say over the span of an hour, if we are processing 100 files, with each file containing around 200,000 records, we see the issue happen for 2-3 files.

I am pretty sure you will see the error if you run the above code snippets in a loop continuously for 1-2 hours