Hi @Khusro_Siddiqui,
Thanks for sharing the requested details.
I tried with .saveAll(batch), bulkOps.insert(batch), and mongoTemplate.insertAll(). After running each of them multiple times, unfortunately, I cannot reproduce what you're seeing.
Sharing all three code snippets for your reference.
public class Prr_data_records {
    // ...
    private Date recordStartDate;
    // ...
}

private static final int BATCH_SIZE = 60000;

List<Prr_data_records> data = new ArrayList<>();
Random random = new Random();
// Generate 1,000,000 documents with a random recordStartDate within the next 24 hours
for (int i = 0; i < 1_000_000; i++) {
    data.add(new Prr_data_records("John", "Doe",
            new Date(System.currentTimeMillis() + random.nextInt(1000 * 60 * 60 * 24))));
}
// Insert in batches of BATCH_SIZE through the Spring Data repository
for (int i = 0; i < data.size(); i += BATCH_SIZE) {
    int endIndex = Math.min(i + BATCH_SIZE, data.size());
    List<Prr_data_records> batch = data.subList(i, endIndex);
    prrDataRecordsRepository.saveAll(batch);
}
...
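For reference, the snippet above assumes a standard Spring Data repository along the lines of the following (a minimal sketch; the interface name and ID type are my assumptions, not taken from your code):

import org.springframework.data.mongodb.repository.MongoRepository;

// Hypothetical repository backing the saveAll(...) calls above
public interface PrrDataRecordsRepository extends MongoRepository<Prr_data_records, String> {
}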
public class MongodbexampleApplication {
    private static final int BATCH_SIZE = 60000;
    public static void main(String[] args) {
        // ...
        for (int i = 0; i < data.size(); i += BATCH_SIZE) {
            int endIndex = Math.min(i + BATCH_SIZE, data.size());
            List<Prr_data_records> batch = data.subList(i, endIndex);
            // Create a fresh BulkOperations per batch: an instance
            // cannot be reused once execute() has been called
            BulkOperations bulkOps = mongoTemplate.bulkOps(BulkMode.UNORDERED, Prr_data_records.class);
            bulkOps.insert(batch);
            bulkOps.execute();
        }
...
public class MongodbexampleApplication {
    private static final int BATCH_SIZE = 60000;
    public static void main(String[] args) {
        // ...
        // Batched inserts via MongoTemplate.insertAll()
        for (int i = 0; i < data.size(); i += BATCH_SIZE) {
            int endIndex = Math.min(i + BATCH_SIZE, data.size());
            List<Prr_data_records> batch = data.subList(i, endIndex);
            mongoTemplate.insertAll(batch);
        }
...
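All three Java snippets above assume the time series collection already exists, created the same way as in the PyMongo script below. In case it is useful, here is a minimal sketch of creating it with the MongoDB Java driver; the collection and field names mirror the PyMongo example, and the hours granularity is an assumption:

import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.TimeSeriesGranularity;
import com.mongodb.client.model.TimeSeriesOptions;

// Create the time series collection before running the insert tests.
// Collection/field names follow the PyMongo example below; granularity is assumed.
MongoDatabase db = mongoTemplate.getDb();
db.createCollection("prr_data_records",
        new CreateCollectionOptions().timeSeriesOptions(
                new TimeSeriesOptions("recordStartDate")
                        .metaField("mdn")
                        .granularity(TimeSeriesGranularity.HOURS)));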
Also, we tried bulk inserting into the time series collection with a different driver (PyMongo), and it worked for us without throwing any duplicate key errors.
from pymongo import MongoClient, InsertOne
from datetime import datetime, timedelta

client = MongoClient("mongodb://localhost:27017/")
db = client["test"]
collection_name = "prr_data_records"

# Create the time series collection keyed on recordStartDate
db.create_collection(
    name=collection_name,
    timeseries={"timeField": "recordStartDate", "metaField": "mdn", "granularity": "hours"},
)
collection = db[collection_name]

total_docs = 1000000
ops = []
d = datetime.now()
# Build one InsertOne per document, each with a distinct timestamp
for i in range(total_docs):
    record_start_date = d + timedelta(seconds=i + 1)
    ops.append(InsertOne({
        "recordStartDate": record_start_date,
        "lastModifiedDate": d,
        # ...
    }))

collection.bulk_write(ops)
print("Data Inserted Successfully")
Can you please execute this code on your side and see if it still throws the error?
Can you also confirm whether you notice a pattern? The failures might be related to the load on the database at certain times of the day or week. Have you checked your server logs? They might give us some hints as to the exact cause.
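If it is easier than digging through mongod.log on disk, here is a minimal sketch of pulling the recent in-memory server log over the wire with the getLog admin command and filtering for duplicate key messages (assuming the failures surface as E11000 errors; the connection string is a placeholder):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import java.util.List;
import org.bson.Document;

public class LogCheck {
    public static void main(String[] args) {
        // getLog returns the most recent log events held in memory by mongod
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017/")) {
            Document result = client.getDatabase("admin")
                    .runCommand(new Document("getLog", "global"));
            // Print only the lines that mention a duplicate key error
            List<String> lines = result.getList("log", String.class);
            lines.stream()
                    .filter(line -> line.contains("E11000"))
                    .forEach(System.out::println);
        }
    }
}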
Best,
Kushagra