Duplicate Data Issue

Hello,

Well, it is nothing really special. I’ve been testing it in a Jupyter notebook using PyMongo 4.0.

At first, I let MongoDB handle the _id field, thinking it would reject duplicate documents with the same metadata and timestamp. I think this is really important for a time series database: it should not allow an insert if the metadata and the timestamp are the same. We shouldn't have to generate the _id field ourselves for this purpose…

Using insertMany, I inserted a couple of hundred documents. Below is one of them:

{
    "timestamp": {
        "$date": "2021-12-01T18:58:00.000Z"
    },
    "metadata": {
        "chart": "candles",
        "interval": "1min",
        "market": "USD",
        "symbol": "REQ"
    },
    "open": 0.7027,
    "low": 0.7015,
    "high": 0.7083,
    "close": 0.7074,
    "volume": 163137
}
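Until the database rejects these on its own, a batch can at least be cleaned on the client before calling insert_many. Below is a minimal sketch, assuming the documents are Python dicts shaped like the one above; `dedupe_batch` is a hypothetical helper, and it only removes duplicates inside one batch, not ones already stored in the collection.

```python
from datetime import datetime, timezone


def dedupe_batch(docs):
    """Hypothetical helper: keep the first document for each (metadata, timestamp) pair.

    Metadata items are sorted so the key does not depend on field order.
    Only duplicates within `docs` are removed; the collection is not queried.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = (tuple(sorted(doc["metadata"].items())), doc["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


candle = {
    "timestamp": datetime(2021, 12, 1, 18, 58, tzinfo=timezone.utc),
    "metadata": {"chart": "candles", "interval": "1min", "market": "USD", "symbol": "REQ"},
    "open": 0.7027, "low": 0.7015, "high": 0.7083, "close": 0.7074, "volume": 163137,
}
batch = [candle, dict(candle)]  # the same candle twice
unique = dedupe_batch(batch)
# `unique` holds a single document and could then be passed to collection.insert_many(unique)
```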

Then I noticed that it does not reject duplicate data, so I decided to generate the _id field on the application side from the metadata fields and the timestamp, as below:

{
    "timestamp": {
        "$date": "2021-12-01T18:58:00.000Z"
    },
    "metadata": {
        "chart": "candles",
        "interval": "1min",
        "market": "USD",
        "symbol": "REQ"
    },
    "open": 0.7027,
    "_id": "REQ-USD-1min-candles-1638385080",
    "low": 0.7015,
    "high": 0.7083,
    "close": 0.7074,
    "volume": 163137
}
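For reference, the _id above can be rebuilt in Python roughly like this. It is a sketch, assuming the order symbol-market-interval-chart-epochseconds inferred from the example; `build_id` is a hypothetical name. 1638385080 is the Unix time in seconds of 2021-12-01T18:58:00Z.

```python
import calendar
from datetime import datetime, timezone


def build_id(doc):
    """Hypothetical helper: build an _id such as "REQ-USD-1min-candles-1638385080".

    The field order is inferred from the example _id in this post.
    """
    m = doc["metadata"]
    # calendar.timegm treats the time tuple as UTC, so this works both for
    # timezone-aware datetimes and for the naive UTC datetimes PyMongo returns by default.
    epoch = calendar.timegm(doc["timestamp"].utctimetuple())
    return f"{m['symbol']}-{m['market']}-{m['interval']}-{m['chart']}-{epoch}"


doc = {
    "timestamp": datetime(2021, 12, 1, 18, 58, tzinfo=timezone.utc),
    "metadata": {"chart": "candles", "interval": "1min", "market": "USD", "symbol": "REQ"},
}
print(build_id(doc))  # REQ-USD-1min-candles-1638385080
```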

But duplicates were still allowed… here is a screenshot:
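To list the duplicates without a screenshot, an aggregation that groups on metadata and timestamp should show them. This is a sketch; `duplicate_pipeline` and the `candles` collection name are hypothetical. Note that MongoDB compares embedded documents field by field in order, so metadata written with a different key order would land in a separate group.

```python
def duplicate_pipeline():
    """Hypothetical helper: group documents by metadata and timestamp and keep
    only the groups that occur more than once."""
    return [
        {"$group": {
            "_id": {"metadata": "$metadata", "timestamp": "$timestamp"},
            "count": {"$sum": 1},
            "ids": {"$push": "$_id"},  # the _id values of every copy
        }},
        {"$match": {"count": {"$gt": 1}}},
    ]

# With a live connection (collection name is hypothetical):
# for group in db.candles.aggregate(duplicate_pipeline()):
#     print(group["_id"], group["count"], group["ids"])
```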