$$CLUSTER_TIME vs $currentDate in case of parallel upserts

Since MongoDB 4.2, updates can use an aggregation pipeline. In particular, pipeline updates can reference the system variables $$NOW and $$CLUSTER_TIME. So, according to the documentation, there are now two ways to set the current timestamp: the $currentDate operator or the $$CLUSTER_TIME variable.

But for some reason these two methods behave differently in some cases, and the docs do not make this clear at all.

For example, I want to store entities with a timestamp field that has to be unique across the entire collection.

public class SomeDTO
{
    [BsonId]
    public Guid Id { get; set; }

    [BsonElement("timestamp")]
    public BsonTimestamp Timestamp { get; set; }
}

Now, let’s write a lot of objects concurrently:

[Test]
public void Set_Unique_Timestamp_On_UpsertOne_With_ClusterTime_SysVar()
{
    // creating test.test collection with unique timestamp index
    var mongoClient = LifetimeScope.Resolve<MongoClientProvider>().Get();
    var collection = mongoClient.GetDatabase("test").GetCollection<SomeDTO>("test");
    collection.Indexes.CreateOne(
        new CreateIndexModel<SomeDTO>(
            Builders<SomeDTO>.IndexKeys.Ascending(x => x.Timestamp),
            new CreateIndexOptions { Unique = true }));

    // upsert operation
    async Task UpsertOneAsync(Guid id)
    {
        var pipeline = new[] { new BsonDocument("$set", new BsonDocument("timestamp", "$$CLUSTER_TIME")) }; // <---- causing  troubles

        await collection.UpdateOneAsync(
                Builders<SomeDTO>.Filter.Eq(x => x.Id, id),
                Builders<SomeDTO>.Update.Pipeline(pipeline),
                new UpdateOptions { IsUpsert = true })
            .ConfigureAwait(false);
    }

    // running tasks in parallel
    var tasks = Enumerable.Range(0, 150)
        .Select(_ => UpsertOneAsync(Guid.NewGuid()));

    Assert.DoesNotThrowAsync(() => Task.WhenAll(tasks));

}

The test fails with the error below:

A write operation resulted in an error. WriteError: { Category : "DuplicateKey", Code : 11000, Message : "E11000 duplicate key error collection: test.test index: timestamp_1 dup key: { timestamp: Timestamp(1666297782, 58) }"

But if I implement the UpsertOne method with the $currentDate operator

async Task UpsertOneAsync(Guid id)
{
    await collection.UpdateOneAsync(
            Builders<SomeDTO>.Filter.Eq(x => x.Id, id),
            Builders<SomeDTO>.Update.CurrentDate(x => x.Timestamp, UpdateDefinitionCurrentDateType.Timestamp),
            new UpdateOptions { IsUpsert = true })
        .ConfigureAwait(false);
}

the test passes.

Why does the $$CLUSTER_TIME system variable return equal values for parallel upserts, while $currentDate does not?

Hi @hexlify welcome to the community!

I am assuming you return $currentDate as date type instead of timestamp (which makes it equivalent to $$NOW). But in short, it’s because they return different things. From the system variable page:

  • NOW A variable that returns the current datetime value.
  • CLUSTER_TIME A variable that returns the current timestamp value.

And this is what timestamp is: https://www.mongodb.com/docs/manual/reference/bson-types/#timestamps
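As a rough illustration of the layout described on that page, the sketch below packs and unpacks a BSON timestamp as a 64-bit value, with the seconds since the Unix epoch in the high 32 bits and the ordinal in the low 32 bits. The helper names packTimestamp and unpackTimestamp are made up for this example and are not part of any driver API; the sample values come from the duplicate key error in the question.

```javascript
// Hypothetical helpers, not part of any MongoDB driver.
// High 32 bits: seconds since the Unix epoch. Low 32 bits: ordinal.
function packTimestamp(t, i) {
  return (BigInt(t) << 32n) | BigInt(i);
}

function unpackTimestamp(packed) {
  return {
    t: Number(packed >> 32n),
    i: Number(packed & 0xffffffffn),
  };
}

// Values taken from the duplicate key error: Timestamp(1666297782, 58)
const fromError = { t: 1666297782, i: 58 };
const packed = packTimestamp(fromError.t, fromError.i);
const roundTrip = unpackTimestamp(packed);

// The t part is an ordinary Unix time in whole seconds.
const wallClock = new Date(fromError.t * 1000).toISOString();
console.log(roundTrip, wallClock);
```

Two timestamps taken within the same second therefore differ only in the ordinal part.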

So if I write an aggregation that shows the values of those two variables and run it twice in quick-ish succession, it prints:

replset [direct: primary] test> db.aggregate([ {$documents:[{now:'$$NOW', cluster_time:'$$CLUSTER_TIME'}]} ])
[
  {
    now: ISODate("2022-10-26T03:07:05.889Z"),
    cluster_time: Timestamp({ t: 1666753622, i: 1 })
  }
]

replset [direct: primary] test> db.aggregate([ {$documents:[{now:'$$NOW', cluster_time:'$$CLUSTER_TIME'}]} ])
[
  {
    now: ISODate("2022-10-26T03:07:06.194Z"),
    cluster_time: Timestamp({ t: 1666753622, i: 1 })
  }
]

Note that now looks different, but cluster_time looks identical between the two runs. This is because the i value in cluster_time is “operation counter in that second”, so it’s feasible that since you’re running the test in parallel, the cluster hasn’t done any operation and thus the script can create identical $$CLUSTER_TIME values. In contrast, you are much less likely to create two identical $$NOW/$currentDate values, since they have a much finer resolution (milliseconds instead of seconds).
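The explanation above can be sketched with a toy logical clock. This is only a simplified model under the assumption that reading the clock does not advance it while an operation does; ToyClusterClock and sameTimestamp are hypothetical names, and this is not MongoDB's actual implementation.

```javascript
// Toy model only; NOT how MongoDB implements its cluster time.
class ToyClusterClock {
  constructor() {
    this.t = 0; // seconds since the Unix epoch
    this.i = 0; // ordinal within that second
  }

  // An operation advances the clock: new second resets i, else i grows.
  tick(nowSeconds) {
    if (nowSeconds > this.t) {
      this.t = nowSeconds;
      this.i = 1;
    } else {
      this.i += 1;
    }
    return this.read();
  }

  // Reading returns the current value without advancing it.
  read() {
    return { t: this.t, i: this.i };
  }
}

function sameTimestamp(a, b) {
  return a.t === b.t && a.i === b.i;
}

const clock = new ToyClusterClock();
const afterFirstOp = clock.tick(1666753622);

// Two reads with no operation in between see the same value.
const readA = clock.read();
const readB = clock.read();

// Another operation in the same second only bumps the ordinal.
const afterSecondOp = clock.tick(1666753622);
console.log(sameTimestamp(readA, readB), sameTimestamp(readA, afterSecondOp));
```

In this model, any number of readers that observe the clock between two operations get the same value, which is the kind of collision the unique index rejects.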

Having said that, since the timestamp type is intended for internal MongoDB use, it’s usually best to use $$NOW or $currentDate instead for practical purposes.

Best regards
Kevin

Thanks for your response.

I am assuming you return $currentDate as date type instead of timestamp (which makes it equivalent to $$NOW).

I guess it’s my fault for not making it clear that I am comparing the $$CLUSTER_TIME system variable with the $currentDate operator using the timestamp type (the function defined in the last snippet uses the UpdateDefinitionCurrentDateType.Timestamp constant).

This is because the i value in cluster_time is “operation counter in that second”, so it’s feasible that since you’re running the test in parallel, the cluster hasn’t done any operation and thus the script can create identical $$CLUSTER_TIME values.

It’s interesting, because I thought that timestamp values were always unique. But as the documentation says, that is only true within a single mongod instance. And I have a replica set of 2 nodes locally. Probably requests go to a random node every time, and therefore the timestamps are identical.

But the question remains: why does the test with parallel upserts work perfectly with $currentDate { $type: "timestamp" }?

This is difficult to determine without running the tests and working back to the exact reason. However, I think there’s a high possibility this can occur anyway, since a timestamp’s resolution is per-second. The incrementing ordinal depends entirely on the number of operations performed by the server in that second, so a collision is quite likely to occur in a highly parallel workload.

However, I would recommend that your app not use the timestamp data type, since:

  1. It’s marked as internal, and may unexpectedly change in the future
  2. Your tests showed the possibility of identical timestamps getting generated, so it won’t fit your use case anyway
  3. $$NOW looks to me like a perfectly valid alternative, with much finer resolution (in the millisecond range), and it doesn’t depend on the server’s workload to generate the value.
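To put the resolution point in numbers, here is a small sketch that counts how many distinct values survive when a set of arrival times is truncated to whole seconds versus milliseconds. The arrival times are invented for illustration (150 operations spaced 2 ms apart), and distinctAfterTruncation is a hypothetical helper. Note that millisecond values are still not guaranteed to be unique: two operations landing in the same millisecond get the same value.

```javascript
// Count distinct values after truncating times (in ms since the epoch)
// to a coarser unit, also given in milliseconds.
function distinctAfterTruncation(timesMs, unitMs) {
  const buckets = new Set(timesMs.map((ms) => Math.floor(ms / unitMs)));
  return buckets.size;
}

// Hypothetical arrival times: 150 operations, 2 ms apart, in one second.
const start = 1666753622000;
const arrivals = Array.from({ length: 150 }, (_, k) => start + 2 * k);

const distinctSeconds = distinctAfterTruncation(arrivals, 1000);
const distinctMillis = distinctAfterTruncation(arrivals, 1);

// A tighter burst: two operations share a millisecond and collide.
const burst = [start, start, start + 1];
const distinctInBurst = distinctAfterTruncation(burst, 1);

console.log(distinctSeconds, distinctMillis, distinctInBurst);
```

So if the field must be strictly unique, a date value alone reduces the chance of a collision but does not remove it.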

Hopefully this helps!

Best regards
Kevin
