Ensuring the order of composite _id attributes

As the MongoDB documentation points out, the order of attributes in a document's _id matters.

Example:

"_id" : {
  "code" : "x",
  "ssn" : "y"
}

is not the same as:

"_id" : {
  "ssn" : "y",
  "code" : "x"
}

When multiple applications write to the database (managed by different developers, each possibly using a different library or ordering the fields differently in their POJO classes), the order of the attributes can deviate.
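For example, a minimal sketch with the Java driver's org.bson.Document (the field names are just the ones from the example above):

import org.bson.Document;

public class IdFieldOrder {
    public static void main(String[] args) {
        // org.bson.Document keeps fields in insertion order, so the BSON sent
        // to the server differs depending on how the _id is built:
        Document idA = new Document("code", "x").append("ssn", "y");
        Document idB = new Document("ssn", "y").append("code", "x");

        System.out.println(idA.toJson()); // {"code": "x", "ssn": "y"}
        System.out.println(idB.toJson()); // {"ssn": "y", "code": "x"}
    }
}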

What is the best way to enforce the order of the attributes, especially when using the MongoDB Java driver?


Hi @Madhav_kumar_Jha and welcome to the MongoDB Community :muscle: !

Technically speaking, you are right: these two documents are different, and you could treat them as different in your code, but that is not the recommended practice.

As far as I know, every JSON parser (Jackson, …) will consider these two documents identical (or at least has an option to relax that behaviour), and you will be able to map your MongoDB documents onto your POJOs correctly. The order of the fields shouldn't make a difference for you, and you should treat these documents as identical to avoid problems between the different data providers and consumers.
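For example, a minimal sketch with Jackson (the Key class and its fields are hypothetical stand-ins for your composite key):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Objects;

public class OrderInsensitiveParsing {

    // Hypothetical POJO standing in for the composite key
    public static class Key {
        public String code;
        public String ssn;

        @Override
        public boolean equals(Object o) {
            return o instanceof Key k && Objects.equals(code, k.code) && Objects.equals(ssn, k.ssn);
        }

        @Override
        public int hashCode() {
            return Objects.hash(code, ssn);
        }
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        Key a = mapper.readValue("{\"code\":\"x\",\"ssn\":\"y\"}", Key.class);
        Key b = mapper.readValue("{\"ssn\":\"y\",\"code\":\"x\"}", Key.class);
        System.out.println(a.equals(b)); // true: the parser does not care about field order
    }
}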

Here is my blog post about Java & POJOs Mapping.

Cheers,
Maxime.


The problem I am facing is that the MongoDB driver methods for find/delete/insert are not order-insensitive when it comes to _id, even though the values are the same for me as Java objects.


Find, delete, and insert do not care about the order of the fields in the documents stored in your database.
Can you provide a minimal piece of code that reproduces your problem?

Here is my “find” method in my Java Spring Boot MongoDB Starter project on Github if that helps.


import java.io.Serializable;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Employee implements Serializable {

    private EmployeeKey id;

    private String city;
}

@Data
@NoArgsConstructor
@AllArgsConstructor
public class EmployeeKey implements Serializable {

    private String branchNumber;

    private Long baId;
}

// assuming the POJO codec registry is configured so EmployeeKey can be encoded
List<EmployeeKey> empIds = new ArrayList<>();

EmployeeKey ek1 = new EmployeeKey("London123", 100L);

empIds.add(ek1);

// delete every document whose _id matches one of the keys
client.getCollection("Employee").deleteMany(Filters.in("_id", empIds));

The above snippet is unable to delete (or find, and it creates a duplicate if inserted) the document with the _id below:

"_id": {
  "branchNumber" : "London123",
  "baId" : {
    "$numberLong" : "100"
  }
}

If I reverse the order of the attributes in _id in MongoDB, it gets deleted.

Note: I am trying to delete existing data that was saved in the earlier field order (while new data now gets saved in the reverse order). I am not using Spring Data MongoDB.


This is a poor choice of document model in my opinion, because the _id can be "hacked" like this:

db.coll.insertMany([
  {_id: {branchNumber: "London123", bald: 100}},
  {_id: {bald: 100, branchNumber: "London123"}}
])
{
  acknowledged: true,
  insertedIds: {
    '0': { branchNumber: 'London123', bald: 100 },
    '1': { bald: 100, branchNumber: 'London123' }
  }
}

So as you can see, I can insert these two documents into my collection: they have the "same" _id from the business point of view, but the documents are technically different because of the field order. I can do it, but I wouldn't recommend it at all, as it makes it possible to insert a "duplicated" entry into my database.

A better implementation would be to use a standard ObjectId for the _id field and two normal fields for branchNumber and bald, with a compound unique index on them to ensure uniqueness.

test [direct: primary] test> db.coll.deleteMany({})
{ acknowledged: true, deletedCount: 2 }
test [direct: primary] test> db.coll.createIndex({branchNumber: 1, bald: 1}, {unique: 1})
branchNumber_1_bald_1
test [direct: primary] test> db.coll.insertMany([{branchNumber: "London123", bald: 100},{bald: 100, branchNumber: "London123"}])
Uncaught:
MongoBulkWriteError: E11000 duplicate key error collection: test.coll index: branchNumber_1_bald_1 dup key: { branchNumber: "London123", bald: 100 }
Result: BulkWriteResult {
  result: {
    ok: 1,
    writeErrors: [
      WriteError {
        err: {
          index: 1,
          code: 11000,
          errmsg: 'E11000 duplicate key error collection: test.coll index: branchNumber_1_bald_1 dup key: { branchNumber: "London123", bald: 100 }',
          op: {
            bald: 100,
            branchNumber: 'London123',
            _id: ObjectId("6137714d2b36246158d0f720")
          }
        }
      }
    ],
    writeConcernErrors: [],
    insertedIds: [
      { index: 0, _id: ObjectId("6137714d2b36246158d0f71f") },
      { index: 1, _id: ObjectId("6137714d2b36246158d0f720") }
    ],
    nInserted: 1,
    nUpserted: 0,
    nMatched: 0,
    nModified: 0,
    nRemoved: 0,
    upserted: [],
    opTime: { ts: Timestamp({ t: 1631023437, i: 3 }), t: Long("1") }
  }
}
test [direct: primary] test> db.coll.find()
[
  {
    _id: ObjectId("6137714d2b36246158d0f71f"),
    branchNumber: 'London123',
    bald: 100
  }
]

As you can see in the above example, this implementation isn't order-dependent and it's bug-proof from the business point of view.
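If you go that way with the Java driver, the index creation would look something like this (a minimal sketch; the connection string, database, and collection names are placeholders):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class CreateUniqueIndex {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll = client.getDatabase("test").getCollection("coll");
            // Compound unique index enforcing business-level uniqueness
            coll.createIndex(Indexes.ascending("branchNumber", "bald"),
                    new IndexOptions().unique(true));
        }
    }
}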

Now, to remove these documents from your collection in its current state, I would send a query that isn't order-dependent, so that it removes all the possible versions of this document: {_id: {a:1, b:1}} and {_id: {b:1, a:1}}.

The trick is to use the dot notation:

test [direct: primary] test> db.coll.insertMany([{_id:{branchNumber: "London123", bald: 100}},{_id: {bald: 100, branchNumber: "London123"}}])
{
  acknowledged: true,
  insertedIds: {
    '0': { branchNumber: 'London123', bald: 100 },
    '1': { bald: 100, branchNumber: 'London123' }
  }
}
test [direct: primary] test> db.coll.find()
[
  { _id: { branchNumber: 'London123', bald: 100 } },
  { _id: { bald: 100, branchNumber: 'London123' } }
]
test [direct: primary] test> db.coll.deleteMany({"_id.branchNumber": "London123", "_id.bald": 100})
{ acknowledged: true, deletedCount: 2 }

Here is some more documentation using the Java driver (change the language on the right).

In the end, your delete query should look like this:

and(eq("_id.branchNumber", "London123"), eq("_id.bald", 100))
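Put together, a more complete sketch could look like this (connection details are placeholders, and I'm using the baId field name and long value from your original documents):

import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.result.DeleteResult;
import org.bson.Document;

public class DeleteByCompositeId {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> employees =
                    client.getDatabase("test").getCollection("Employee");

            // Dot notation matches the _id sub-document fields regardless of their order
            DeleteResult result = employees.deleteMany(
                    and(eq("_id.branchNumber", "London123"), eq("_id.baId", 100L)));

            System.out.println("Deleted: " + result.getDeletedCount());
        }
    }
}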

I hope this helps.

Cheers,
Maxime.


Hi @Madhav_kumar_Jha and welcome!

Generally you would add an access layer in front of the database for control and management. It really depends on the use case (i.e. the level of abstraction), but common approaches range from providing a common library to access the database, to providing an HTTP API that others can utilise.
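As a minimal sketch of the "common library" approach (the class, method, and field names are hypothetical), the idea is that every application goes through one place that decides both the field order and the query shape:

import static com.mongodb.client.model.Filters.and;
import static com.mongodb.client.model.Filters.eq;

import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Hypothetical shared access layer: every application builds the composite
// _id through this class, so the field order can never deviate.
public class EmployeeRepository {
    private final MongoCollection<Document> collection;

    public EmployeeRepository(MongoCollection<Document> collection) {
        this.collection = collection;
    }

    // Single place that decides the field order of the composite _id
    private static Document keyOf(String branchNumber, long baId) {
        return new Document("branchNumber", branchNumber).append("baId", baId);
    }

    public void insert(String branchNumber, long baId, String city) {
        collection.insertOne(new Document("_id", keyOf(branchNumber, baId)).append("city", city));
    }

    public long delete(String branchNumber, long baId) {
        // Dot notation keeps the delete order-insensitive, even for old documents
        return collection.deleteMany(
                and(eq("_id.branchNumber", branchNumber), eq("_id.baId", baId))).getDeletedCount();
    }
}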

MongoDB stores documents internally as BSON (binary JSON), which means the order of attributes in a sub-document matters. This is especially relevant in your case, where the _id field (the sub-document) is also indexed by default.

You could construct the Document object explicitly, so that the fields are always appended in the same order. For example:

// the _id sub-document fields are always appended in this order
Document doc = new Document("_id",
                   new Document("branchNumber", "London123")
                       .append("bald", 100));

A better choice, though, is to redesign the schema by creating the two fields at the top level with a compound unique index, as suggested by @MaBeuLux88.

Regards,
Wan.

