Insert bulk unique data, but on duplicates insert the one with the max value of an attribute

I want to insert a bulk of unique data, but if a duplicate occurs, I want to insert the one that has the max value of an attribute.
How do I do it?

What’s the format of the data you are trying to import?
Can you share a sample maybe?

First, I would like to thank you for your response.

The data looks something like this:

{
    "Company name": "XYZ",
    "Registered number": "123",
    "Number of employees": 500,
    "Website": "www.xyz.co.uk",
    "Url": "XYZ.co.uk",
    "BvD ID number": "62791",
    "BvD sector": [
        "Business Services"
    ],
    "R/O City": "Cam",
    "R/O Country": "Surrey"
}

So what I need is this: if the same company name already exists in the collection, insert the one that has the greater number of employees.

Currently I am achieving this by sorting the data file by number of employees in a script, and then inserting all the data with an insert_many operation against a unique constraint.
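For comparison, the "keep the biggest record per company" step of such a script can be done client-side in a single pass, without sorting. A minimal sketch in JavaScript (the language of the mongosh session further down), assuming records shaped like the sample above; `keepMaxPerCompany` is a hypothetical helper, not part of any MongoDB API, and the records are made up:

```javascript
// Hypothetical helper: for each "Company name", keep only the record with the
// largest "Number of employees". One pass over the input, no sort required.
function keepMaxPerCompany(records) {
  const best = new Map(); // company name -> best record seen so far
  for (const rec of records) {
    const key = rec["Company name"];
    const current = best.get(key);
    // Strictly greater: on a tie, the first record seen is kept.
    if (current === undefined ||
        rec["Number of employees"] > current["Number of employees"]) {
      best.set(key, rec);
    }
  }
  return [...best.values()];
}

// Made-up input in the shape of the sample above.
const records = [
  { "Company name": "XYZ", "Number of employees": 500 },
  { "Company name": "ABC", "Number of employees": 20 },
  { "Company name": "XYZ", "Number of employees": 750 },
];

// One record per company; XYZ keeps the 750-employee version.
const deduped = keepMaxPerCompany(records);
```

Since the result has no duplicate company names, it could be passed to `insertMany()` without relying on duplicate-key errors to drop the smaller records.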

But now I want to know: is there any way to do all of this in MongoDB itself?

If you keep sorting the input in ascending order, I think you could achieve the result without the unique index by using upserts: https://docs.mongodb.com/manual/reference/method/Bulk.find.upsert/#update-operator-expressions-1.

You would find your docs in the find(...) based on your filter (I think it's company_name for you; I would avoid whitespace in field names) and then $set the fields you always want to update, or use $setOnInsert if you only want to write values when the upsert results in an insert rather than an update.

I would put an index on company_name to speed up the lookup and therefore the insert.
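The ascending-sort + upsert idea can be sketched as a list of `bulkWrite()` operations. This assumes the snake_case field names suggested above (`company_name`, `nb_employees`) instead of the original names with spaces; `buildUpsertOps` is a hypothetical helper. The returned array has the shape that `db.coll.bulkWrite()` accepts in mongosh (and `collection.bulkWrite()` in the Node.js driver):

```javascript
// Hypothetical helper: one upsert per input record, sorted ascending so that,
// for each company, the record with the most employees is written last and wins.
function buildUpsertOps(records) {
  const sorted = [...records].sort((a, b) => a.nb_employees - b.nb_employees);
  return sorted.map((rec) => ({
    updateOne: {
      filter: { company_name: rec.company_name }, // match on the company name
      update: { $set: rec },                        // overwrite with this record's fields
      upsert: true,                                 // insert if the company is new
    },
  }));
}

// Made-up input, deliberately not sorted.
const upsertOps = buildUpsertOps([
  { company_name: "XYZ", nb_employees: 750, website: "www.xyz.co.uk" },
  { company_name: "XYZ", nb_employees: 500, website: "www.xyz.co.uk" },
]);
// upsertOps[0] writes the 500-employee record, upsertOps[1] overwrites it with the 750 one.
```

In mongosh this would run as `db.coll.bulkWrite(upsertOps)`, ideally after `db.coll.createIndex({ company_name: 1 })`. It relies on the default `ordered: true`; with `ordered: false` the execution order is not guaranteed, so "last write wins" no longer holds. Fields that should only be written once could move from `$set` to `$setOnInsert`.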

Another idea, to avoid sorting the input, would be to $push each nb_employees value into an array rather than trying to insert only the biggest one.

You could then reduce the array with an aggregation pipeline: a $unwind on the array, a $group on _id with the $max of the value, and a $merge into the same collection to overwrite the nb_employees array with the max of the array.
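The $push-then-reduce alternative can be sketched the same way, again assuming the hypothetical `company_name` / `nb_employees` fields and a collection named `coll` (as in the shell session below). Only the employee counts are collected here; other fields would still need `$set` or `$setOnInsert`. `$merge` into the same collection being aggregated requires MongoDB 4.4 or newer:

```javascript
// Hypothetical helper: push every employee count into an array per company,
// so the input does not need to be sorted.
function buildPushOps(records) {
  return records.map((rec) => ({
    updateOne: {
      filter: { company_name: rec.company_name },
      update: { $push: { nb_employees: rec.nb_employees } },
      upsert: true,
    },
  }));
}

// Pipeline for db.coll.aggregate(): collapse each nb_employees array to its
// maximum and merge that scalar back over the array, keeping the other fields.
const collapsePipeline = [
  { $unwind: "$nb_employees" },
  { $group: { _id: "$_id", nb_employees: { $max: "$nb_employees" } } },
  { $merge: { into: "coll", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } },
];

// Made-up input.
const pushOps = buildPushOps([
  { company_name: "XYZ", nb_employees: 500 },
  { company_name: "XYZ", nb_employees: 750 },
]);
```

Usage would be `db.coll.bulkWrite(pushOps)` followed by `db.coll.aggregate(collapsePipeline)`. Run the collapse only after all pushes: once the field holds a number, a later `$push` on that document fails because the field is no longer an array.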

I hope I make sense. It's not easy to explain.

Cheers,
Maxime.

Actually, something like this would be enough. My aggregation was overkill!

myset [direct: primary] test> db.coll.insert({name:"max", emp: [1,3,2]})
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("622f8da96daaa9ad02ba8fe3") }
}
myset [direct: primary] test> db.coll.update({},[{$set: {emp: {$max: '$emp'}}}])
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 1,
  modifiedCount: 1,
  upsertedCount: 0
}
myset [direct: primary] test> db.coll.find()
[ { _id: ObjectId("622f8da96daaa9ad02ba8fe3"), name: 'max', emp: 3 } ]
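One caveat when applying the session above to a collection with many companies: the legacy `update()` modifies only the first matching document unless `{ multi: true }` is passed. A sketch of the same update pipeline for `updateMany()` (field name `emp` as in the session; pipeline-style updates need MongoDB 4.2 or newer). The `$type` filter is an optional addition that skips documents already collapsed to a number:

```javascript
// Match every document whose emp field is still an array.
const arrayFilter = { emp: { $type: "array" } };

// Update pipeline: replace the emp array with its largest element.
// Used as: db.coll.updateMany(arrayFilter, collapseEmp)
const collapseEmp = [{ $set: { emp: { $max: "$emp" } } }];
```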

Cheers,
Maxime.

Yeah, you explained it pretty well…
And it's really going to help me a lot.

Thank you for your time,

Yash
