Python detect duplicate Docs

So I am working on a project that query’s the local system for specific parameters and then needs to upload those parameters. The key is a specific system could be scanned 100’s of time and I need to determine if a doc = the local json and if so return object id. I have tried several different ways… unique indexes updateone…

What is the recommended method for doing this?

The solution I came up with is doing a query then detecting if the query brings back a document. I feel this is clunky, but works. If I determine there is a match, I will return the objectID, for post-processing

def db_query_for_existing_HW(env_hw):  
    env = env_hw()
    myclient = MongoClient(env.connection_string)
    mydb = myclient[env.db_name]
    col_hw = mydb[env.col_name_hw]
		#open current HW Manifest
    with open(env.manifest_path) as f:
        data = json.load(f)
    length = len(data['system board'])
    dict_systemboard = create_HW_dicts(data, "system board", length)
    length = len(data['cpu'])
    dict_cpus = create_HW_dicts(data, "cpu", length)
	q = {
        "system board.description": dict_systemboard['description'],
        "cpu.code_name": dict_cpus['code_name']
    myquery = col_hw.find(q)
    for x in myquery:
    return env

Hello @Joe_N_A, Welcome to the MongoDB Community forum!

As you are just checking if there is a match in the database, and then return a True , the following is simpler:

myquery = col_hw.find_one(q)
if myquery:
    env.match = True

Note the find_one returns the first matching document or a None in case there is no match. This avoids reading all the matching documents in a for-loop.

1 Like

You are correct, just looking for a match, so I don’t pollute the db with the same docs.