Passing a mongodb query from json file to execute in python

harris · March 30, 2021, 9:11am

I have a json file with around 50 queries.
The json file look like this:

{

   "1": {
     "mongodb":"mydb1.mongodbtime.find({\n    \"timestamp1\": {\"$gte\": datetime.strptime(\"2010-01-01 00:05:00\", \"%Y-%m-%d %H:%M:%S\"),\n                   \"$lte\": datetime.strptime(\"2015-01-02 00:05:00\", \"%Y-%m-%d %H:%M:%S\")}},\n    {\"id13\":1}),",
     "mongodb1index":"mydb1.mongodbindextimestamp1.find({\n    \"timestamp1\": {\"$gte\": datetime.strptime(\"2010-01-01 00:05:00\", \"%Y-%m-%d %H:%M:%S\"),\n                   \"$lte\": datetime.strptime(\"2015-01-02 00:05:00\", \"%Y-%m-%d %H:%M:%S\")}},\n    {\"id13\":1}),"
   },
   "2": {
     "mongodb":"mydb1.mongodbtime.find({\n    \"timestamp1\": {\"$gte\": datetime.strptime(\"2010-01-01 00:05:00\", \"%Y-%m-%d %H:%M:%S\"),\n                   \"$lte\": datetime.strptime(\"2015-01-02 00:05:00\", \"%Y-%m-%d %H:%M:%S\")}},\n    {\"id13\":1}),",
     "mongodb1index":"mydb1.mongodbindextimestamp1.find({\n    \"timestamp1\": {\"$gte\": datetime.strptime(\"2010-01-01 00:05:00\", \"%Y-%m-%d %H:%M:%S\"),\n                   \"$lte\": datetime.strptime(\"2015-01-02 00:05:00\", \"%Y-%m-%d %H:%M:%S\")}},\n    {\"id13\":1}),

   }

}

I have two collections one named mongodbtime and one called mongodbindextimestamp1 in the database mongodbtime.
The code i have used in python for passing the query and execute it look like this:

mydb1 = myclient["mongodbtime"]


with open("queriesdb.json",'r') as fp:
    queries = json.load(fp)
    db = {"mongodb": "mongodbtime", "mongodb1index": "mongodbtime"}
    for num_query in queries.keys():
        query = queries["1"]
        print(query)
        for db_name in db:
            print(db_name)
            run(query[db_name])

def run(query):

        for j in range(0, 1):
            
            start_time = time.time()
            cursor = query
            for x in cursor:
                pprint(x)

            # capture end time
            end_time = time.time()
            # calculate elapsed time
            elapsed_time = end_time - start_time
            times.append(elapsed_time)
            #elapsed_time_milliSeconds = elapsed_time * 1000
            #print("code elapsed time in milliseconds is ", elapsed_time_milliSeconds)
        finalmeasurments(times)

I passed it like a string and obviously when i print(cursor) it just print the query Should i use another form of file?
Any idea on how i should execute my query?

Mark_Smith · March 30, 2021, 9:34am

Hi @harris,

This is really a Python question, not a MongoDB question - but I can help!

What you’re trying to achieve here is to load snippets of code as strings from a JSON file, and then execute them as Python code. You can do this with Python’s eval() function. Please read the warning below.

Replacing your cursor = query with cursor = eval(query) should work, I think.

Warning: Make sure the data you’re loading in via JSON is safe - i.e., created and managed by you. Using eval on strings that are provided by other people or systems is a huge security flaw - it allows people to send you arbitrary code that will be run on your computer.

harris · March 30, 2021, 9:41am

Thank you @Mark_Smith.I know its a python question and i am sorry for that,i can delete it after!Thank you so much again ,i couldn’t find any solution on my own .

Mark_Smith · March 30, 2021, 9:41am

Just a note that I think this should read:

for num_query in queries.keys():
    query = queries[num_query]

… but could be further simplified to:

for num_query, query in queries.items():
    print(query)

items() returns an iterator over (key, value) pairs in a dict.

Mark_Smith · March 30, 2021, 9:43am

It’s not a problem to ask Python questions here! Please don’t delete the question - I just wanted to be clear the problem I was solving

harris · March 31, 2021, 5:32pm

How are you @Mark_Smith?i have one more question to ask if i may.I am using these test for benchmarking purposes.Specifically mongodb vs postgresql so execution time is very important for me.Do you think that may eval adds more execution time?I dont use eval in my postgresql tests so i am wondering if eval is the right choice.

Mark_Smith · April 1, 2021, 8:28am

If you’re concerned about the parsing overhead eval, you have two options that come to mind:

1: You could store your data structure containing your benchmarks in Python code - maybe in another module that you could import. Functions can be stored in dicts in Python, so you could define something like:

def benchmark1(coll):
     coll.find({ ... })

BENCHMARKS = {
    "1": {
        "mongodb": benchmark1
    }
}

There would be a minor overhead with the extra function call, but I would expect it to be trivial compared to everything else.

2: You could use Python’s compile function outside of your loop to precompile the Python expression you already have into a code object. You’d still run the resulting code object with eval, but because it’s already been parsed, it should be more-or-less equivalent to just calling the code directly.

My personal opinion: If I was writing this from scratch, I’d probably go with option 1. Because you’re nearly there already, I’d probably go with option 2.

I hope this helps! Benchmarks can be tricky.

harris · April 1, 2021, 9:12am

Thanks for your reply @Mark_Smith.In the option 2 do you mean something like that?

mydb1 = myclient["mongodbtime"]


with open("queriesdb.json",'r') as fp:
    queries = json.load(fp)
    db = {"mongodb": "mongodbtime", "mongodb1index": "mongodbtime"}
    for num_query in queries.keys():
        query = queries[num_query]
        print(query)
        for db_name in db:
            print(db_name)
            run(query[db_name])

def run(query):

        for j in range(0, 10):
            code_obj = compile(query, 'queriesdb.json', 'eval')

            start_time = time.time()
            cursor = eval(code_obj)
            # capture end time
            end_time = time.time()
            # calculate elapsed time
            elapsed_time = end_time - start_time
            times.append(elapsed_time)
            #elapsed_time_milliSeconds = elapsed_time * 1000
            #print("code elapsed time in milliseconds is ", elapsed_time_milliSeconds)
        finalmeasurments(times)

It doesnt seems to work.
The output is :

mydb1.mongodbtime.aggregate(
    ^
IndentationError: unexpected indent

Mark_Smith · April 1, 2021, 3:07pm

That is almost what I meant. You’ve got your call to compile inside your for j in ... loop, which means you’re compiling the code 10 times - which is a waste of time. If you move the compile call 2 lines up, then you’ll compile once, and then eval the code object 10 times, which is more efficient.

Check the documentation for compile - I think you need to pass in the argument mode='eval' for your use-case.

system · April 6, 2021, 3:07pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.