I have executed the following queries to find exact data or range queries using pymongo.
query: db.coll.find({attr:value})
I have measured the execution time of the query before and after index creation. This query was executed on a Collection of .5G, 1G, 2G, 1and 10G sizes. Surprisingly both show almost the same execution time. Why?
IXSCAN should be much faster than COLLSCAN.
Are you sure that your index is used? Can you share both explain plan?
If you have more than 20G of RAM on your test machine it is possible that the working set fits in RAM and that you do not see any difference even for your 10G collection. You may force the storage engine to use less RAM so that your working set does not fit in RAM to see the full negative effect of COLLSCAN.
Another that can explain why you do not observe a difference is that if your query is not covered and a big majority of documents are returned, then the FETCH will take as much time for IXSCAN as the COLLSCAN takes. In particular if you have low RAM and slow disk.
db.c.find() returns a cursor, so if you do not consume the data, it is possible that you do not see any effect because find() is simply setting up the cursor and nothing is really done until the first batch of document is retrieve. Without seeing the whole code it is hard to tell if anything is wrong there.
Document size is more determinant than the total size of the collection. Doing a COLLSCAN on 1 document of 10G is probably as fast as doing an IXSCAN.
I am sure that some other factors can explain what you observed.
I understand scenarios where COLLSCAN s preferred, which include large range match, return majority of requirements etc.
But, I have applied find with exact match on a collection , which do not have any index apart from _id. In absence of index, COLLSCAN obvious. But, my question is how come execution time of exact match using COLLSCAN and IXSCAN would be almost same?
Test collection : emp (_id, name , salary)
Test Query : db.emp.find(salary = )
But my answer is the same:
and I have listed some situation that might explain what you observed.
To help us help you understand what you observe we need some information:
- The explain plans of your COLLSCAN and IXSCAN
- Your db.emp.stats()
- Your db.emp.getIndexes()
- Your test code that shows how you determine execution time
- Sample emp documents that we can import into our environment, the schema is not enough for us
- Details of your test system, RAM, disk size and type, CPUs
In both COLLSCAN and IXSCAN tests, take the numbers on a hot server, rather than a cold server. The interesting numbers are always the one from a hot running server rather than a cold one where working set is loaded in cache.
I also believe that IXSCAN should be much faster than COLLSCAN. This is my basic question. db.emp.getIndexes(), and db.emp.stats().indexSizes shows only “_id” index. Explain plan also shows COLLSCAN. But, execution time of it is found almost same as after index.
I would verify again.
It means you do not have an index other than _id. How do you expect to have the query db.emp.find( { salary : SomeValue } ) perform an index scan if you do not have an index? So I am not too sure about:
If getIndexes() and stats() only shows _id index and explain plan does a COLLSCAN then may be you have created the index on the wrong collection.