I currently have a collection of ~50 documents (each around 1 MB), where each document contains some metadata and an array of up to 10,000 basic elements, each holding a time and a value.
When iterating over the queried cursor, fetching each document takes up to 0.5 s. Is that normal performance for documents of this size, or is there something wrong?
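For reference, each document looks roughly like this (the field names here are illustrative, not the real schema):

```python
# Illustrative shape of one bucket document (field names are made up;
# the real schema differs only in naming).
doc = {
    "_id": "sensor-17-2021-06",
    "meta": {"sensor": "sensor-17", "unit": "V"},  # some metadata
    "samples": [                                   # array of basic elements
        {"time": 0.0, "value": 1.23},
        {"time": 0.1, "value": 1.31},
        # ... up to 10,000 {time, value} pairs, ~1 MB per document
    ],
}
```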
Please share the characteristics of your installation.
Personally, I am worried about having a small number of documents that each contain an order of magnitude more basic elements; it feels unbalanced. It looks like the bucket pattern has been over-exploited.
Do you really need all 10k basic elements in the majority of your use cases? If so, and depending on what you are doing with those 10k data points, maybe you should consider having the server do the work using the aggregation framework.
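For example, if all you ultimately need per bucket is summary statistics, a pipeline along these lines would let the server do the reduction instead of shipping 10k points to the client (the `samples`/`value` field names are assumptions; adjust to your schema):

```python
# Sketch: server-side per-bucket summary, assuming each document stores
# its points in an array field "samples" whose entries have a "value" key.
pipeline = [
    {"$unwind": "$samples"},               # one stream element per data point
    {"$group": {
        "_id": "$_id",                     # one summary row per bucket
        "avg": {"$avg": "$samples.value"},
        "min": {"$min": "$samples.value"},
        "max": {"$max": "$samples.value"},
        "n":   {"$sum": 1},
    }},
]

# With a live connection you would run it as:
#   results = db.my_collection.aggregate(pipeline)
```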
Installation:
I have the default configuration of MongoDB 3.6 installed on a machine with 64GB RAM.
Regarding use case:
In the majority of use cases I end up going through all 10k data points iteratively, doing some statistical analysis that is a bit too complicated for the aggregation framework.
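The analysis is roughly of this shape (a simplified stand-in; the real computation is more involved, and `samples`/`value` are placeholder field names):

```python
import statistics

def analyse_bucket(doc):
    """Toy stand-in for the per-bucket analysis: pull out the values
    and compute a couple of summary statistics in plain Python."""
    values = [point["value"] for point in doc["samples"]]
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

# Example on a tiny fake bucket:
bucket = {"samples": [{"time": t, "value": float(t)} for t in range(5)]}
print(analyse_bucket(bucket))  # mean is 2.0 over values 0..4
```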
I tried to reduce the size of buckets (from 10k to 100), but that provided only a minor performance improvement.
I thought about just storing each data point as a separate document, but I figured the bucket pattern would be a decent fit here.
If it makes any difference, I use PyMongo to access the database and perform my analysis.
The client connects to the machine remotely. To clarify: the client reads test data for an application from the DB.
SSD storage
16-core CPU
The DB is mostly used for reading data, the data is modified rarely.
I understand that for this use case it's hard to optimize the data reading, but I thought that perhaps I made some DB design/usage mistakes that impacted performance.
Also, I wondered what fetch speed could be expected from MongoDB with my setup when a 1 MB document is being retrieved.
Hello, I'm a colleague of Arnas. We are using MongoDB as a test-data database, i.e. we read test inputs for our test application from the database, so this unfortunately rules out the computed pattern. The raw data is always needed.
So is MongoDB feasible only for use cases where just a small amount of data is needed at once? If that is the case, we might need to go back to the drawing board.
You said that in the majority of use cases you end up going through all 10k points in iteration.
Is 0.5 seconds the time taken to iterate, or just to fetch? And how do you measure this?
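One way to separate the two is to time cursor creation, fetching, and iteration independently with a small helper. Note that PyMongo cursors are lazy: `find()` returns immediately, and documents are only pulled from the server in batches as you iterate. A sketch (the `coll` collection object is assumed):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# With a real PyMongo collection `coll` you could then do something like:
#   cursor, t_create = timed(coll.find, {})   # lazy: should be ~0 s
#   docs, t_fetch    = timed(list, cursor)    # actual fetching + iterating
# and also time the analysis of each document separately inside the loop.

# The helper itself is plain Python:
_, elapsed = timed(sum, range(1000))
```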
I would suggest performing a test from the same server instance (DB and application together) to remove network latency; your network performance would play an important role here.
Also, do you have any database server metrics or performance monitoring in place to check where the bottlenecks are?