I have a large time-series data collection. Think 10 Hz data sampled 24/7. It's mostly jagged tabular data.
The primary index on this is time.
If you run a query to return a week's worth of data, this is roughly 6K rows per column in each document.
The goal is to do some fairly compute intense calculations on this data.
Running the compute cycles in the database performs about as I'd expect for a query followed by a main CPU-based execution engine. However, returning that kind of data to be processed by different computing resources is slower than I'd expect for an I/O operation. I suspect this is because the data is returned in a text format rather than a binary format. I'm not sure how to change how the data is returned; if that is possible, I'm not searching for the right things.
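To give a rough sense of why I suspect the serialization format, here is a small sketch (not MongoDB-specific, and the sample values are made up) comparing the size of an hour of 10 Hz float samples encoded as JSON text versus packed binary float64:

```python
# Compare text vs binary encoding size for one hour of 10 Hz samples.
# Hypothetical values; the point is the relative payload size, not the data.
import json
import struct

samples_per_hour = 10 * 60 * 60  # 10 Hz for one hour = 36,000 samples
values = [0.123456789 * (i % 100) for i in range(samples_per_hour)]

# Text encoding: JSON array of decimal strings.
text_bytes = len(json.dumps(values).encode("utf-8"))

# Binary encoding: packed little-endian float64, 8 bytes per sample.
binary_bytes = len(struct.pack(f"<{len(values)}d", *values))

print(f"JSON text:      {text_bytes / 1e6:.2f} MB")
print(f"Packed float64: {binary_bytes / 1e6:.2f} MB")
print(f"Text is {text_bytes / binary_bytes:.1f}x larger")
```

The text form is a multiple of the binary size before you even count parsing cost, which is why I'd like the query results back in a binary representation if that's an option.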
What is the best way to deal with large volumes of data being returned from a MongoDB query?