Designing a Folder-Like Schema in MongoDB for High-Volume Medical Time-Series Data, Emphasizing Fast Reads and Few Writes

Hello :sun_with_face:

I have around 100,000 NumPy arrays, each of shape roughly [3600 * 256, 20]. Each one is a recording lasting about 3600 seconds, from about 20 sensors (num_channel), sampled at Fs ≈ 256 Hz. These are medical recordings from around 25,000 patients, each with around 4 recording sessions.
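
A quick back-of-the-envelope calculation of the raw volume, assuming the samples are stored as float32 (the post does not state the dtype). The constant names are my own; the 16 MiB figure is MongoDB's maximum BSON document size.

```python
import numpy as np

FS = 256                 # sampling rate in Hz (from the post)
DURATION_S = 3600        # approximate recording length in seconds
NUM_CHANNEL = 20         # number of sensors
NUM_RECORDINGS = 100_000

# Assumption: samples are float32 (4 bytes each); use float64 to double everything.
DTYPE = np.dtype(np.float32)

samples_per_recording = FS * DURATION_S                  # rows per array
values_per_recording = samples_per_recording * NUM_CHANNEL
bytes_per_recording = values_per_recording * DTYPE.itemsize
total_bytes = bytes_per_recording * NUM_RECORDINGS

BSON_MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MiB document size limit

print(f"rows per recording:   {samples_per_recording:,}")
print(f"bytes per recording:  {bytes_per_recording / 1e6:.1f} MB")
print(f"total raw size:       {total_bytes / 1e12:.2f} TB")
print(f"fits in one document: {bytes_per_recording <= BSON_MAX_DOC_BYTES}")
```

Under that assumption one recording is about 73.7 MB and the whole dataset about 7.4 TB of raw samples, so a full recording can never fit into a single MongoDB document and has to be split up somehow.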

I need to store these NumPy arrays with some tags, such as (patient_id, recording_id), so I can easily filter the arrays later and access a given patient's data.
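
One way to keep each array tagged with (patient_id, recording_id) without creating one field per sample is to store blocks of the array as raw bytes next to the tags. A minimal sketch, assuming float32 samples in C order; `block_to_doc`, `doc_to_block`, and the `chunk_index`/`num_rows`/`num_channel`/`data` field names are hypothetical. PyMongo stores a Python `bytes` value as BSON binary and returns it as `bytes` on read.

```python
import numpy as np

# Assumption: signals are stored as little-endian float32; adjust DTYPE if not.
DTYPE = np.dtype("<f4")


def block_to_doc(patient_id, recording_id, chunk_index, block):
    """Pack one [rows, num_channel] block into a dict tagged with its IDs."""
    # C order keeps each row's channels contiguous in the byte string.
    block = np.ascontiguousarray(block, dtype=DTYPE)
    return {
        "patient_id": patient_id,
        "recording_id": recording_id,
        "chunk_index": chunk_index,
        "num_rows": block.shape[0],
        "num_channel": block.shape[1],
        "data": block.tobytes(),  # one binary field instead of one field per sample
    }


def doc_to_block(doc):
    """Rebuild the (read-only) NumPy block from a document made by block_to_doc."""
    flat = np.frombuffer(doc["data"], dtype=DTYPE)
    return flat.reshape(doc["num_rows"], doc["num_channel"])
```

A 10-second block of 20 channels becomes a single 204,800-byte binary field, and the tags stay ordinary indexable fields.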

I need to load all of the above data into the database at creation time and emulate the following structure:

database
   └── patient_id (multiple signals per patient, similar to a folder, plus some metadata)
      └── recording_id (holds the signal itself, similar to a file, plus some metadata)
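
A sketch of how this folder-like structure could map onto MongoDB, under my own assumptions: a `patients` collection plays the folder (its documents would be inserted separately with the patient metadata), a `recordings` collection holds per-file metadata, and a `chunks` collection holds the signal split into 10-second pieces. The collection names, field names, `CHUNK_SECONDS`, and the helper functions are hypothetical; `create_index`, `insert_one`, and `insert_many` are standard PyMongo collection methods, and `db` is assumed to be a PyMongo `Database`.

```python
import numpy as np

FS = 256             # samples per second (from the post)
CHUNK_SECONDS = 10   # assumption: one chunk document per 10 s of signal
CHUNK_ROWS = FS * CHUNK_SECONDS

# Hypothetical unique compound indexes; 1 means ascending in PyMongo key specs.
INDEXES = {
    "patients": [("patient_id", 1)],
    "recordings": [("patient_id", 1), ("recording_id", 1)],
    "chunks": [("patient_id", 1), ("recording_id", 1), ("chunk_index", 1)],
}


def create_indexes(db):
    """Create the indexes above; `db` is assumed to be a pymongo Database."""
    for name, keys in INDEXES.items():
        db[name].create_index(keys, unique=True)


def recording_to_docs(patient_id, recording_id, signal, metadata=None):
    """Split one [n_samples, num_channel] recording into a metadata doc plus chunk docs."""
    signal = np.ascontiguousarray(signal, dtype=np.float32)  # assumption: float32
    recording_doc = {
        "patient_id": patient_id,
        "recording_id": recording_id,
        "fs": FS,
        "num_channel": signal.shape[1],
        "num_samples": signal.shape[0],
        **(metadata or {}),
    }
    chunk_docs = [
        {
            "patient_id": patient_id,
            "recording_id": recording_id,
            "chunk_index": i,
            "data": signal[start:start + CHUNK_ROWS].tobytes(),  # last chunk may be shorter
        }
        for i, start in enumerate(range(0, signal.shape[0], CHUNK_ROWS))
    ]
    return recording_doc, chunk_docs


def load_recording(db, patient_id, recording_id, signal, metadata=None):
    """Bulk-load one recording into the hypothetical recordings/chunks collections."""
    recording_doc, chunk_docs = recording_to_docs(patient_id, recording_id, signal, metadata)
    db.recordings.insert_one(recording_doc)
    db.chunks.insert_many(chunk_docs, ordered=False)
```

MongoDB has no real folders, so the hierarchy lives in the keys: listing a patient's "folder" is `db.recordings.find({"patient_id": pid})`, and the compound index on `chunks` also serves queries that filter only on its `patient_id` prefix. A 3600-second recording becomes 360 chunk documents of 200 KiB each.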
    

After creating the dataset, I have the following tasks:

Task 1: Write Small Chunks of Data Every 10 Seconds

Roughly every 10 seconds, I need to append data to a few specific arrays. The data comes from up to 10 devices, each of which sends a block of shape [Fs*10, num_channel] together with a (patient_id, recording_id) tuple.
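
Because each device block is exactly Fs*10 rows, one block can become one new chunk document, so an append never rewrites existing data. A sketch assuming hypothetical `recordings`/`chunks` collections with a `num_samples` counter on each recording, a single writer per recording, and recordings whose length is a whole number of 10-second chunks; `make_append_ops` and `append_block` are illustrative names, and `db` is assumed to be a PyMongo `Database`.

```python
import numpy as np

FS = 256
CHUNK_ROWS = FS * 10  # one incoming device block == one chunk document


def make_append_ops(recording_doc, block):
    """Return (chunk_doc, recording_update) for appending one device block.

    Assumes num_samples is always a multiple of CHUNK_ROWS, i.e. every
    append is exactly one full 10-second block.
    """
    block = np.ascontiguousarray(block, dtype=np.float32)
    if block.shape != (CHUNK_ROWS, recording_doc["num_channel"]):
        raise ValueError(f"unexpected block shape {block.shape}")
    chunk_doc = {
        "patient_id": recording_doc["patient_id"],
        "recording_id": recording_doc["recording_id"],
        "chunk_index": recording_doc["num_samples"] // CHUNK_ROWS,  # next free slot
        "data": block.tobytes(),
    }
    recording_update = {"$inc": {"num_samples": CHUNK_ROWS}}
    return chunk_doc, recording_update


def append_block(db, patient_id, recording_id, block):
    """Append one device block (assumes a single writer per recording)."""
    key = {"patient_id": patient_id, "recording_id": recording_id}
    recording_doc = db.recordings.find_one(key)
    if recording_doc is None:
        raise LookupError(f"unknown recording {key}")
    chunk_doc, recording_update = make_append_ops(recording_doc, block)
    # Insert the chunk first; a unique index on chunk_index rejects a duplicate append.
    db.chunks.insert_one(chunk_doc)
    db.recordings.update_one(key, recording_update)
```

With up to 10 devices every 10 seconds this is on the order of one small insert and one small update per second, which should be a light write load.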

Task 2: Read Small Chunks of Data Very Often

Given a (patient_id, recording_id) tuple and a (start_offset, read_duration) tuple, I need to read the data of that patient and recording, starting at start_offset from the beginning of the recording and lasting read_duration.
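
With fixed-length chunks, reading a window reduces to arithmetic on chunk indexes plus one range query. A sketch assuming start_offset and read_duration are in seconds, 10-second chunk documents keyed by a hypothetical `chunk_index` field with no gaps, and float32 payloads; the helper names are hypothetical, while `find(...).sort(key, direction)` is the standard PyMongo cursor API.

```python
import numpy as np

FS = 256
CHUNK_ROWS = FS * 10  # assumption: 10-second chunk documents


def window_to_chunks(start_offset, read_duration):
    """Map a window in seconds to (first_chunk, last_chunk, start_row, stop_row)."""
    start = int(round(start_offset * FS))
    stop = start + int(round(read_duration * FS))
    if stop <= start:
        raise ValueError("read_duration must cover at least one sample")
    return start // CHUNK_ROWS, (stop - 1) // CHUNK_ROWS, start, stop


def assemble_window(chunk_payloads, first_chunk, start, stop, num_channel):
    """Join ordered chunk payloads (bytes) and cut out rows [start, stop)."""
    flat = np.frombuffer(b"".join(chunk_payloads), dtype=np.float32)
    rows = flat.reshape(-1, num_channel)
    offset = first_chunk * CHUNK_ROWS  # row index of the first fetched sample
    return rows[start - offset:stop - offset]


def read_window(db, patient_id, recording_id, start_offset, read_duration, num_channel=20):
    """Fetch one window; `db` is assumed to be a pymongo Database."""
    first, last, start, stop = window_to_chunks(start_offset, read_duration)
    cursor = db.chunks.find(
        {"patient_id": patient_id, "recording_id": recording_id,
         "chunk_index": {"$gte": first, "$lte": last}},
        {"_id": 0, "data": 1},
    ).sort("chunk_index", 1)
    return assemble_window([d["data"] for d in cursor], first, start, stop, num_channel)
```

For example, a 10-second read starting at 15 s covers rows [3840, 6400) and touches chunks 1 and 2; in general a read touches about read_duration / 10 + 1 documents, all located through the compound index.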

MY QUESTION:

How should I structure the data so that MongoDB reflects this folder-like structure and can serve reads very FAST? The write tasks seem easy to handle, but the sampling rate of 256 Hz seems higher than most of what I see in the docs, so how should I batch the data?
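
On batching, the bucket length trades document count against document size. A small comparison, assuming float32 samples and a 3600-second recording; `bucket_stats` is a hypothetical helper. A bucket of 1/Fs seconds corresponds to one document per sample, which is how measurements are normally inserted into a time series collection (MongoDB then groups them into buckets internally).

```python
FS = 256
NUM_CHANNEL = 20
DURATION_S = 3600
BYTES_PER_VALUE = 4                  # assumption: float32 samples
BSON_MAX_DOC_BYTES = 16 * 1024 * 1024


def bucket_stats(bucket_seconds):
    """Documents per recording and payload bytes per document for one bucket length."""
    rows = int(FS * bucket_seconds)
    docs = -(-FS * DURATION_S // rows)  # ceiling division
    payload = rows * NUM_CHANNEL * BYTES_PER_VALUE
    return docs, payload


for seconds in (1 / FS, 1, 10, 60, 600):
    docs, payload = bucket_stats(seconds)
    print(f"{seconds:>8.4g} s/bucket: {docs:>7,} docs/recording, "
          f"{payload / 1024:>8.1f} KiB/doc, fits={payload <= BSON_MAX_DOC_BYTES}")
```

One document per sample means 921,600 documents per recording, roughly 92 billion for the whole dataset, while 10-second buckets give 360 documents of 200 KiB each and line up with the 10-second write cadence. Even 600-second buckets (about 11.7 MiB) stay under the 16 MiB limit, but larger buckets make every short read fetch more bytes than it needs.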

What I Tried

So far, I've learned about three possible approaches: the TimeSeries API (time series collections), the GridFS API, and the default document API with some manual chunking. MongoDB AI says that the TimeSeries and GridFS APIs are not a good fit for this, and the default API doesn't seem great either. Is MongoDB even a good choice for this task? By the way, I asked a similar question on the InfluxDB forum as well.
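
For comparison, GridFS can serve the read side reasonably well: in a C-ordered array each row is contiguous, so a time window is one seek plus one read. A sketch assuming float32 samples and 20 channels; `read_window_from_file` is a hypothetical helper that accepts any seekable binary file object, such as the `GridOut` returned by PyMongo's `gridfs.GridFS(db).get(file_id)` (which supports `seek` and `read`), or `io.BytesIO` for local testing. The catch for Task 1 is that PyMongo's GridFS API has no append operation: a file is written once and then only read.

```python
import io

import numpy as np

FS = 256
NUM_CHANNEL = 20
ROW_BYTES = NUM_CHANNEL * np.dtype(np.float32).itemsize  # assumption: float32, C order


def read_window_from_file(fileobj, start_offset, read_duration):
    """Read [start_offset, start_offset + read_duration) seconds from a raw signal file."""
    start_row = int(round(start_offset * FS))
    n_rows = int(round(read_duration * FS))
    fileobj.seek(start_row * ROW_BYTES)   # jump straight to the first requested row
    raw = fileobj.read(n_rows * ROW_BYTES)  # shorter if the window runs past the end
    return np.frombuffer(raw, dtype=np.float32).reshape(-1, NUM_CHANNEL)


# Local check with an in-memory stand-in for a GridFS file.
signal = np.arange(60 * FS * NUM_CHANNEL, dtype=np.float32).reshape(-1, NUM_CHANNEL)
window = read_window_from_file(io.BytesIO(signal.tobytes()), 15, 10)
```

One possible compromise, under my own assumptions, is to keep finished recordings as GridFS files and route the live 10-second appends into small chunk documents.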

If possible, please answer in Python :grin:.

Thanks in advance.