I am currently working on a project that involves handling a high data ingestion rate from over 300 IoT sensors, where data is generated every 10 seconds. The database structure comprises three collections: device, variables, and values, forming a time series database.
I’m seeking advice on the most efficient approach to design and write data to the MongoDB Time Series Database given this scenario. Specifically, I would like recommendations on:
Schema Design: How should I structure the collections (device, variables, values) to optimize for high-frequency data ingestion while maintaining query efficiency?
Indexing: What indexing strategies are recommended for optimizing write operations on a time series database with a high volume of incoming data?
Write Concerns: Considering the high data ingestion rate, what are the recommended write concern settings to balance data durability and ingestion performance?
Any insights, best practices, or experiences you can share regarding optimizing MongoDB for time series data with a large number of IoT sensors would be greatly appreciated.
Thanks for reaching out to the MongoDB Community forums.
When designing a time series schema, your choices should be guided by two things: how you intend to retrieve and work with your data, and how frequently data is ingested.
If you prefer straightforward queries, a good option is a single time series collection with one document per reading: a timestamp (at seconds granularity), the IoT device name as metadata, and the measured value. This keeps your queries simple and avoids unnecessary complexity and processing.
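As a rough illustration of that single-collection layout, here is a minimal Python sketch. The collection name `sensor_readings`, the field names `ts`, `meta`, and `value`, and the choice to put the variable name in the metadata next to the device name are all assumptions made for the example, not something prescribed above. The `db` argument is expected to be a PyMongo `Database` object.

```python
from datetime import datetime, timezone

# Hypothetical names chosen for this sketch; adjust to your own schema.
COLLECTION = "sensor_readings"


def timeseries_options():
    """Options passed as db.create_collection(..., timeseries=...)."""
    return {
        "timeField": "ts",         # timestamp of each reading
        "metaField": "meta",       # identifies the series (device, variable)
        "granularity": "seconds",  # readings arrive every 10 seconds
    }


def make_reading(device, variable, value, ts=None):
    """Build one document per reading: timestamp, metadata, value."""
    return {
        "ts": ts or datetime.now(timezone.utc),
        "meta": {"device": device, "variable": variable},
        "value": value,
    }


def write_batch(db, readings):
    """Create the time series collection if missing, then insert a batch.

    ordered=False lets the server continue past a failed document
    instead of aborting the rest of the batch.
    """
    if COLLECTION not in db.list_collection_names():
        db.create_collection(COLLECTION, timeseries=timeseries_options())
    return db[COLLECTION].insert_many(readings, ordered=False)
```

With a connection such as `db = MongoClient(uri)["iot"]` (the database name is again an assumption), you could collect the readings from one 10-second cycle into a list with `make_reading` and pass them to `write_batch` in a single call, rather than inserting each reading separately.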