Best way to organize database and data for my use case

Hi All,

We have a project where we are using SQL Server (Web Edition) as our data store. The source of data is IoT device, where right now we have around 2000 devices connected to the system, which sends data at every 10 seconds of interval and size of every data is around 200 Bytes. Now if we talk about SQL Server, then definitely we have fields like varchar, numbers etc, which calculates in a different way. Right now our sql server data file size is around 250GB.

We keep this data for 3 months always (and delete old data, which is really resource consuming. Moreover claiming back storage from SQL server is hard).

We are going to push sell on this project and estimating around 0.2 millions devices within next 1 year. So we are planning to move to Mongo. I have few questions for which I am not getting a definitive answer.

  1. If I calculate cost of mongo and SQL Server, then mongo is at higher side (taking into account a minimum 3 node implementation, basically 3 instance of VMs at Azure or AWS). Now how can I be within a budget? Is there any way where we can start small now and then scale up as IOT devices grow?

  2. How shall I organize the database? As you see its more like a timeseries data. We were planning to have different collections for each month, so that the expired data deletion becomes easy (we just drop older collections). My concern here is, as the number of devices will grow, the size of each months collection will grow as well. What will be the complexity of query (read) ? Our client apps continuously reads from datasource to give a live view of their devices. This is something I need to keep in mind.

  3. If we keep collection for each month, then what will be the complexity of reading data from 2 collections (where a user will search time series data within a date range, where the date falls in 2 different months)?

I am really confused reading elastic, mongo, casandra, hadoop and so many more. Unable to set my mind on something, which will be cost effecting to start with, scalable and can hold our future load.

Any help will be much appreciated.