I am working on a web app that will be used to survey around 2 million students.
The survey will ask several rating-scale questions (a scale from 1 to 5 for how much the student agrees with a statement) and will also gather attributes such as gender, age, school, state, city, time to finish, etc. The survey will be repeated periodically to track changes over time.
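To make the question concrete, here is a rough sketch of the records I have in mind. Every name here (`Student`, `SurveyWave`, `Submission`, `Answer`, and their fields) is hypothetical, and storing a birth year instead of an age is just my assumption, since age changes between repeated surveys.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Student:
    """Attributes used for filtering (hypothetical field names)."""
    student_id: int
    gender: str
    birth_year: int  # assumption: age is derived per survey wave
    school: str
    city: str
    state: str


@dataclass(frozen=True)
class SurveyWave:
    """One periodic run of the survey."""
    wave_id: int
    started_on: date


@dataclass(frozen=True)
class Submission:
    """One student's completed survey in one wave."""
    student_id: int
    wave_id: int
    seconds_to_finish: int


@dataclass(frozen=True)
class Answer:
    """One 1-to-5 agreement score for one question."""
    student_id: int
    wave_id: int
    question_id: int
    score: int

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale used by the survey.
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {self.score}")
```

Keeping one row per answer, keyed by student and wave, is what I assume would let the same student be compared across repetitions of the survey.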
The web app needs a viewer where the data can be visualized with different charts and also filtered by category to compare ages, schools, etc.
Because of the scale of the project, I think it would be wise to fetch only a random sample of the data. However, all outliers must be fetched, because the point of the app is to find struggling students.
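This is roughly the fetching behaviour I am imagining, as a minimal in-memory sketch. The `Row` type, the `sample_with_outliers` function, and the rule that an outlier is any student whose mean score is at or below a cutoff are all my own assumptions; I have not settled on how to define an outlier.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical per-student summary for one survey wave."""
    student_id: int
    mean_score: float  # average of the student's 1-5 answers


def sample_with_outliers(rows, sample_fraction, low_cutoff, seed=0):
    """Return every outlier plus a random sample of the remaining rows.

    Assumption: an outlier is a row with mean_score <= low_cutoff.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    outliers = [r for r in rows if r.mean_score <= low_cutoff]
    rest = [r for r in rows if r.mean_score > low_cutoff]
    k = round(len(rest) * sample_fraction)
    return outliers + rng.sample(rest, k)


# Example: 5 low-scoring students among 105; keep them all, sample 10% of the rest.
rows = [Row(i, 1.0 if i < 5 else 4.0) for i in range(105)]
picked = sample_with_outliers(rows, sample_fraction=0.1, low_cutoff=2.0)
```

One thing I am unsure about: since the outliers are all kept but the rest are sampled, I assume the sampled rows would need a weight of 1 / sample_fraction in the charts so that proportions are not distorted.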
How would you organize the data? I just can't figure it out.