Tracking user actions (each with a duration) over time. Best database format?

Hello! Looking for advice from people who deeply understand performance and structures for a large amount of users and data.

Problem

I would like to track user actions, where each action can take a small amount of time, for example 20 seconds per action.

Suppose an app with 50 million users.

At any point in time, they might perform an action of 20 seconds or more. Actions are stored in chunks of 20 seconds, so an action longer than 20 seconds is really an action of multiple 20 second actions. A 1 minute “action” is three 20 second actions conceptually (is this even realistic? Maybe granularity needs adjusting, or maybe some granularity is dropped after certain amount of time).

Actions can be associated with users, or other “things” in the app, such as categories, tags, pages, etc. For sake of example, let’s stick to users, pages, categories, and tags.

Users can own pages, pages belong to categories, and pages have tags. All actions, users, and pages have a geolocation. Let’s assume users might spend 20,000 minutes performing actions in a year (50 mil users, 20,000 minutes of actions, per year)

Over time, I’d like to be able to “see” the following sorts of things:

  • total time acting on a page by a user, total times a user acted on a tag (f.e. pages with certain tag), in any time range
  • total time acting on a category by a user (f.e. when a user is on a page of a certain category), total times a user acted on a tag (f.e. pages with certain tag), in any time range
  • total time acting on a tag by a user, in any time range
  • total time acting on a user by a user (f.e. user acts on any page of another user), in any time range
  • total times a user acted on a page (not total time, but the total discrete portions of time the user acted on the page, if they visit for a while, leave, then visit again, that would be 2 times that they acted), in any time range. F.e. a “number of visits”.
  • total times a user acted on a category (f.e. pages of certain category), in any time range
  • total times a user acted on a tag (f.e. pages with certain tag), in any time range
  • total times a user acted on user (i.e. a user acts on any page of another user), in any time range
  • total number of times any user acted on a page, tag, category, user, in any time range
  • total time acting by any user on a page, tag, category, user, in any time range
  • total number of times any user acted in a geofence (let’s assume just a simple radius around a point for now), in any time range
  • total time acting by any user in a geofence, in any time range
  • average time acting by a user per page, category, tag, any time range, any geofence
  • average number of times any user acted per page, category, tag, any time range, any geofence
  • number of pages, categories, tags for any geofence
  • number of pages, categories, tags for any geofence
  • number of pages acted on by a user, in any time range, in any geofence
  • etc.

I think you can see how this is going!

The naive approach is obviously to have a single “actions” collection with items of the following format:

{
  timestamp, // timestamp of a 20-second interval
  latlon, // geolocation
  pageId
}

From there we can query for actions in any time range, any geofence (assume we know how to query a radius around a point), and map from pageId to any acting user, owning user, category, or tag.

Question

What might be the best non-naive way to track this data over time for the purpose of statistical analysis after having years or decades worth of data?

Existing solution idea

So far, an idea is using time range buckets to make it easier to query data later on. For example, for a particular page, there’d be a 20-second bucket (maybe too unrealistic), an hour bucket, a day bucket, a month bucket, and a year bucket.

Each of those buckets is of a shape like

{
  timestamp,
  pageId,
  totalTime,
  totalActions,
}

Using day buckets for example, the timestamp is the exact timestamp for the start of the day (f.e. in UTC), any time a user acts on a page, increment the current page’s day’s totalTime and totalActions once per 20 seconds (the conceptual minimum action time). totalActions may increment only if the user switched away and came back, tracked by the app somehow; every time we increment totalTime, totalActions may not increment if the user is still acting on the same page.

To track geofence stuff, we’d have to have a similar sets of buckets for things like cities, states, countries, etc.

Essentially with this approach, we need to know all the things we want to track up front, so we know all the buckets we need (day/month/year per user, day/month/year per page, day/month/year per geofence, etc).

Querying a large time range would be simplified: first get totals for a year, then the remaining months, and remaining days.


What are your ideas, and/or experiences, with this sort of thing?