Storing millions of potential documents

Jeffery_Vincent · February 10, 2020, 4:57pm

Hi everyone,

I need some advice on how to store stock trades in MongoDB Atlas. I will potentially have millions of documents for each user depending on how much they actively trade.

I am currently using the bucket pattern, together with aggregation, to store financial transactions. It works well for syncing, but I can’t really filter and sort the results properly.

So I was wondering: should I use the same pattern when it comes time to store the trades?

Justin · February 10, 2020, 5:54pm

Hi Jeffery,

The bucket pattern is a great solution for storing large quantities of data but you’re right that it does have a few drawbacks, specifically when it comes to things like sorting. Ultimately, your use case should drive the schema decisions you make.

What are you doing with the stock trades you’re storing? Displaying them in pages (bucket pattern)? Performing aggregate calculations (computed pattern)? Different schema design patterns are useful for different use cases.

I recommend checking out a blog entry by Daniel Coupal called “Building with Patterns: A Summary” on the MongoDB Blog.

If you’re paging through trades, there’s also my blog post on this exact topic: “Paging with the Bucket Pattern - Part 1” on the MongoDB Blog.
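
For a concrete picture of the bucket approach, here is a minimal Python sketch of the filter and update documents for an upsert that appends one trade to a customer's current bucket. The field names (`history`, `count`), the bucket size of 50, and the `<customerId>_<epochSeconds>` format for `_id` are assumptions made for illustration; adjust them to your own schema. The function only builds plain dictionaries, so it runs without a database. Passing the result to a driver call such as PyMongo's `update_one(filter, update, upsert=True)` is left out.

```python
import re

BUCKET_SIZE = 50  # assumed maximum number of trades per bucket


def bucket_upsert_spec(customer_id: str, trade: dict, epoch_seconds: int):
    """Return (filter, update) documents for appending a trade to a bucket."""
    # Match any bucket belonging to this customer that still has room.
    filter_doc = {
        "_id": {"$regex": "^" + re.escape(customer_id) + "_"},
        "count": {"$lt": BUCKET_SIZE},
    }
    update_doc = {
        # Append the trade and keep the counter in step with the array.
        "$push": {"history": trade},
        "$inc": {"count": 1},
        # Applied only when the upsert has to create a new bucket.
        "$setOnInsert": {"_id": f"{customer_id}_{epoch_seconds}"},
    }
    return filter_doc, update_doc
```

When no bucket with free space matches, the upsert creates a new one whose `_id` carries the timestamp of its first trade, which is what keeps the buckets in time order.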

I hope this helps get you started!

Ihsan_Mujdeci · February 10, 2020, 10:34pm

Hey Justin,

Really nice articles. I read through the patterns summary and the post on pagination with the bucket pattern.
In your bucket pagination examples, you give the inserted document an _id of “customerId + timestampOfTrade”. I understand that this makes a unique entry and that it can still be queried by customer id using a regex, but are there any other benefits? Could the timestamp be useful in any kind of query?

Justin · February 11, 2020, 12:25am

The second concatenated part of _id can really be any positively increasing monotonic value.

Logically, a positively increasing monotonic value only comes in two flavors: a positively increasing consecutive value or a positively increasing non-consecutive value. Consecutive values are almost always a bad idea for use in databases for a variety of reasons (mostly because they’re hard to generate reliably in a distributed system). That leaves us with non-consecutive values.

A timestamp is readily available and easy to understand. Calling it a positively increasing non-consecutive monotonic value is also accurate but much less understandable. Timestamps also have the benefit of using $gt or $lt range queries on _id per customerId through a time range using a timestamp.
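
As a rough illustration of that last point, the Python sketch below builds a range filter on `_id` for one customer and one time window. It assumes a string `_id` of the form `<customerId>_<epochSeconds>`; the helper names `bucket_id` and `id_range_filter` are hypothetical. Because strings compare character by character, the timestamp is zero-padded to a fixed width so that string order matches time order.

```python
from datetime import datetime, timezone


def bucket_id(customer_id: str, ts: datetime) -> str:
    """Build an _id like '<customerId>_<epochSeconds>' (assumed format)."""
    # Fixed-width padding keeps lexicographic order equal to numeric order.
    return f"{customer_id}_{int(ts.timestamp()):010d}"


def id_range_filter(customer_id: str, start: datetime, end: datetime) -> dict:
    """Filter for one customer's documents with start <= timestamp < end."""
    return {
        "_id": {
            "$gte": bucket_id(customer_id, start),
            "$lt": bucket_id(customer_id, end),
        }
    }


# Example: everything for customer "cust42" during February 2020 (UTC).
feb_filter = id_range_filter(
    "cust42",
    datetime(2020, 2, 1, tzinfo=timezone.utc),
    datetime(2020, 3, 1, tzinfo=timezone.utc),
)
```

The resulting dictionary can be passed as a query filter to a driver's `find` call. Since `_id` is always indexed, a range like this can be served by the default index. With the bucket pattern, `_id` reflects the first trade in each bucket, so a bucket that starts just before the window may still contain trades inside it.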

Ihsan_Mujdeci · February 11, 2020, 1:10am

“Timestamps also have the benefit of using $gt or $lt range queries on _id per customerId through a time range using a timestamp.”
That’s very true, thanks for the insight mate.

Prasad_Saya · February 11, 2020, 1:37am

Can you be more specific about the filtering and sorting aspects? What issues are you facing? Is there a particular use case you want to discuss (and maybe find a solution for)?

Yong_Wei_Lun · May 31, 2020, 9:15am

Does the bucket pattern apply to chat history, where messages could be updated? It looks like update operations are hard to perform with the bucket pattern.