Petabytes of binary images, videos

An app has petabytes of data, mostly images and videos. The data is user-generated, as on Facebook.

Each file is always smaller than 16 MB, but there are a very large number of them, so there is no need for GridFS.

There are many updates, but each document has few concurrent users, i.e. fewer than 100.

To summarise, there is a lot of user-generated content, each piece of which is accessed by a small number of people, like a private Instagram account. Once read, the content is saved locally on each viewer's device.

Can MongoDB Realm handle this via sharding? What would the architecture be for this case?

Will there be performance issues? What would it cost per user uploading 1 GB a month?

Realm can easily handle this because you won't be storing video or document files (aka blob-type data) in Realm itself. You'll store those in a service that's specifically designed for that, and then use Realm for all of the other tasks: UI, queries, storing references to the videos, etc.
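A minimal sketch of that split, assuming a hypothetical `BlobStore` protocol (not part of Realm, GridFS, or any SDK) with an in-memory stand-in so it runs anywhere. `MediaReference` is a plain struct standing in for the object you would persist in Realm; its field names are illustrative only.

```swift
import Foundation

// Hypothetical abstraction over an external blob store (GridFS, S3,
// Firebase Storage, ...). Not a real SDK API; adapt it to the service you pick.
protocol BlobStore {
    func put(_ data: Data, key: String) throws
    func get(key: String) throws -> Data?
}

// In-memory stand-in so the sketch runs without any cloud service.
final class InMemoryBlobStore: BlobStore {
    private var blobs: [String: Data] = [:]
    func put(_ data: Data, key: String) throws { blobs[key] = data }
    func get(key: String) throws -> Data? { blobs[key] }
}

// What would be persisted in Realm: a small record pointing at the blob,
// never the blob bytes themselves. Field names are illustrative.
struct MediaReference: Codable, Equatable {
    let id: UUID
    let ownerID: String
    let blobKey: String
    let contentType: String
    let byteCount: Int
    let createdAt: Date
}

// Upload flow: the bytes go to the blob store, the metadata goes to the database.
func upload(_ data: Data, ownerID: String, contentType: String,
            to store: BlobStore) throws -> MediaReference {
    let id = UUID()
    let key = "\(ownerID)/\(id.uuidString)"
    try store.put(data, key: key)
    return MediaReference(id: id, ownerID: ownerID, blobKey: key,
                          contentType: contentType, byteCount: data.count,
                          createdAt: Date())
}
```

Swapping `InMemoryBlobStore` for a type backed by GridFS, S3 or Firebase Storage leaves the upload flow unchanged; only the record type would need to become whatever Realm object model you use.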

MongoDB offers Realm as one product, and for storing larger blob-type data they offer GridFS. You can also use storage services from AWS or Google's Firebase Storage.

Generally speaking, sticking with a single provider is a bit more seamless so I would check out GridFS.

So you're suggesting that for such a use case it's better to store these binaries elsewhere and simply store a URL reference in MongoDB Realm, even if each image or video is smaller than 16 MB?
My understanding is that BSON can store binary data.
Is this bad for performance, or is it cost-intensive?

You got it!

While the maximum size is 16 MB, that doesn't mean you should use it every time. There is overhead involved, plus other behind-the-scenes work, that makes storing blob-type data less efficient. See Size Limitations:

To avoid size limitations and a performance impact, it is best not to store large blobs, such as image and video files, directly in a realm. Instead, save the file to a file store and keep only the location of the file and any relevant metadata in the realm.
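To make the limit concrete on the BSON side: MongoDB caps a whole BSON document at 16 MiB (16,777,216 bytes), so a file just under that size still has to share the budget with the document's other fields and encoding overhead. A minimal sketch; the `otherBytes` figure is a caller-supplied estimate, not something the function measures.

```swift
// MongoDB's documented maximum BSON document size: 16 MiB.
let maxBSONDocumentBytes = 16 * 1024 * 1024

// Returns true if a binary payload plus the rest of the document
// (field names, metadata, BSON framing) stays within the limit.
// `otherBytes` is an estimate supplied by the caller.
func fitsInOneDocument(payloadBytes: Int, otherBytes: Int) -> Bool {
    return payloadBytes + otherBytes <= maxBSONDocumentBytes
}
```

Even when a file does fit, the advice above still applies: fitting under the cap says nothing about the sync and storage cost of keeping it inline.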

More importantly, Realm is a full-sync database: when a user has 100 videos stored in Realm, that data is kept both on the device and in online storage through Realm Sync (I assume this is a cloud sync situation). Because of that, it could quickly overwhelm the device.

It would be much more efficient to move data to the device only as needed, which is possible when the data is stored outside of Realm.
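A minimal sketch of "only as needed": media is downloaded the first time a viewer opens it and then served from a local file, which also matches the "saved locally once read" requirement in the question. `RemoteMediaSource` and `OnDemandMediaCache` are hypothetical names; the remote side would be backed by GridFS, S3 or similar, and the cache layout is an assumption.

```swift
import Foundation

// Hypothetical remote source; in practice this would call GridFS, S3, etc.
protocol RemoteMediaSource {
    func download(key: String) throws -> Data
}

// Fetches media only when a viewer opens it, then keeps a local copy so
// later views do not hit the network. Cache layout is an assumption.
final class OnDemandMediaCache {
    private let remote: RemoteMediaSource
    private let directory: URL
    private(set) var downloads = 0

    init(remote: RemoteMediaSource, directory: URL) throws {
        self.remote = remote
        self.directory = directory
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true,
                                                attributes: nil)
    }

    // Keys like "user-42/<uuid>" contain slashes; flatten them into file names.
    private func localURL(for key: String) -> URL {
        let name = key.replacingOccurrences(of: "/", with: "_")
        return directory.appendingPathComponent(name)
    }

    func data(for key: String) throws -> Data {
        let url = localURL(for: key)
        if FileManager.default.fileExists(atPath: url.path) {
            return try Data(contentsOf: url)        // cache hit: no network
        }
        let data = try remote.download(key: key)    // cache miss: fetch once
        downloads += 1
        try data.write(to: url, options: .atomic)
        return data
    }
}
```

Only the records a viewer actually opens ever reach the device; everything else stays as a small reference in Realm.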


I'm interested in using GridFS. However, I can't find any good documentation on it (tutorials, pricing, etc.).

I'm using Swift for macOS.

There are Swift drivers for it; you can find them in the Swift Drivers section of the documentation.

The usage guide has enough examples to get you going.

For pricing, reach out to sales.

Awesome! Thanks a ton.