Hi,
I have a straightforward Atlas serverless installation populated with sample data that I'm looking to scale 10x; however, I am concerned about the eventual cost. Looking at the invoice breakdown, most of the cost comes from a high number of WPUs (write processing units) - which is surprising, since all of my updates/inserts are simple point operations.
I get the sense that the high WPU count may be due to the multiple indexes I maintain to facilitate various views in my app.
I am running a podcast website. Each podcast may have a number of episodes, and each episode has:

- a `published_at` field,
- a few genres (such as `Arts`, `Technology`, etc.),
- a binary popularity flag (`0` or `1`), and
- a few pre-determined hashtags - finite, but large in number (such as `#Midterms2022`, `#ElonMusk`, `#SuperBowlLVII`, etc.)
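For concreteness, here is what a single episode document looks like (a hypothetical example - the field names match my schema, but the values are invented):

```python
# Hypothetical episode document; values are made up for illustration.
episode = {
    "_id": "tech-talk/ep-101",            # f"{podcast_id}/{episode_id}"
    "podcast_id": "tech-talk",
    "published_at": "2023-02-12T08:00:00Z",
    "genres": [{"id": "technology"}, {"id": "arts"}],
    "popularity": 1,                      # binary flag: 0 or 1
    "hashtags": [{"id": "SuperBowlLVII"}],
}
```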
My app's views take `limit=10`, `offset=0` as default parameters (unless provided), and results are sorted by `published_at` in descending order (newest first).
I have the following three views in my app:
```python
# Fetches a single episode
# /podcast/{podcast_id}/episode/{episode_id}
id_ = f"{podcast_id}/{episode_id}"
db.episodes.find_one({"_id": id_})
```
```python
# Lists episodes in desc order of published_at
# /podcast/{podcast_id}/episodes
db.episodes\
    .find({"podcast_id": podcast_id})\
    .sort("published_at", -1)\
    .skip(offset)\
    .limit(limit)
```
```python
# SRP for episodes in desc order of published_at
# /search/episodes?genre=genre_id&popularity=flag&hashtag=hashtag_id
expr = {
    "$and": [
        {"genres.id": genre_id},
        {"popularity": flag},
        {"hashtags.id": hashtag_id},
    ]
}
db.episodes\
    .find(expr)\
    .sort("published_at", -1)\
    .skip(offset)\
    .limit(limit)
```
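Incidentally, since the three query parameters are all optional, I build the filter so it only includes the ones actually supplied (a pure-Python sketch; the `build_filter` helper and the `None`-means-absent convention are my own):

```python
def build_filter(genre_id=None, flag=None, hashtag_id=None):
    """Build the search filter from only the parameters supplied."""
    clauses = []
    if genre_id is not None:
        clauses.append({"genres.id": genre_id})
    if flag is not None:
        clauses.append({"popularity": flag})
    if hashtag_id is not None:
        clauses.append({"hashtags.id": hashtag_id})
    # An empty filter matches every episode.
    return {"$and": clauses} if clauses else {}

print(build_filter(genre_id="arts", flag=1))
```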
So, I have added the following indexes:

```python
import pymongo

db.episodes.create_index([("published_at", pymongo.DESCENDING)])
for name in [
    "podcast_id",
    "popularity",
    "genres.id",
    "hashtags.id",
]:
    db.episodes.create_index(
        [(name, pymongo.ASCENDING), ("published_at", pymongo.DESCENDING)],
    )
```
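If my understanding of serverless billing is right (an assumption on my part - I'm going off my reading of the pricing page), every insert has to write the document plus an entry in each of these indexes, so the index count multiplies the write cost:

```python
# Rough sketch of write amplification from secondary indexes (assumption:
# each insert updates every index on the collection).
indexes = [
    "published_at",
    "podcast_id + published_at",
    "popularity + published_at",
    "genres.id + published_at",
    "hashtags.id + published_at",
]
# 1 write for the document itself, plus one entry per index. The multikey
# indexes (genres.id, hashtags.id) add one entry PER ARRAY ELEMENT, so the
# real number is higher for episodes with several genres or hashtags.
writes_per_insert = 1 + len(indexes)
print(writes_per_insert)  # 6
```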
As you can imagine, my inserts are also fairly simple: I monitor each podcast's RSS feed and, whenever a new episode is published, I simply insert it into the DB. That's about it.
Here are some stats about the `episodes` collection from the Atlas dashboard:

```
STORAGE SIZE:       4.42GB
LOGICAL DATA SIZE:  14.71GB
TOTAL DOCUMENTS:    4944529
INDEXES TOTAL SIZE: 631.56MB
```
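A quick back-of-the-envelope from those numbers (assuming, per my reading of the serverless pricing docs, that writes are billed in roughly 1KB units):

```python
# Average logical document size, from the dashboard stats above.
logical_bytes = 14.71 * 1024**3       # LOGICAL DATA SIZE
total_docs = 4_944_529                # TOTAL DOCUMENTS
avg_doc_kb = logical_bytes / total_docs / 1024
print(round(avg_doc_kb, 1))           # ~3.1 KB per document
```

So each insert is already ~3KB of document data before any index maintenance, which would mean multiple WPUs per insert even without the five indexes.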
Can somebody help me reason about why my WPUs are so high?
Thanks!