Hi,
I have a straightforward Atlas serverless installation populated with sample data that I'm looking to scale 10x; however, I am concerned about the eventual cost. Looking at the invoice breakdown, most of the cost comes from a high number of WPUs (write processing units) - which is surprising, since all of my updates/inserts are simple point operations.
I get the sense that the high WPU count may be due to the multiple indexes I maintain to facilitate various views in my app.
I am running a podcast website. Each podcast may have a number of episodes, and each episode has:

- a `published_at` field,
- a few genres (such as `Arts`, `Technology`, etc.),
- a binary popularity flag (`0` or `1`), and
- a few pre-determined hashtags - finite, but large in number (such as `#Midterms2022`, `#ElonMusk`, `#SuperBowlLVII`, etc.)
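For concreteness, here is what a single episode document looks like (a hypothetical example - the field names match my schema, but the values are invented):

```python
# Hypothetical episode document; values are made up for illustration.
episode = {
    "_id": "tech-talk/ep-101",            # f"{podcast_id}/{episode_id}"
    "podcast_id": "tech-talk",
    "published_at": "2023-02-12T08:00:00Z",
    "genres": [{"id": "technology"}, {"id": "arts"}],
    "popularity": 1,                      # binary flag: 0 or 1
    "hashtags": [{"id": "SuperBowlLVII"}],
}
```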
My app's views take `limit=10`, `offset=0` as default parameters (unless provided), and results are sorted by `published_at` in descending order (newest first).
I have the following three views in my app:
```python
# Fetches a single episode
# /podcast/{podcast_id}/episode/{episode_id}
id_ = f"{podcast_id}/{episode_id}"
db.episodes.find_one({"_id": id_})
```
```python
# Lists episodes in desc order of published_at
# /podcast/{podcast_id}/episodes
db.episodes\
    .find({"podcast_id": podcast_id})\
    .sort("published_at", -1)\
    .skip(offset)\
    .limit(limit)
```
```python
# SRP for episodes in desc order of published_at
# /search/episodes?genre=genre_id&popularity=flag&hashtag=hashtag_id
expr = {
    "$and": [
        {"genres.id": genre_id},
        {"popularity": flag},
        {"hashtags.id": hashtag_id},
    ]
}
db.episodes\
    .find(expr)\
    .sort("published_at", -1)\
    .skip(offset)\
    .limit(limit)
```
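Incidentally, since the three query parameters are all optional, I build the filter so it only includes the ones actually supplied (a pure-Python sketch; the `build_filter` helper and the `None`-means-absent convention are my own):

```python
def build_filter(genre_id=None, flag=None, hashtag_id=None):
    """Build the search filter from only the parameters supplied."""
    clauses = []
    if genre_id is not None:
        clauses.append({"genres.id": genre_id})
    if flag is not None:
        clauses.append({"popularity": flag})
    if hashtag_id is not None:
        clauses.append({"hashtags.id": hashtag_id})
    # An empty filter matches every episode.
    return {"$and": clauses} if clauses else {}

print(build_filter(genre_id="arts", flag=1))
```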
So, I have added the following indexes:

```python
import pymongo

db.episodes.create_index([("published_at", pymongo.DESCENDING)])
for name in [
    "podcast_id",
    "popularity",
    "genres.id",
    "hashtags.id",
]:
    db.episodes.create_index(
        [(name, pymongo.ASCENDING), ("published_at", pymongo.DESCENDING)],
    )
```
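If my understanding of serverless billing is right (an assumption on my part - I'm going off my reading of the pricing page), every insert has to write the document plus an entry in each of these indexes, so the index count multiplies the write cost:

```python
# Rough sketch of write amplification from secondary indexes (assumption:
# each insert updates every index on the collection).
indexes = [
    "published_at",
    "podcast_id + published_at",
    "popularity + published_at",
    "genres.id + published_at",
    "hashtags.id + published_at",
]
# 1 write for the document itself, plus one entry per index. The multikey
# indexes (genres.id, hashtags.id) add one entry PER ARRAY ELEMENT, so the
# real number is higher for episodes with several genres or hashtags.
writes_per_insert = 1 + len(indexes)
print(writes_per_insert)  # 6
```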
As you can imagine, my inserts are also fairly simple: I monitor each podcast's RSS feed and, whenever a new episode is published, I simply insert it into the DB. That's about it.
Here are some stats about the `episodes` collection from the Atlas dashboard:

```
STORAGE SIZE:       4.42GB
LOGICAL DATA SIZE:  14.71GB
TOTAL DOCUMENTS:    4944529
INDEXES TOTAL SIZE: 631.56MB
```
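A quick back-of-the-envelope from those numbers (assuming, per my reading of the serverless pricing docs, that writes are billed in roughly 1KB units):

```python
# Average logical document size, from the dashboard stats above.
logical_bytes = 14.71 * 1024**3       # LOGICAL DATA SIZE
total_docs = 4_944_529                # TOTAL DOCUMENTS
avg_doc_kb = logical_bytes / total_docs / 1024
print(round(avg_doc_kb, 1))           # ~3.1 KB per document
```

So each insert is already ~3KB of document data before any index maintenance, which would mean multiple WPUs per insert even without the five indexes.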
Can somebody help me reason about why my WPUs are so high?
Thanks!