I’ve some thoughts that may help optimize this aggregation pipeline query:
First, the $in operator on org_id is currently not able to take advantage of the index on org_id since it is checking for equality against a large array of values. To use the index more efficiently, you could replace the $in with an $or condition checking for equality against each ID individually.
However, this may not be feasible if the number of IDs is very large.
Further, you can consider adding an index on {story_dt: 1, org_id: 1, 'topics.topic_cd': 1} to support the full query criteria, not just org_id. This is a compound index spanning all fields used in the $match.
There is no hard-coded limit on the number of values in $in. But in general, very large $in lists with hundreds or thousands of items will negatively impact query performance.
Let us know if this helps or if you have any further questions!