Aggregation pipeline to remove extreme values from Chart

magickaito · April 21, 2022, 11:22am

I am trying to draw a 7-day price chart for some NFTs, using MongoDB data. The chart is showing the minimum (floor) prices over a few days (green color line in the attached).

One problem is that, a few extreme transactions that sold at ridiculously low prices, made the sales floor green line going to zero.

If I filter the data using a query like this:

{"price": {$gte: 0.5}}

Green line will look more realistic:

But this only works because I know this particular chart the price should be around 10 to 20 so anything less than 0.5 is considered extreme. In reality, I do not know what is the prices being feed into the chart, the price could be between 10 to 20, or between 0.01 to 0.02. So I cannot hard code 0.5.

After researching, I learnt that I could use aggregation pipeline, specifically bucketAuto command, to split the prices into 10 buckets, remove last bucket, effectively remove the bottom 10% of the prices for specific NFT prices.

But unfortunately, I then learnt that, I cant inject this aggregation pipeline into my chart when embedding it as iframe. This is roughly how I render the chart iframe for a specific NFT collection:

“https://charts.mongoxx.com/xxxxx/embed/charts?id=” + chartId + ‘&theme=’ + this.theme() + ‘&maxDataAge=3600&autoRefresh=true&filter={“metadata.collection_slug”:{$eq:"’ + xxxxxx + '"}}

I cannot create ONE single aggregation pipeline to remove bottom 10% of the outlier values because the prices/transactions collection contains prices for many NFTs.

I can theoretically create thousands of aggregation pipelines at the data sources manually, but that is extremely painful and not logical.

Is there other tricks to resolve this issue? Fundamentally I just want to remove the extreme values in the chart, without hard coding anything.

tomhollander · April 21, 2022, 11:45am

Thanks for sharing your scenario. We’ll have a think about what we can do to make this easier. But here is an idea which may work:

If you embed the chart using the Embedding SDK instead of an iframes, you can call the getData method to programmatically access the chart data. You can then scan the values to see if you have anything you would consider an outlier, and if so, call setFilter to filter those values out of the chart.

Let me know if this will work for you.

Tom

magickaito · April 21, 2022, 11:53am

Thanks for the idea, will definitely give it a try.

magickaito · April 22, 2022, 5:20am

Just an update that, it doesn’t really work for me to use the getData to remove extreme values.

The 7-day prices returned from getData is already “binned” to 7 days, so the 7 numbers are already affected by extreme values. (Imagine 7 zeros returned, there is nothing I can do to these zeros).

tomhollander · April 26, 2022, 3:11am

Thanks, makes sense. Can you explain why you need to inject the bucketAuto stage at render time? Is it an option to just bake this query into the saved chart?

Tom

magickaito · April 26, 2022, 4:19am

Hi Tom,

Thanks for the replies.
For my case, my mongo collection contains price movements for thousands of NFTs.

Thanks to the ability to inject “filter” programatically at render time (either iframe or sdk), I only need to create one single 7-day chart, and the single chart can be re-used thousands of times to render 7-day prices for any specific NFT.

But if I were to bake the bucketAuto stage into query, the query may look like this:

[
  {
    $match:{"metadata.collection_slug":"SPECIFIC HARDCODED NFT NAME WHICH IS NOT GOOD"}
  },
  {
    $bucketAuto: 
    {
        groupBy: "$price",
        buckets: 10,
        output: {
          docs: {
            $push: "$$ROOT"
          }
        }
    }
  },  
  {
    $skip: 1
  },
  {
    $unwind: "$docs"
  },
  {
    $replaceWith: "$docs"
  }
]

The above query works if I hardcode a specific NFT name into the query, but that means this chart can no longer be re-used for many NFTs.

I also tried another possible scenario, like removing the $match stage from the query, only keeping the bucketAuto:

[
  {
    $bucketAuto: 
    {
        groupBy: "$price",
        buckets: 10,
        output: {
          docs: {
            $push: "$$ROOT"
          }
        }
    }
  },  
  {
    $skip: 1
  },
  {
    $unwind: "$docs"
  },
  {
    $replaceWith: "$docs"
  }
]

And hoping that the generic query above will work magically with the dynamic filter supplied at the render time, but unfortunately if I try a query like the above, I can’t even save the query into the chart to give it a try because the UI will complain “ExceededMemoryLimit”:

So it feels like if I could supply the match and bucketAuto query dynamically during render time, the issue can be resolved.