Hi
I have the following data set of 5-minute bars:
Date_Time, symbol, close_price
2022-06-14 10:00, ES, 5.31
2022-06-14 10:05, ES, 5.25
2022-06-14 10:10, ES, 5.29
2022-06-14 10:15, ES, 5.33
I would like to fetch the data so that each row also gets the close_price from the previous row, so the output will be:
Date_Time, symbol, close_price, prev_close_price
2022-06-14 10:00, ES, 5.31, NULL
2022-06-14 10:05, ES, 5.25, 5.31
2022-06-14 10:10, ES, 5.29, 5.25
2022-06-14 10:15, ES, 5.33, 5.29
In this case uniqueness is expected on Date_Time + symbol, so the query needs to account for that.
Please provide sample data and sample output in JSON format that we can cut-n-paste into our system so that we can experiment.
Also share what you have tried and indicate how it fails to provide the expected result. That will prevent us from investigating in a direction you already know does not work.
My first instinct would be to do a $lookup from: pricedata with a pipeline that $match-es the same symbol with an end_dt $lt my own end_dt, then $sort-s on end_dt descending and $limit-s to 1.
Looking at the documentation, one of my best friends, I remembered:
A $match stage requires the use of an $expr operator to access the variables. The $expr operator allows the use of aggregation expressions inside of the $match syntax.
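That instinct can be written down as follows. This is a sketch, not a tested production pipeline: the collection name pricedata comes from the message above, and the bar timestamp field is assumed to be end_dt as used later in this thread. The plain-JS function after the pipeline simulates what it computes, checked against the sample bars from the question.

```javascript
// Sketch of the $lookup approach, as one would write it in mongosh.
const pipeline = [
  { $lookup: {
      from: "pricedata",
      let: { symbol: "$symbol", end_dt: "$end_dt" },
      pipeline: [
        { $match: { $expr: { $and: [
            { $eq: ["$symbol", "$$symbol"] },   // same symbol
            { $lt: ["$end_dt", "$$end_dt"] }    // strictly earlier bar
        ] } } },
        { $sort: { end_dt: -1 } },              // latest earlier bar first
        { $limit: 1 }
      ],
      as: "prev"
  } },
  { $set: { prev_close_price: { $first: "$prev.close_price" } } },
  { $unset: "prev" }
];

// Pure-JS simulation of what that pipeline computes, using the sample
// bars from the question.
const bars = [
  { end_dt: "2022-06-14 10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "2022-06-14 10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "2022-06-14 10:10", symbol: "ES", close_price: 5.29 },
  { end_dt: "2022-06-14 10:15", symbol: "ES", close_price: 5.33 },
];

function withPrevClose(docs) {
  return docs.map((doc) => {
    const earlier = docs
      .filter((d) => d.symbol === doc.symbol && d.end_dt < doc.end_dt)
      .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1)); // end_dt descending
    return { ...doc, prev_close_price: earlier.length ? earlier[0].close_price : null };
  });
}

const out = withPrevClose(bars);
```

The simulation reproduces the expected output column: NULL for the first bar, then each bar carrying the previous bar's close.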
One more question I have on this topic. I have used the approach suggested above on a collection with a large dataset, and I can see the aggregation pipeline is slow. It is because, to get the previous row, we are using
{ "$lt" : [ "$end_dt" , "$$end_dt" ] } with "$sort" : { "end_dt" : -1.0 }
I was wondering whether it is possible in the aggregation pipeline to access the previous row by its index (e.g. if the current row index is n, access the previous row at n-1)?
The first thing is to have a search index that supports the query, if it is a regular use case.
Documents don’t really have a position n. Even if they did, the document at n-1 is not necessarily the document you want: it is if the symbol is the same, it is not if the symbol differs. You cannot know which document is the previous one unless you $match and $sort.
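To illustrate the point, here is a small plain-JS sketch with two interleaved symbols (the NQ prices are made up for the example): the document at position n-1 belongs to the other symbol, while $match plus $sort finds the right one.

```javascript
// With interleaved symbols, "the document at n-1" is often the wrong one.
const docs = [
  { end_dt: "10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "10:00", symbol: "NQ", close_price: 12.10 },
  { end_dt: "10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "10:05", symbol: "NQ", close_price: 12.40 },
];

// Naive positional lookup for the ES 10:05 bar (index 2): take index n-1.
const naivePrev = docs[2 - 1]; // lands on an NQ document

// Correct lookup: match on symbol, sort end_dt descending, take the first.
const matchedPrev = docs
  .filter((d) => d.symbol === "ES" && d.end_dt < "10:05")
  .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1))[0];
```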
I did have the index symbol:1,end_dt:1 on the pricedata collection.
But because the lookup stage uses end_dt:-1, I ended up creating a second index, symbol:1,end_dt:-1.
After adding the new index, the lookup was faster than in the previous scenario.
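For reference, the two indexes being compared would be created in mongosh like this (a fragment only, since it needs a live deployment; the collection name pricedata is taken from earlier in the thread):

```javascript
db.pricedata.createIndex({ symbol: 1, end_dt: 1 });   // original index
db.pricedata.createIndex({ symbol: 1, end_dt: -1 });  // second index added for the end_dt:-1 sort
```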
But my aggregation query returns output with symbol in ascending order and end_dt in descending order.
When I add an additional $sort stage after the $lookup with symbol:1,end_dt:1, it takes forever, and I am not sure why.
Once documents are transformed, indexes cannot really be used, because indexes point to the original documents in the collection, not the modified documents from the previous stage. So a late-stage $sort is usually an in-memory sort rather than an index scan, and should be avoided.
I am surprised about that. According to the documentation, the index symbol:1,end_dt:1 should support the inner sort: since symbol is matched by equality, the index can be scanned backward to produce end_dt in descending order. The documentation also says an index can serve the exact reverse of its key pattern:
For example, an index key pattern { a: 1, b: -1 } can support a sort on { a: 1, b: -1 } and { a: -1, b: 1 }
It would be interesting to see the explain plan. By the ESR rule the original index should be used and the descending one should not make a difference.
I would try $sort: { "symbol": 1, "end_dt": -1 } in the inner pipeline. The result should be the same since we $match on symbol.
I do not think this has anything to do with the index or the inner pipeline. The aggregation returns the documents in an unspecified order when you do not sort. If you want symbol in order, you have to $sort. To have $sort use the index, it has to be done before the $lookup.
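A plain-JS sketch of that ordering advice (the NQ rows are made up for the example): sorting on symbol:1,end_dt:1 before attaching the previous close keeps the output in the requested order, and on a real collection a $sort placed before the $lookup can use the index instead of sorting in memory.

```javascript
// Unordered input, two symbols interleaved.
const rows = [
  { end_dt: "10:05", symbol: "NQ", close_price: 12.40 },
  { end_dt: "10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "10:00", symbol: "NQ", close_price: 12.10 },
];

// Step 1: the equivalent of $sort { symbol: 1, end_dt: 1 }, done before any
// transformation -- on a real collection this can be an index scan.
const sorted = [...rows].sort((a, b) =>
  a.symbol === b.symbol
    ? (a.end_dt < b.end_dt ? -1 : 1)
    : (a.symbol < b.symbol ? -1 : 1));

// Step 2: the equivalent of the $lookup -- previous bar of the same symbol.
const result = sorted.map((doc) => {
  const earlier = sorted
    .filter((d) => d.symbol === doc.symbol && d.end_dt < doc.end_dt)
    .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1)); // end_dt descending
  return { ...doc, prev_close_price: earlier.length ? earlier[0].close_price : null };
});
```

The output stays grouped by symbol ascending and end_dt ascending, with each row carrying the previous close of its own symbol.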