Hi
I have the following data set of 5-minute bars:
Date_Time, symbol, close_price
2022-06-14 10:00, ES, 5.31
2022-06-14 10:05, ES, 5.25
2022-06-14 10:10, ES, 5.29
2022-06-14 10:15, ES, 5.33
I would like to fetch the data so that each row also gets the close_price from the previous row, so the output will be:
Date_Time, symbol, close_price, prev_close_price
2022-06-14 10:00, ES, 5.31, NULL
2022-06-14 10:05, ES, 5.25, 5.31
2022-06-14 10:10, ES, 5.29, 5.25
2022-06-14 10:15, ES, 5.33, 5.29
In this case uniqueness is expected on Date_Time + symbol, so the query needs to account for that.
Please provide sample data and sample output in JSON format that we can cut-n-paste into our system so that we can experiment.
Also share what you have tried and indicate how it fails to provide the expected result. That will prevent us from investigating in a direction you already know does not work.
My first instinct would be to do a $lookup from: pricedata with a pipeline that $match-es the same symbol with an end_dt $lt my own end_dt, then $sort-s on end_dt descending and $limit-s to 1.
Looking at the documentation, one of my best friends, I remembered:
A $match stage requires the use of an $expr operator to access the variables. The $expr operator allows the use of aggregation expressions inside of the $match syntax.
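That instinct can be written down as follows. This is a sketch, not a tested production pipeline: the collection name pricedata comes from the message above, and the bar timestamp field is assumed to be end_dt as used later in this thread. The plain-JS function after the pipeline simulates what it computes, checked against the sample bars from the question.

```javascript
// Sketch of the $lookup approach, as one would write it in mongosh.
const pipeline = [
  { $lookup: {
      from: "pricedata",
      let: { symbol: "$symbol", end_dt: "$end_dt" },
      pipeline: [
        { $match: { $expr: { $and: [
            { $eq: ["$symbol", "$$symbol"] },   // same symbol
            { $lt: ["$end_dt", "$$end_dt"] }    // strictly earlier bar
        ] } } },
        { $sort: { end_dt: -1 } },              // latest earlier bar first
        { $limit: 1 }
      ],
      as: "prev"
  } },
  { $set: { prev_close_price: { $first: "$prev.close_price" } } },
  { $unset: "prev" }
];

// Pure-JS simulation of what that pipeline computes, using the sample
// bars from the question.
const bars = [
  { end_dt: "2022-06-14 10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "2022-06-14 10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "2022-06-14 10:10", symbol: "ES", close_price: 5.29 },
  { end_dt: "2022-06-14 10:15", symbol: "ES", close_price: 5.33 },
];

function withPrevClose(docs) {
  return docs.map((doc) => {
    const earlier = docs
      .filter((d) => d.symbol === doc.symbol && d.end_dt < doc.end_dt)
      .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1)); // end_dt descending
    return { ...doc, prev_close_price: earlier.length ? earlier[0].close_price : null };
  });
}

const out = withPrevClose(bars);
```

The simulation reproduces the expected output column: NULL for the first bar, then each bar carrying the previous bar's close.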
One more question I have on this topic. I have used the approach suggested above on a collection with a large dataset, and I can see the aggregation pipeline is slow. It is because, to get the previous row, we are using
{ "$lt" : [ "$end_dt" , "$$end_dt" ] } with "$sort" : { "end_dt" : -1.0 }
I was wondering whether it is possible in the aggregation pipeline to access the previous row by its index (e.g. if the current row index is n, access the previous row at n-1)?
The first thing is to have a search index that supports the query, if it is a regular use case.
Documents don’t really have a position n. Even if they did, the document at n-1 is not necessarily the document you want: it is if the symbol is the same, it is not if the symbol differs. You cannot know which document is the previous one unless you $match and $sort.
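To illustrate the point, here is a small plain-JS sketch with two interleaved symbols (the NQ prices are made up for the example): the document at position n-1 belongs to the other symbol, while $match plus $sort finds the right one.

```javascript
// With interleaved symbols, "the document at n-1" is often the wrong one.
const docs = [
  { end_dt: "10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "10:00", symbol: "NQ", close_price: 12.10 },
  { end_dt: "10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "10:05", symbol: "NQ", close_price: 12.40 },
];

// Naive positional lookup for the ES 10:05 bar (index 2): take index n-1.
const naivePrev = docs[2 - 1]; // lands on an NQ document

// Correct lookup: match on symbol, sort end_dt descending, take the first.
const matchedPrev = docs
  .filter((d) => d.symbol === "ES" && d.end_dt < "10:05")
  .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1))[0];
```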
I did have the index symbol:1,end_dt:1 on the pricedata collection.
But because the lookup stage uses end_dt:-1, I ended up creating a second index, symbol:1,end_dt:-1.
After adding the new index, the lookup was faster than in the previous scenario.
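For reference, the two indexes being compared would be created in mongosh like this (a fragment only, since it needs a live deployment; the collection name pricedata is taken from earlier in the thread):

```javascript
db.pricedata.createIndex({ symbol: 1, end_dt: 1 });   // original index
db.pricedata.createIndex({ symbol: 1, end_dt: -1 });  // second index added for the end_dt:-1 sort
```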
But my aggregation query returns output with symbol in ascending order and end_dt in descending order.
When I add an additional $sort stage after the $lookup with symbol:1,end_dt:1, it takes forever, and I am not sure why.
Once documents are transformed, indexes cannot really be used, because indexes point to the original documents in the collection, not the modified documents from the previous stage. So a late-stage $sort is usually an in-memory sort rather than an index scan, and should be avoided.
I am surprised about that. According to the documentation, the index symbol:1,end_dt:1 should support the inner sort: since symbol is matched by equality, the index can be scanned backward to produce end_dt in descending order. The documentation also says an index can serve the exact reverse of its key pattern:
For example, an index key pattern { a: 1, b: -1 } can support a sort on { a: 1, b: -1 } and { a: -1, b: 1 }
It would be interesting to see the explain plan. By the ESR rule the original index should be used and the descending one should not make a difference.
I would try $sort: { "symbol": 1, "end_dt": -1 } in the inner pipeline. The result should be the same since we $match on symbol.
I do not think this has anything to do with the index or the inner pipeline. The aggregation returns the documents in an unspecified order when you do not sort. If you want symbol in order, you have to $sort. To have $sort use the index, it has to be done before the $lookup.
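A plain-JS sketch of that ordering advice (the NQ rows are made up for the example): sorting on symbol:1,end_dt:1 before attaching the previous close keeps the output in the requested order, and on a real collection a $sort placed before the $lookup can use the index instead of sorting in memory.

```javascript
// Unordered input, two symbols interleaved.
const rows = [
  { end_dt: "10:05", symbol: "NQ", close_price: 12.40 },
  { end_dt: "10:00", symbol: "ES", close_price: 5.31 },
  { end_dt: "10:05", symbol: "ES", close_price: 5.25 },
  { end_dt: "10:00", symbol: "NQ", close_price: 12.10 },
];

// Step 1: the equivalent of $sort { symbol: 1, end_dt: 1 }, done before any
// transformation -- on a real collection this can be an index scan.
const sorted = [...rows].sort((a, b) =>
  a.symbol === b.symbol
    ? (a.end_dt < b.end_dt ? -1 : 1)
    : (a.symbol < b.symbol ? -1 : 1));

// Step 2: the equivalent of the $lookup -- previous bar of the same symbol.
const result = sorted.map((doc) => {
  const earlier = sorted
    .filter((d) => d.symbol === doc.symbol && d.end_dt < doc.end_dt)
    .sort((a, b) => (a.end_dt < b.end_dt ? 1 : -1)); // end_dt descending
  return { ...doc, prev_close_price: earlier.length ? earlier[0].close_price : null };
});
```

The output stays grouped by symbol ascending and end_dt ascending, with each row carrying the previous close of its own symbol.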