Time series find and analyse chunks

Hello,

I’ve built already a few workflows but with this problem I’m lost at the moment.
In the graph you can see a price vs. time.
I have just 2 columns - date and price

If the price value is ‘1’ or ‘nan’ then the system is not invested
If the price value shows a real price then the system is invested

I want to know from this data:

  • when was the system invested (buy date and sell date)
  • what was the buy price and what was the sell price
  • … this I need to know for every period where the price was not “1”

What I would like to have now is a simple table with start date and end date for each part when the system was invested:

row Start Date / End Date / Start Price / End Price

row0: 2021-01-03 / 2021-01-07 / 1.87 $ / 1.92 $
row1: 2021-02-24 / 2021-02-27 / …
row2: …

Any idea how can I dow this with a workflow?

1 Like

Hi @dev_python , welcome to the community!

The basis for the way I solved this (fingers-crossed it works!) was to work out what data was going to be useful, and what was “noise”. I put together a sample data set as follows:

I’m assuming that you can already get any “NaN” values to be represented as 1 in your input table, so this was my starting point.

So to find your “buy and sell” prices, we need to find the groupings and discard rows that don’t give us anything. I start by finding a way to “simplify” the meaning of the numbers. So any value of 1 in the price column tells me it is Not Invested, and any other value tells me it IS invested. So I created an “Invested” column containing a flag of 1 for invested and 0 for not invested. In this case I used integers rather than booleans but you could use whatever.
image

Having established whether a row was invested, I wanted to know if the rows either side of it were invested. We need this to establish start and end of a period.

If we number our rows (RowNum), and make sure they are in ascending date order, then we can use a Lag Column node to tell us the number of the previous row (RowNum - 1). Using a pair of Cell Replacers, we can join between RowNum and RowNum -1, once in one direction (for previous row) and then in the other direction (for next row) to return the Invested flag for the prev and next rows. A small tidy up was required to handle the FIRST and FINAL rows of the data set. It is assumed that in these cases the imaginary row prior to the first row would not be invested, and neither would the row AFTER the final row. This then ensures they are treated as a BUY and SELL respectively.

image
image

After that, we can then use a rule that determines if a particular invested row is the beginning, end or somewhere in the middle of the particular block. We aren’t interested in the ones in the middle so we want to discard them. Here then we want to keep only rows where it is invested and either the row before, or the row after wasn’t invested.

image

This should then result in us having only alternate rows which are start and end of investment blocks. (Except that I then realised there was an edge-case that if an investment were both bought and sold on the same date, this wouldn’t be the case, so I inserted a couple of nodes just prior to the discard that found any “isolated” investments and duplicated them so that they would form a start-end pair of same date and price.

image

The final part was then to group the rows into pairs. Using the rowindex was the simplest way I found to do this so the odd row number in each pair was using as a “block number”. This could then be grouped with a groupby on “Block” returning min, max dates, and first and last prices for each block.

image

image

I hope it works, or at least gives some ideas for your own data set. Attached is my workflow, and do let me know if you spot any errors, or need any further assistance

KNIME_time_series_chunks.knwf (39.6 KB)

6 Likes

Hi @takbb ,

really I don’t know what to say!

Not only the solution is so cool - but that you put in so much effort in explaining all the steps in detail is really awesome. I appreciate that so much!

Workflow works perfect with my data.

Hope I can give back something similar to the community when my skills improved :slight_smile:

Big thanks!

3 Likes

Hi @dev_python , you’re welcome. Glad it helped you.

If you have a moment, and you’re happy that it works, would you be able to mark the answer as the “Solution”, (hidden behind the three dots under the answer). That way others looking can see your problem has been resolved, and also it helps people if they have a similar query in future. thanks in advance! :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.