Row filter, based on complex multicolumn criteria

Hi all, I have a problem again. I guess being a newbie doesn`t help. I would apritiate the help. The goal is to extract red circled data from column A and the criteria is as it follows.

  1. UPPER BOUND
    Column B has either zeros or some values, think of it binary, I circled in blue part with some values. The script should look for non zero value in column B (circled in green) and when it finds it look at the nearest zero in column C and go one cell down. That would give the upper counter bound of 101482.

  2. LOWER BOUND

Once the column C starts getting non zero values, the script should go down the column C until it reaches the first zero value, then look at the counter and go up for 72 count values. That would give the lower counter bound of 101 500. So, my result would be column A values between counter values of 101482 and 101 500. There are several such cases that would apear in the column A, not just a single event. Not sure I have been clear?

Can you upload some data for us to work with?

OK, so here is a bigger picture which maybe brings light to things, in case I went into wrong direction when trying to solve the puzzle. Attached find my sketch of what I need to do. I also added a table to illustrate how complex is the whole task. In this example I will describe the desired workflow for just a row no.1. from the table. So, here it goes:

  1. I need to slide window no1, 12 segments long along the „5min values“ column, and detect packs of 12 which in sum are above treshod value of 20.
  2. For each pack of 12 that meets the minimum sum criteria I need to start sliding windows no.2 and 3., 72 segments long and detect the nearest „upstream“ and „downstream“ packs of 72 which in sum equal zero.
  3. One row below the window 2 is the upper bound of the series that I need to extract
  4. One row above the window 3 is the lower bound of the series that I need to extract
  5. I will obviously have several series to be extracted
    In order to solve this I used „moving aggregation“ node and a „counter“ node and got written down the numbers of rows which meet the criteria that sum of 12 aggregated values are above the treshold of 20. Next I used „moving aggregation“ to pack 72 values together and check their sum. And then I got stuck.
    I am attaching excel file with my inputs („Datum automatika“ and „ 5 min values“) and outputs ( the rest of the columns).

Mind you, the source table has several billion rows, excel cant support it, so this is just a part.
And I have to do this whole table procedure for 510 files like this. Thanks for taking a time to look.

Example file is here

Hello, I couldn’t download the data from that link. For that reason, please attach (directly to this thread using the upload button) a sample file with perhaps 500 to 5000 rows in an excel file or csv file, etc. so that people don’t have to use 3rd party sites.

Next I see you used nodes, so please also directly attach your workflow (reset the workflow before upload so that we can’t access your private data if you have any).

This will get more responses from community members. I will also comment if I see any room to help.

Thank you~

2 Likes

Hi Victor, thank you for the tips. I will do something about it.

OK, here is the complete workflow I came up with so far. I am uploading it with the original real input .xlsx file. I had to change the format so that is what most of my workflow is dealing with. Original file is much smaller. But result at the last node represents the huge table which I uploaded using the 3rd party site.
proba_5 min_brisi.xlsx (115.6 KB)
Forum_multicriteria.knwf (62.3 KB)

For this workflow, I’ve removed Create Date&Time Range + Join node and instead used the Date&Time-based Row Filter node. This leads to results, so please check if those results are what you wanted.

Screen Shot 2022-07-11 at 5.53.20 PM

1 Like

Hi, thanks for trying. This would extract one single manually entered event, right? I am actually trying to automatically detect events and extract all of them at once, based on criteria.

I don’t understand how you identify events, but I think you are worried about automation which I think you can achieve with:

  1. Loops once you identify your events: How to Build a Generic Loop - YouTube
  2. Flow Variables to plug in dates after they are found by your method: Flow Variables - YouTube

The other suggestion is to the break your problem into very, very simple steps and we tackle only one step at a time. If you would like to try that, please tell me only your most basic and 1st goal to achieve.

1 Like

Victor, thank you for your kind advice.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.