Only positive moving aggregation

Hi all

I have to deal with events and an over all status of a production machine. If everything works fine, I could use a moving aggregation node over a row-based rule engine provided status. (To make it easy begin-event = 1, wait-event = 0 and end-event = -1). Normally I only have 1‘s and 0‘s in my overall status, but for some unidentified reason, the machine is missing the begin-event and the overall status runs to -1.
Now I have no idea how I can do a cumulative sum with only positive values. If it runs to -1, it should stay at 0.

For example:
Event, Status, Cum Status
Begin, 1, 1
Wait, 0, 1
End, -1, 0
Wait, 0, 0
Begin, 1, 1
End, -1, 0
(Begin event is missing)
Wait, 0, 0
End, -1, -1 <— should be 0

This happens only once per month, but crashes the whole result.

I hope anybody has a good idea to deal with it. :wink:

Thanks and regards
Martin

1 Like

Hi @mabm , I’m not sure I understand the logic that you are explaining.

Based on the data, the -1 is the expected results. Why should it be 0 instead of -1?

The Cumulative Status is adding that Status at each line. At each line, it takes the Cumulative Status of the previous row and adds it to the current Status value (previous Cum Status + Status). So, if we were to look at the calculation at each line, it would be something like this:
Event, Status, Cum Status

Begin, 1, 0+1 = 1
Wait,  0, 1+0 = 1
End,  -1, 1-1 = 0
Wait,  0, 0+0 = 0
Begin, 1, 0+1 = 1
End,  -1, 1-1 = 0
Wait,  0, 0+0 = 0
End,  -1, 0-1 = -1

Hi @mabm , I just realized something…

Did you mean that is how you expected Knime to behave, or rather that’s the result that you are looking for?

I originally thought you meant the former, but the more I am reading this and adding the title with that, I’m now thinking that you meant the latter.

If that is the case, so yes, Knime will give -1 with a Moving Aggregation. If you want to add logic if the cumulative sum is -1 then it should be 0, you can’t really do it in the Moving Aggregation node, at least I don’t think so.

You would have to compute the cumulative sum line by line and apply the rule line by line.

I put something together for you that does this.

This is the results it produced:
image

This is what the workflow looks like:
image

And here’s the workflow: Only positive moving aggregation.knwf (18.2 KB)

I think @bruno29a provided you a good solution if I understood correctly.
Otherwise options like rule egine node come up in my mind.
br

Hi.

Thanks @bruno29a. I will give your idea a try tomorrow at work. It looks nice and very promising.
I still had headaches when thinking for a solution.

Regards
Martin

1 Like

Hi @mabm , were you able to test the workflow, and did you try it on your bigger dataset?

It finished a few minutes ago, well it’s pretty slow (350k rows), but it looks good. No more negativ values for the status column.

Thanks a lot for your help.

Hi @mabm , yes it is expected to be slow, as it’s reading 1 record at a time unfortunately, but it is guaranteed to work.

The only other way that could be faster, but more complex would be with recursion. The logic would be to recursively do the following:

  1. Set start_row = Row 0
  2. Start Loop
  3. Do Moving Aggregation starting from start_row
  4. Get Row number where the first <0 aggregation, let’s say Row n
  5. Update aggregation to 0 for Row n and set start_row = Row n
  6. Loop until no more <0 as aggregation

So, the logic would be that most rows would be processed at once. It does not need to process all lines individually, but only a few. The less there are cases of <0 the faster it will execute.

But as you can see, it involves quite some steps, and step 3 is not straight forward.

Hi @mabm ,

I took upon the challenge and implemented another version of it using Recursive Loop, which should run much faster. The workflow looks like this:

I have also increased the test data by copying the 8 rows, and pasting them back, so they are repeated 4 times for the test:

Of course, without any manipulation, the raw Moving Aggregation gives you the <0 like this:

After it goes through the loop and manipulation, it gives me this result:

which is the same result as running the same data set via the original method:

I have included both methods in the same workflow:

Here’s the workflow: Only positive moving aggregation.knwf (49.7 KB)

Try this one with your 350k. It should go much faster.

3 Likes

I ran the workflow with a test set. In this test set was no “error” and the status is not getting -1. In this case the min(LineID) is 0 and everything will be set to 0 or was it a implementation error?

On the whole set it is incredible fast. :wink: And is working like charm.

Hi @mabm , oh right, if there is no <0 it looks like it will not “qualify” any records lol.

Sorry about that, so I changed the Row Splitter to a Rule-Based Row Splitter, where I also added the condition for min(LineID) is 0. It’s all good now.

Here’s the updated workflow: Only positive moving aggregation.knwf (49.7 KB)

EDIT: Just to clarify, the updated workflow will work for both cases, that is if there are no <0 or if there are <0. Please test with your test set and with your whole set :slight_smile:

1 Like

Hi @bruno29a
Thanks a lot.
In the newest data I saw, that things can ran more worst than expected. :frowning:
The machine data is missing the begin more than one time in a row. Therefore the status is decreasing to -3 right now.
I will prepare a dataset to share, if it is ok for you.

Here we go, I uploaded test set.
Test.knwf (1.2 MB)

2 Likes

Hi @mabm

As @bruno29a, I always try to provide a solution purely based on KNIME nodes, without other languages coding. However, I have to say that in this case, a scripting node may be the most appropriate solution to go for. If I understood well the problem you want to solve is a 0-truncated cumulative sum, or in other words, whenever the cumulative sum becomes negative then it is truncated to zero.

Please find a possible solution based on the -Java Snippet (simple)- node:

20211019 Pikairos Only positive moving aggregation.knwf (2.4 MB)

Hope this helps.

Best

Ael

2 Likes

Hi @mabm , it does not matter how many missing “Begin” you have in a row (I’m assuming you mean consecutive), the workflow should still work the same way.

Looking at your workflow, there are no <0 (no -1, nor -2, nor -3). However, you are not using the latest version that I provided, so it is not taking into account the correction that I added, which is handling where min(LineID) is 0. This is an important part of the logic and it’s not just for the beginning of the whole dataset, but rather the beginning of each batch in the loop.

Without this, it does not know how to skip a loop properly, hence why you get 0 all the way after line 48375. If you use my latest version of the workflow, it corrects this issue. The main changes are these 2 nodes that you need to replace:
image

The most important one is the Row Splitter. For the Rule Engine, it’s a bit more optimized.

You also do not need the Integer Widget which simply contains the variable “last_cum_status”. This variable was only needed for Method 1.

I like @aworker ‘s approach, which is basically doing what my Method 1 was doing, but without needing to loop into 1 line at a time. I keep forgetting about the global variable. It’ up to you if you want to use pure Knime, or if you want to use scripting.

2 Likes

Hi @bruno29a

Thanks for your comment. I very much appreciate your different solutions and all the effort and pedagogy to implement them. I think they are great for people to learn how to implement an algorithm in general and discover/use new KNIME nodes in particular. I still find amazing how easy things to implement in other scripting languages are sometimes difficult to implement in KNIME and indeed, viceversa. But I guess, if everything were easy to implement in just a single language, we would be all talking the same language. N’est-ce pas :wink: lol?

Best regards,

Ael

3 Likes

Entièrement d’accord! :wink:

3 Likes

Hi @bruno29a
I implemented your solution and it works just fine.

Thank you.

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.