Counting the max of consecutive days in error

HI All,

Honest I thought that this would be easier

I am running a simulation and I am trying to evaluate the result.

The goal is to count the max of consecutive days in error

For exemple, if I was working with the table bellow would be 4 .( Errors from day 18 to 21)

|Day|Results|
|1|error|
|2|error|
|3|error|
|4|ok|
|5|ok|
|6|ok|
|7|ok|
|8|ok|
|9|ok|
|10|error|
|11|ok|
|12|ok|
|13|error|
|14|ok|
|15|error|
|16|error|
|17|ok|
|18|error|
|19|error|
|20|error|
|21|error|
|22|ok|
|23|error|
|24|error|
|25|ok|

Any help will be welcome.
thanks

It would make help easier if you could post a data file rather than a screenshot.

Hi @rfeigel , in this case, the data supplied by @Adrix isn’t actually a screenshot and can be copied/pasted, which is better than many “sample data” examples that get posted :wink:.

If you drop the following component onto a workflow you can paste all the text into its config and it should give you the table which reduces the effort needed.

Hope that helps.

3 Likes

HI Guys

Sorry , as mentioned by @takbb the data is in copy paste mode
I thought that for a simple set would be easier to copy and paste in excel and past again in a table in knime than have a file to download
But i forgot that data scientists and eng are alergic to Excel. :grinning:
saved a workflow with the sample data :

Thanks in advance for the help

1 Like

HI

Awesome tool indeed

2 Likes

Hi, you can:

1- Use the groupby node just for count the information from this column (error/ok);
2- use the row split node to separate the information and see from the output a total rows;
3- can use column expression to use a count formula.

All depends of the result that you need.

Tks,

Denis

@denisfi
Thanks for taking time to help
Not sure if i understood what you mean.
If i count I will get
image

And the relevant info will no longer available
Reading from the first row to the last we have
3 days in error
6 days ok
1 day in error
2 ok
1 day in error
…
…
4 days in error
…
…
So the max numbers of day that i was in error was 4 .

And that is the figure I would like to retrieve.

1 Like

Hi @Adrix,

The main requirement here is finding a way to identify the “groupings”. After that, it becomes a relatively trivial exercise to then count the members of each group, and find the maximum. The groupings in this case are identified by change of value in the Results column.

One possible solution borrows (steals! :wink: ) from the solution provided by @HansS from a “similar” scenario
which I have adapted to your problem.

A second possible solution uses a component that can handle this kind of requirement generically:

The attached workflow provides both, and they both return the following:
image
Counting the max of consecutive days in error - with or without component.knwf (45.8 KB)

Component is here:

8 Likes

I accommodated the first approach , worked very well
Thank You very much.

Since this topic returns frequently I’ll throw in another alternative in case people find this topic again. Added bonus is that it’s about 50% faster.

First step is to lag the column, second step is to use a Java Snippet that analyses all the values stored in a Hashmap.

Map map = new HashMap();

List pair = new ArrayList();
pair.add($Results$);
Integer i = (Integer) map.get(pair);


if (i == null || !$Results$.equals($Results(-1)$)) {
  i = new Integer(1);
} else {
  i = new Integer(i + 1);
}

map.put(pair, i);
return i;

Last step is also a groupBy to find the max() sequence.

3 Likes

@takbb 's great solution but with moving aggregation instead of missing value node works as well
br

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.