Filter to group

Hello everyone,
I should say up front that I am a beginner. Here is my requirement / problem.

I have a table with 20 columns containing numeric information and one final column, which is nominal and can contain the values Green or Red.

I would like to find the optimal filter, one that keeps the highest number of Green rows and the lowest number of Red rows.

For example, if the table contains 100 rows, a good result could be:

60 green
15 red

Is there any way to get it?

Thank you all.
Umberto

Welcome to the forum, @umbs

I’m not sure I understand the logic here.

60 green
15 red

What are these numbers?

Hi,
They are the number of rows, grouped by the last column.

Hi @umbs
Am I right that you are looking for the largest group of consecutive rows of either green or red?

Something like this:
Create a sequence of consecutive greens or reds

And then find the max values within the entire dataset?
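The "longest consecutive run" idea described above can be sketched outside KNIME as well. This is only an illustration in plain Python: the `longest_runs` helper and the sample `results` list are hypothetical and not taken from the thread.

```python
from itertools import groupby


def longest_runs(labels):
    """Return, per label, the length of its longest streak of consecutive rows."""
    best = {}
    # groupby() yields one group per stretch of identical neighbouring values.
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        best[label] = max(best.get(label, 0), length)
    return best


# Hypothetical Result column, in row order.
results = ["Green", "Green", "Red", "Green", "Green", "Green", "Red", "Red"]
runs = longest_runs(results)
print(runs)
```

Note that this depends on row order, so it only answers the "consecutive" reading of the question.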

In general, try to be as complete as possible in your questioning. Include current input, expected output, screenshots, data sets, workflows, etc. as much as possible in the beginning. The more you provide, the more accurate the help will be :wink:

Hi,
I’m looking for the largest group with Result Green and the smallest group with Result Red (not consecutive).

INPUT:

OUTPUT (best result)
image

I would like to find a filter such that, when the rows that pass it are grouped by Result, they include the greatest number of Green rows and the least number of Red rows. The condition should use as many columns as possible, and the “distance” between the two groups must be large.

I hope that is clear :slight_smile:

Thank you.
Umberto

Hello @umbs, welcome to the community
Am I correct to assume that your requirement leans more towards ML clustering? [binary classification]
It also seems you need a function or technique such as SVM, regression, etc. to label your result as GREEN or RED using the 18 columns.
And when you speak of distance, how is it supposed to be calculated?
If you just need the count of GREEN and RED (after getting the best result), though, you can use a GroupBy node on the Result column with a count aggregation :sweat_smile:
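For readers outside KNIME, the count-per-Result step can be sketched in plain Python. This is not the GroupBy node itself, only an equivalent count; the `filtered_results` list is a hypothetical stand-in that reuses the 60/15 example from the first post.

```python
from collections import Counter

# Hypothetical Result values of the rows that passed some filter.
filtered_results = ["Green"] * 60 + ["Red"] * 15

# Counter tallies how many times each distinct value occurs.
counts = Counter(filtered_results)
print(counts["Green"], counts["Red"])
```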

To be honest, I don’t see any column in your screenshot that could be used as a group. If you have a proper group column somewhere that designates this, the GroupBy node is a good fit.

If you’re just looking for a color count, like in your screenshot, I would use the Value Counter node.


Yes, in essence this is a backtest. I would like to find a good filter that uses many columns in order to keep many Green rows.

I don’t see any group and (probably) don’t understand the goal correctly, but you could treat it as a binary classification problem: map the colors to 1 and 0 and then train an ML model on them. This can give you the importance of each column in contributing to the color.
br
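As a rough illustration of the 1/0 mapping idea, here is a minimal sketch in plain Python. It is not a trained ML model: a single-threshold search per column (a decision-stump-style cut) is swapped in to keep the example self-contained. The score "Green rows kept minus Red rows kept" is an assumption, since the thread never defines the "distance", and the names `best_threshold`, `rank_columns`, `col_a` and `col_b` are hypothetical.

```python
def best_threshold(values, labels):
    """Find the single cut on one non-empty column with the best score.

    Score (an assumption) = greens kept minus reds kept.
    Returns (score, direction, threshold), direction being ">=" or "<=".
    """
    best = None
    for threshold in sorted(set(values)):
        for direction in (">=", "<="):
            if direction == ">=":
                kept = [lab for v, lab in zip(values, labels) if v >= threshold]
            else:
                kept = [lab for v, lab in zip(values, labels) if v <= threshold]
            score = kept.count(1) - kept.count(0)
            if best is None or score > best[0]:
                best = (score, direction, threshold)
    return best


def rank_columns(table, colors):
    """Rank columns by how well their best single cut separates Green from Red."""
    labels = [1 if c == "Green" else 0 for c in colors]  # Green -> 1, Red -> 0
    ranking = [(name, *best_threshold(col, labels)) for name, col in table.items()]
    return sorted(ranking, key=lambda r: r[1], reverse=True)


# Hypothetical data: two numeric columns and the Result column.
table = {
    "col_a": [1, 2, 3, 4, 5, 6],
    "col_b": [5, 1, 4, 2, 6, 3],
}
colors = ["Red", "Red", "Red", "Green", "Green", "Green"]

ranking = rank_columns(table, colors)
for name, score, direction, threshold in ranking:
    print(f"{name}: keep rows where value {direction} {threshold} (score {score})")
```

The ranking is only a crude proxy for column importance. A real classifier would combine several columns, and any filter found this way should be checked on rows that were not used to find it, since a backtest can easily overfit.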