Filter to group

umbs · August 10, 2022, 11:24am

Hello everyone,
I should say up front that I am a beginner. Here is my requirement / problem.

I have a table with 20 columns containing numeric information and one final column, which is nominal and can contain the values Green or Red.

I would like to search for the optimal filter, one that keeps the highest number of Green rows and the lowest number of Red rows.

For example, if the table contains 100 rows, a good result could be:

60 green
15 red

Is there any way to get it?

Thank you all.
Umberto

elsamuel · August 10, 2022, 12:45pm

Welcome to the forum, @umbs

I’m not sure I understand the logic here.

60 green
15 red

What are these numbers?

umbs · August 10, 2022, 12:48pm

Hi,
The number of rows, grouped by the last column.

ArjenEX · August 10, 2022, 3:53pm

Hi @umbs
Am I right that you are looking for the largest group of consecutive rows of either green or red?

Something like this:
Create a sequence of consecutive greens or reds

And then find the max values within the entire dataset?
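
For illustration only, the "consecutive rows" reading could be sketched in Python as below. The `results` list and the `longest_runs` helper are hypothetical stand-ins for the Result column, not anything taken from the poster's data or workflow.

```python
from itertools import groupby

# Hypothetical Result column, in row order.
results = ["Green", "Green", "Red", "Green", "Green", "Green", "Red", "Red"]

def longest_runs(values):
    """Return the longest run of consecutive equal values for each label."""
    best = {}
    for label, run in groupby(values):
        # groupby yields one group per block of consecutive equal values.
        length = sum(1 for _ in run)
        best[label] = max(best.get(label, 0), length)
    return best

print(longest_runs(results))
```

With this sample the longest Green run has 3 rows and the longest Red run has 2.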

In general, try to be as complete as possible when asking a question. Include the current input, expected output, screenshots, data sets, workflows, etc. as early as possible. The more you provide, the more accurate the help will be.

umbs · August 11, 2022, 6:00am

Hi,
I’m looking for the largest group with Result Green and the smallest group with Result Red (not consecutive).

INPUT:

OUTPUT (best result)

I would like to find a filter that, when the records are grouped by Result, includes the greatest number of Green rows and the least number of Red rows, using as many columns as possible in the condition. The “distance” between the two groups must be high.

I hope this is clear.

Thank you.
Umberto

berserkersap · August 11, 2022, 7:15am

Hello @umbs, welcome to the community
Am I correct in assuming that your requirement is more of an ML clustering [binary classification] problem?
It also seems you need to find a function or technique such as SVM, regression, etc. to label your result as GREEN or RED using the 18 columns.
And when you speak of distance, how is it supposed to be calculated?
If you just need the count of GREEN and RED (after getting the best result), though, you can use the GroupBy node on the Result column with a count aggregation.
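
As a rough illustration of what that count aggregation computes, here is the same idea in plain Python. The `rows` sample is made up; in KNIME the node does this without any code.

```python
from collections import Counter

# Hypothetical rows: numeric columns followed by the nominal Result column.
rows = [
    (0.4, 12, "Green"),
    (0.9, 7, "Red"),
    (0.1, 15, "Green"),
    (0.7, 3, "Green"),
]

# Group by the last column and count the rows in each group.
counts = Counter(row[-1] for row in rows)
print(counts["Green"], counts["Red"])
```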

ArjenEX · August 11, 2022, 8:11am

To be honest, I don’t see any column in your screenshot that can be used as a group. If you have a proper group column somewhere that designates this, the GroupBy node is a proper fit.

If you’re just looking for a color count, like in your screenshot, I would use the Value Counter node.

umbs · August 11, 2022, 8:21am

Yes, in essence this is a backtest. I would like to find a good filter that uses many columns in order to keep many Green rows.
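
One possible reading of this requirement, sketched in Python under stated assumptions: the filter is a set of threshold conditions (`column >= t` or `column <= t`), at most one per column, and the "distance" is taken to be the number of Green rows kept minus the number of Red rows kept. The data, the `score` objective and the greedy `best_filter` search are all hypothetical; nothing here comes from the actual table.

```python
from itertools import product

# Hypothetical data: two numeric columns plus the Result label per row.
rows = [
    ((5, 1), "Green"),
    ((6, 2), "Green"),
    ((7, 4), "Green"),
    ((2, 3), "Red"),
    ((8, 8), "Red"),
    ((1, 1), "Red"),
]

def score(selected):
    """Assumed objective: Green rows kept minus Red rows kept."""
    greens = sum(1 for _, label in selected if label == "Green")
    return greens - (len(selected) - greens)

def passes(values, condition):
    """Check one (column index, operator, threshold) condition."""
    col, op, threshold = condition
    return values[col] >= threshold if op == ">=" else values[col] <= threshold

def best_filter(rows, n_columns):
    """Greedily add one threshold condition per column while the score improves."""
    conditions, selected = [], list(rows)
    best = score(selected)
    unused = set(range(n_columns))
    while unused:
        candidates = []
        for col, op in product(sorted(unused), (">=", "<=")):
            # Only values seen in the remaining rows can change the outcome.
            for threshold in sorted({values[col] for values, _ in selected}):
                cond = (col, op, threshold)
                kept = [r for r in selected if passes(r[0], cond)]
                candidates.append((score(kept), cond, kept))
        top_score, cond, kept = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no remaining condition improves the objective
        conditions.append(cond)
        selected, best = kept, top_score
        unused.discard(cond[0])
    return conditions, selected

conditions, selected = best_filter(rows, n_columns=2)
print(conditions, score(selected))
```

Because the thresholds are tuned on the same rows they are scored on, a filter found this way will tend to look better in the backtest than on new data, so holding back part of the rows for checking is worth considering.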

Daniel_Weikert · August 11, 2022, 4:33pm

I don’t see any group and (probably) don’t understand the goal correctly, but you could treat it as a binary classification problem: map the colors to 1 and 0 and then train an ML model on it. This can give you the importance of each column in contributing to the color.
br
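
A minimal sketch of that idea, assuming a hand-written logistic regression in plain Python rather than any particular KNIME learner. The synthetic `features`, the `fit_logistic` helper, and the use of absolute coefficients on standardized columns as a stand-in for "importance" are all assumptions made for illustration; a tree-based model would report importances differently.

```python
import math
import random
from statistics import mean, pstdev

random.seed(0)

# Hypothetical data: column 0 drives the color, column 1 is pure noise.
features = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
colors = ["Green" if x[0] > 0 else "Red" for x in features]

# Map the nominal label to 1 (Green) and 0 (Red).
y = [1 if c == "Green" else 0 for c in colors]

def standardize(rows):
    """Scale each column to zero mean and unit variance so weights are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

weights, bias = fit_logistic(standardize(features), y)
importances = [abs(wj) for wj in weights]
print(importances)
```

In this synthetic setup the first column should come out with a much larger weight than the noise column.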

system · November 9, 2022, 4:34pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.