Rule-based row filter

Hello. I'm working with a predictive model and I have some issues with the data. I want to filter out all the rows with wrong data. I have about 85 columns, so I tried to use the loop node. The rule is that only values below 200 and above 0 are included. The problem is that I don't know how to specify a column with a changing number (so that it works with the loop). Any comment will help.

Hi and welcome to the KNIME forum,

Let me see if I have understood your issue.
You have 85 numeric columns and you want to filter the rows to include those which have a value between 0 and 200 in all columns. Am I right?

Armin


That’s correct. Actually, that interval applies only to some of the columns; I have to change the limits for the other columns.

So the value limit is different for each column?

The limits between 0 and 200 apply to almost 40 columns. If you can help me with that, it will be great.


In this example I have 5 columns, but I only want to check the limit on the first 4, so I exclude the 5th column.
Then I extract the column headers, transpose them, and create the rules and the result for the Rule-based Row Filter (Dictionary) node.
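For readers who cannot open the workflow, the steps above can be sketched in plain Python. This is only an illustration, not the contents of the attached workflow: the column names, the inclusive bounds, and the exact rule text (written in the `$column$ ... => TRUE` style of the KNIME rule nodes, where a matching row would be excluded) are assumptions.

```python
def build_rules(limits):
    """Return one exclusion rule per column, in KNIME rule-engine style.

    `limits` maps a column name to its (low, high) range. A row that
    matches any rule has a value outside its range and should be dropped.
    """
    return [
        f"${name}$ < {low} OR ${name}$ > {high} => TRUE"
        for name, (low, high) in limits.items()
    ]


def row_is_valid(row, limits):
    """True when every limited column lies inside its [low, high] range."""
    return all(low <= row[name] <= high for name, (low, high) in limits.items())


# Hypothetical column names; col5 is left out, so it is never checked.
checked_columns = ["col1", "col2", "col3", "col4"]
limits = {name: (0, 200) for name in checked_columns}

rules = build_rules(limits)

rows = [
    {"col1": 10, "col2": 150, "col3": 0, "col4": 200, "col5": 999},
    {"col1": 10, "col2": 250, "col3": 5, "col4": 20, "col5": 1},
    {"col1": -3, "col2": 50, "col3": 5, "col4": 20, "col5": 1},
]
kept = [row for row in rows if row_is_valid(row, limits)]
```

Because the limits live in a dictionary, columns that need a different range only need a different `(low, high)` entry; no loop over column numbers is required.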

If you need more info about the nodes, search for their names at nodepit.com.
Here is the workflow:
row filter limit.knwf (24.7 KB)

Best,
Armin

Hi there,

What kind of predictive model are you building? Do you need this before or after the modelling part? If one row is one observation of your data, what is the final data set you want to get? This way you will mix observations, or am I missing something?

Br,
Ivan

Thank you! It helps a lot.

I'm trying to predict faults in a motor. This is the data preprocessing stage, so it comes before the modelling part.
I shifted the historical data by 12 hrs and ran a classification with a decision tree in order to find, in a simple way, the variables that are responsible for the failure.

Hi there,

OK. Preprocessing stage then. But if there is a value higher than 200 or less than 0, you will remove the whole row?

Br,
Ivan

Yes. Do you recommend something else?

Hi there!

That depends on your data and the prediction model you are using.

If a wrong value in one column (variable) means the whole row (observation) is invalid, then removing those rows seems fine to me.

If one wrong value does not mean the whole observation is invalid, then you might want to think about what to do with those values, because the other values from that observation are correct and you might want to use them. From your post, you have data related to a motor. I guess from some sensors? So if a value is below 0 or higher than 200, what does it actually mean? Depending on that, you can convert those values to 0 and 200, to a missing value, or to the average of previous values…
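To make the three alternatives concrete, here is a minimal Python sketch for a single sensor column. The function names, the window size of 3, and the sample readings are made up for illustration; which option is appropriate still depends on what an out-of-range reading means for your sensors.

```python
def clip(value, low=0, high=200):
    """Option 1: pull an out-of-range value back to the nearest limit."""
    return max(low, min(high, value))


def to_missing(value, low=0, high=200):
    """Option 2: replace an out-of-range value with a missing value (None)."""
    return value if low <= value <= high else None


def replace_with_previous_mean(values, low=0, high=200, window=3):
    """Option 3: replace an out-of-range value with the mean of up to
    `window` preceding (already cleaned) values; None if there are none."""
    cleaned = []
    for value in values:
        if low <= value <= high:
            cleaned.append(value)
            continue
        recent = [v for v in cleaned[-window:] if v is not None]
        cleaned.append(sum(recent) / len(recent) if recent else None)
    return cleaned


# Hypothetical readings from one sensor column.
readings = [10, 250, 30, -5, 50]

clipped = [clip(v) for v in readings]
with_missing = [to_missing(v) for v in readings]
smoothed = replace_with_previous_mean(readings)
```

With any of these, the row is kept, so the valid values in the other columns remain available to the model.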

Br,
Ivan
