Remove columns or attributes with more than 40 percent missing values

Hi,

I want to remove all the attributes or columns that have more than 40% missing values. How can I do it?

One idea is to use the GroupBy node.

Add every column of interest twice: the first aggregation method is Count (don't forget to tick the missing box to include missing values), the second aggregation method is Missing value count. The rest is simple mathematics: divide the missing value count by the count to get the fraction of missing values for each column.
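For readers who want to check the arithmetic outside KNIME, here is a minimal pandas sketch of the same idea. It is not the GroupBy workflow itself; the function name `drop_sparse_columns`, the `max_missing` parameter and the sample table are made up for illustration.

```python
import pandas as pd


def drop_sparse_columns(df, max_missing=0.40):
    """Drop columns whose fraction of missing values is above max_missing."""
    total = len(df)                  # row count, missing values included
    missing = df.isna().sum()        # missing value count per column
    if total == 0:
        # Empty table: nothing to measure, so keep every column.
        return df, missing.astype(float)
    fraction = missing / total       # fraction of missing values per column
    keep = fraction[fraction <= max_missing].index
    return df[keep], fraction


# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],            # no missing values
    "b": [1, None, None, None, 5],   # 3 of 5 missing
    "c": [None, None, 3, 4, 5],      # 2 of 5 missing
})

filtered, fraction = drop_sparse_columns(df)
print(fraction)
print(list(filtered.columns))
```

Note that a column with exactly 40% missing values is kept here, since the question asks to remove only columns with more than 40%.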

So I am using the GroupBy node twice, the first one for the count and the second one for the missing value count. Is there any way to choose the aggregation method for all the columns at once? I had to change the aggregation method for every column individually, which is time consuming.

Once I have the total count and the missing value count, what is the procedure to remove the columns with more than 40% missing values? It's just mathematics, but which node should be used for that?

Add all columns to GroupBy, mark all of them and right-click to choose "Count". Add all of them again, mark them and right-click to choose "Missing value count".

After that comes the tricky part: handle the output tables in a pairwise manner, looping and filtering on the original column name, I guess. Within the loop you strip off the column header (for standardised formulas), compute the percentage and return it together with the original column name used for filtering.

Cheers

E

Alternative approach:

Two GroupBy nodes in parallel, one for the total count and one for the missing value count. Retain the original column headers and transpose the output, then join both on the row ID, which used to be the column header. Then a single math node and you're good, much easier in fact. :)
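The same sequence of steps (two one-row aggregates, transpose, join on the former column header, one formula) can be sketched in pandas to see what each intermediate table looks like. This is only an analogy to the nodes, under the assumption that each GroupBy yields a single row; the sample data and the names `joined`, `pct_missing` and `columns_to_reject` are illustrative.

```python
import pandas as pd

# Hypothetical sample data, only for illustration.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, None, None, None, 5],
    "c": [None, None, 3, 4, 5],
})

# Two one-row aggregate tables that keep the original column headers.
total = pd.DataFrame([{col: len(df) for col in df.columns}])  # count incl. missing
missing = df.isna().sum().to_frame().T                        # missing value count

# Transpose: the former column headers become the row IDs.
total_t = total.T.rename(columns={0: "total"})
missing_t = missing.T.rename(columns={0: "missing"})

# Join both tables on the row ID, then apply a single formula.
joined = total_t.join(missing_t)
joined["pct_missing"] = 100 * joined["missing"] / joined["total"]

# Columns with more than 40% missing values are rejected.
columns_to_reject = joined.index[joined["pct_missing"] > 40].tolist()
result = df.drop(columns=columns_to_reject)

print(joined)
print(columns_to_reject)
```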

 

E

Sorry, I still can't do it. I should mention that I am a newbie in KNIME. Can you please provide a workflow example of this?

I created a very simple workflow for you. I hope this will help you!

Hi,

Thanks for the workflow. You used Columns to Reject in your workflow, which I cannot find in my KNIME. Do I need to install any extensions for that?

Columns to Reject is a MetaNode (which I created for you). Double-click on the node and you will see the magic behind it. ;)

Got it, thanks :)