Filter ROW if COLUMN1*0.8 > COLUMN2

MICHI3SESSION · December 2, 2021, 7:44pm

Hello,

I am here right now:
I have a dataset of patients and want to filter for the groups of patients in which 80% of the people have heart disease.

Explained:
So I first clustered patients with similar conditions using k-means.
This looks like:
Cluster; Heart Disease
cluster_1; 0
cluster_1; 1
cluster_1; 1
cluster_1; 1
cluster_1; 1
cluster_2; 0
cluster_2; 0
cluster_2; 1
cluster_3; 1
…

I want to completely filter out all rows that belong to a cluster with under 80% Heart Disease.

For that I counted 2 things:
count = how many times do I have cluster_x
count(HD1) = how many Heart Disease=1 do I have per cluster_x

So it looks now like this:
Cluster; Heart Disease; count; count(HD1)
cluster_1; 0; 5; 4
cluster_1; 1; 5; 4
cluster_1; 1; 5; 4
cluster_1; 1; 5; 4
cluster_1; 1; 5; 4
cluster_2; 0; 3; 1
cluster_2; 0; 3; 1
cluster_2; 1; 3; 1
cluster_3; 1; 1; 1
…
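
For anyone who wants to check these two counts outside KNIME, here is a minimal sketch in plain Python. It only illustrates the arithmetic and is not part of the workflow; the names `rows`, `add_counts`, and `table` are made up for the example, and the data is just the nine rows shown above.

```python
from collections import Counter

# The nine example rows shown above: (cluster, heart_disease)
rows = [
    ("cluster_1", 0), ("cluster_1", 1), ("cluster_1", 1),
    ("cluster_1", 1), ("cluster_1", 1),
    ("cluster_2", 0), ("cluster_2", 0), ("cluster_2", 1),
    ("cluster_3", 1),
]


def add_counts(rows):
    """Append count and count(HD1) of the row's cluster to every row."""
    # count: how many rows each cluster has
    count = Counter(cluster for cluster, _ in rows)
    # count(HD1): how many rows with Heart Disease = 1 each cluster has
    count_hd1 = Counter(cluster for cluster, hd in rows if hd == 1)
    return [(c, hd, count[c], count_hd1[c]) for c, hd in rows]


table = add_counts(rows)
for row in table:
    print(row)
```

Every row of a cluster carries the same two counts, which is what makes a row-wise comparison possible afterwards.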

And now I want to filter all rows where: “count” * 0.8 > “count(HD1)”

I already tried the Rule-based Row Filter, but it does not let me use calculations.

Does somebody have a solution for that?

daten.xlsx (66.0 KB)
heart_disease.knwf (38.6 KB)

knimediger · December 2, 2021, 9:00pm

@MICHI3SESSION ,

First of all: Welcome to the world of KNIME.
Looking at your challenge, the Pivoting node (see NodePit) comes to mind.
That will work as long as you use a “0” or a “1” to indicate whether a patient has HD or not.

As a first step, you need to duplicate the second column and ensure that its type is integer, because we are going to do some simple math on it.
So the table looks like this:

col1 | col2 | col3
C1 | 0 | 0
C1 | 1 | 1
...

Now apply the above-mentioned Pivoting node to the table and configure it as follows:
Group by col1
Pivot by col2
Manual aggregation sum(col3)
Manual aggregation count(col3)

The pivot will give you all the figures required to check your 80% limit.
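
To see which figures such a pivot produces, the configuration above can be sketched in plain Python. This only illustrates the arithmetic and is not how KNIME implements the node; `pivot_counts`, `share`, and the sample rows (taken from the question) are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (col1 = cluster, col2 = 0/1 flag)
rows = [
    ("cluster_1", 0), ("cluster_1", 1), ("cluster_1", 1),
    ("cluster_1", 1), ("cluster_1", 1),
    ("cluster_2", 0), ("cluster_2", 0), ("cluster_2", 1),
    ("cluster_3", 1),
]


def pivot_counts(rows):
    """Group by cluster, pivot by the 0/1 flag, and count rows per cell."""
    pivot = defaultdict(Counter)
    for cluster, flag in rows:
        pivot[cluster][flag] += 1
    return pivot


pivot = pivot_counts(rows)

# Share of patients with HD per cluster: count of "1" rows / all rows.
# A Counter returns 0 for a missing flag, so clusters without "0" rows work too.
share = {c: cells[1] / (cells[0] + cells[1]) for c, cells in pivot.items()}

for cluster, value in share.items():
    print(cluster, dict(pivot[cluster]), round(value, 2))
```

Because col3 is a copy of the 0/1 flag, its sum in the “1” pivot column should equal the count of “1” rows, so either figure can serve as the numerator.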

bruno29a · December 3, 2021, 1:43am

Hi @MICHI3SESSION , and welcome to the KNIME Community.

Indeed, the Rule* nodes (Rule Engine, Rule-based Row Filter, etc.) do not allow you to do manipulations/calculations. These manipulations/calculations have to be done beforehand.

So, if we go with how you are doing it, just use a Math Formula node before the Rule-based Row Filter where you can do your count*0.8 and save the results to a new column, let’s say count_80.

You can then apply the rule of count_80>count(HD1) in your Rule-based Row Filter. You can then also remove the count_80 column with the Column Filter.
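
The logic of these two steps can be sketched in plain Python. This is only an illustration under assumptions, not the KNIME configuration itself: `split_by_rule` is a made-up helper, and the rows are the example rows from the question with their two counts.

```python
# Example rows from the question: (cluster, heart_disease, count, count_hd1)
table = [
    ("cluster_1", 0, 5, 4), ("cluster_1", 1, 5, 4), ("cluster_1", 1, 5, 4),
    ("cluster_1", 1, 5, 4), ("cluster_1", 1, 5, 4),
    ("cluster_2", 0, 3, 1), ("cluster_2", 0, 3, 1), ("cluster_2", 1, 3, 1),
    ("cluster_3", 1, 1, 1),
]


def split_by_rule(table, factor=0.8):
    """Split rows into those matching count_80 > count(HD1) and the rest."""
    matched, unmatched = [], []
    for row in table:
        _, _, count, count_hd1 = row
        count_80 = count * factor      # the calculation done beforehand
        if count_80 > count_hd1:       # the rule applied in the row filter
            matched.append(row)
        else:
            unmatched.append(row)
    # count_80 is never stored on the row, which mirrors removing the column
    return matched, unmatched


matched, unmatched = split_by_rule(table)
print("rule matches:", sorted({row[0] for row in matched}))
print("rule does not match:", sorted({row[0] for row in unmatched}))
```

Whether the matching rows are kept or dropped depends on how the filter is set up, so the sketch returns both groups. Note that a cluster sitting exactly at 80% does not match the rule, because the comparison is strict.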

gonhaddock · December 3, 2021, 11:10am

Hi @MICHI3SESSION and welcome to the KNIME community

I had a look at the data that you provided and in a conceptual example you can apply something similar to this workflow.
20211203_heart_disease.knwf (98.4 KB)

I tested the k-Means node provided by KNIME, using 3 clusters in this example. The node only works with numeric categories.

I did a quick check in R to estimate the right number of clusters with the Elbow method, converting the text categories with ‘as.factor’ and using the factor numbers in the method for the 11 independent variables. That is why I chose 3 clusters.

Anyhow, no further analysis to check statistical significance has been done.
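
For readers unfamiliar with the Elbow method, here is a minimal sketch of the idea in plain Python. It is not the R check described above: it uses a tiny hand-written 1-D k-means on made-up toy values (`values`), whereas the real data has 11 variables, and `kmeans_wcss` is a hypothetical helper written for this illustration.

```python
def kmeans_wcss(values, k, iterations=20):
    """Tiny Lloyd-style k-means on 1-D data; returns the within-cluster
    sum of squares (WCSS) after a fixed number of iterations."""
    data = sorted(values)
    # Deterministic start: spread the initial centres across the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: (v - centres[i]) ** 2)
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sum(min((v - c) ** 2 for c in centres) for v in data)


# Made-up toy data with three obvious groups (around 1, 5, and 9).
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

wcss = [kmeans_wcss(values, k) for k in range(1, 5)]
for k, w in enumerate(wcss, start=1):
    print(k, round(w, 2))
```

The number of clusters is read off where the WCSS curve stops dropping sharply. On this toy data the drop becomes negligible after k = 3.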

BR

gonhaddock · December 11, 2021, 5:22pm

Hi @MICHI3SESSION
I am just tidying up and publishing clean, reviewed workflows on my KNIME Hub. This revision includes the Elbow Method R chart.

BR
