Help with medical classification problem and missing values

onedd · November 10, 2020, 10:04am

Hi guys! I’m asking for your help with a classification problem. Given a dataset with medical information about a number of patients, I need to train a model that predicts whether a patient has Alzheimer’s disease, a depressive disorder, mild cognitive impairment, or none of these. The problem is that the training set contains missing values that I can’t resolve. Can you tell me how to handle these missing values and which classifier is best to use? I have put the files in the following drive folder: https://drive.google.com/drive/folders/1CARzKHfpWTFg9a3c-q8TDpfJ0QCIWXab?usp=sharing.
Thanks

elsamuel · November 10, 2020, 3:26pm

What have you done so far? This post reads as if you want people here to do all of your work for you.

No one here can tell you how to handle your missing values. This is something you need to work out on your own based on your specific goals. There are dozens of posts on this forum about approaches to handling missing values. Have you read through them?

Similarly, no one here can tell you which classification model is “best”. Again, this is something you need to work out on your own. You need to understand how each works and what its advantages and disadvantages are. You’ll need to build the models, evaluate their performance, and select the one that has the most predictive power in your specific context. This is a standard data science workflow.
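To make the "build, evaluate, select" loop concrete, here is a minimal sketch in plain Python rather than KNIME nodes. Everything in it is an assumption made for illustration: the toy data, the two deliberately simple candidate models (a majority-class baseline and a nearest-centroid classifier), and the helper names `majority_fit`, `centroid_fit`, and `cross_val_accuracy`. It only shows the shape of the comparison, not which model suits the actual data.

```python
from collections import Counter


def majority_fit(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def centroid_fit(X, y):
    """Nearest-centroid classifier: predict the class whose mean point is closest."""
    counts = Counter(y)
    sums = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        def sq_dist(lab):
            return sum((a - b) ** 2 for a, b in zip(x, centroids[lab]))
        return min(centroids, key=sq_dist)

    return predict


def cross_val_accuracy(fit, X, y, k=3):
    """k-fold cross-validation: train on k-1 folds, score on the held-out fold."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model(X[i]) == y[i] for i in test)
    return correct / len(X)


# Made-up toy data: two numeric features per patient, two of the four classes.
X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [3.2, 2.9],
     [2.8, 3.0], [1.0, 0.9], [3.1, 3.3], [0.8, 1.1]]
y = ["none", "none", "none", "MCI", "MCI", "MCI", "none", "MCI", "none"]

candidates = {"majority": majority_fit, "nearest_centroid": centroid_fit}
results = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
best = max(results, key=results.get)
print(results, "->", best)
```

The point of the baseline is that any real candidate should beat it on held-out data; if it does not, the features or the preprocessing need another look before comparing more models.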

There are videos on the KNIME YouTube channel with tutorials and examples of classification models being built and evaluated. The KNIME Hub has example workflows of the same. There’s a book with case examples. There are example workflows you have access to in KNIME.

onedd · November 10, 2020, 4:07pm

You are right, I was very vague. I’ve decided how to handle the missing data. For each pathology test I have 4 columns (PG, PC, PE, RESULT) that hold the results. Missing values are possible in all columns. The main information needed for the prediction is the RESULT value; if it is missing, PE can be used instead, and the same goes for PC and PG. I would therefore like to write these conditions in KNIME so that I can manage the missing values better, but I have not found any way, either online or in the book, to create these conditions (it is like an if-then-else statement). Is it possible to manipulate the dataset in this way before making the prediction? If so, how?

Thanks for your answer

onedd · November 10, 2020, 4:12pm

I am trying to create a template that uses the Rule-based Row Filter, but I have not found a solution.

mlauber71 · November 10, 2020, 4:13pm

If you want to manipulate your data, the Rule Engine node gives you a simple way to do this. You could also use the Java Snippet node for if-then-else or more complicated operations.
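As an illustration of the logic such a rule would encode, here is a small sketch in plain Python, not KNIME syntax. It assumes the priority order RESULT, then PE, then PC, then PG, which is one reading of the question above. `None` stands in for a missing cell, and the helper name `first_available` and the sample rows are made up.

```python
# Assumed priority order, taken from the question: RESULT first, then PE, PC, PG.
PRIORITY = ("RESULT", "PE", "PC", "PG")


def first_available(row, columns=PRIORITY):
    """Return the first non-missing value among `columns`, or None if all are missing."""
    for name in columns:
        value = row.get(name)
        if value is not None:
            return value
    return None


# Made-up example rows; None marks a missing cell.
rows = [
    {"PG": 3, "PC": 5, "PE": 7, "RESULT": 9},
    {"PG": 3, "PC": 5, "PE": 7, "RESULT": None},
    {"PG": 3, "PC": None, "PE": None, "RESULT": None},
    {"PG": None, "PC": None, "PE": None, "RESULT": None},
]

filled = [first_available(row) for row in rows]
print(filled)
```

Rows where all four columns are missing stay missing, so they still need a separate decision (drop or impute). The check is `is not None` rather than truthiness so that a legitimate value of 0 is not treated as missing.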

Since your questions seem to be quite basic, you might benefit from one or more of these approaches to learning about KNIME and its possibilities. I have compiled a list to that end:

If you are interested in Machine Learning, there is another meta list for that:

Daniel_Weikert · November 10, 2020, 7:32pm

If the dataset is big enough, you can drop the rows with missing values. If not, you could impute them (using the mean or median).
There is a Missing Value node in KNIME which can help you do this easily. However, keep in mind that, depending on your problem, the fact that a value is missing could itself be an interesting feature with predictive power for your output.
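A rough sketch of both ideas together, in plain Python rather than KNIME: fill the gaps with the median and keep a 0/1 flag recording where the gaps were. The function name `impute_with_indicator` and the sample values are invented for the example, and `None` stands in for a missing cell.

```python
from statistics import median


def impute_with_indicator(values):
    """Fill missing entries with the median and flag which entries were missing."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


# Made-up column of test scores with two missing cells.
scores = [12.0, None, 15.0, 9.0, None]
imputed, was_missing = impute_with_indicator(scores)
print(imputed)
print(was_missing)
```

The flag column can then be passed to the classifier as an extra feature. In a real workflow the fill value should be computed on the training data only and reused for the test data.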
BR
