Hi guys,
from your experience, which algorithms work best with missing values in a classification task?
Is there also a node that removes columns that are less than 10 percent filled with data?
Kind regards,
Canan
Hi @anon33357744 -
Your first question is going to depend on both your data and what you’re trying to do. In some cases it may be best to approximate missing values using interpolation or some sort of averaging (for numeric data), while in others simply using the next or previous value may be good enough (e.g., for some simple time series). The Missing Value node can help with implementing these strategies.
Your second question is easier to answer - just use the Missing Value Column Filter node.
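To make those strategies concrete, here is a minimal plain-Python sketch of mean filling, linear interpolation, and previous/next value filling. The function names and the sample series are made up for illustration, and this is not how the Missing Value node is implemented.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)
    return [avg if v is None else v for v in values]

def interpolate_linear(values):
    """Fill gaps between two observed values on a straight line.
    Gaps before the first or after the last observed value stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

def fill_with_previous(values):
    """Replace None with the last observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def fill_with_next(values):
    """Replace None with the next observed value (trailing gaps stay None)."""
    return fill_with_previous(values[::-1])[::-1]

series = [1.0, None, 3.0, None, 8.0]  # made-up numeric column
print(fill_with_mean(series))      # [1.0, 4.0, 3.0, 4.0, 8.0]
print(interpolate_linear(series))  # [1.0, 2.0, 3.0, 5.5, 8.0]
print(fill_with_previous(series))  # [1.0, 1.0, 3.0, 3.0, 8.0]
print(fill_with_next(series))      # [1.0, 3.0, 3.0, 8.0, 8.0]
```

Note how the four results differ for the same input. Each strategy encodes a different assumption about what the missing value "should" have been.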
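If it helps to see the logic behind that kind of filter, here is a rough plain-Python sketch. The `drop_sparse_columns` helper, the 10 percent default, and the sample rows are assumptions for illustration only; it does not reproduce the node's actual implementation or settings.

```python
def drop_sparse_columns(rows, min_filled=0.10):
    """Keep only columns whose share of non-missing values is at least min_filled."""
    columns = list(rows[0].keys())
    keep = [
        c for c in columns
        if sum(r[c] is not None for r in rows) / len(rows) >= min_filled
    ]
    return [{c: r[c] for c in keep} for r in rows]

# Made-up table with 20 rows: "id" is always filled, "port_of_origin" is
# filled in half of the rows, "note" is filled in 1 of 20 rows (5 percent).
rows = [
    {
        "id": i,
        "port_of_origin": "Hamburg" if i % 2 == 0 else None,
        "note": "x" if i == 0 else None,
    }
    for i in range(20)
]

result = drop_sparse_columns(rows)
print(list(result[0].keys()))  # ['id', 'port_of_origin']
```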
Hi Scott,
thank you very much.
I want to create a classifier that helps me find transactions that don’t match the known patterns. I have to teach these patterns to the classifier first.
I’ve got five columns: Transactionid, Beneficiary, Applicant, Port of origin, and Port of destination.
Not every record is completely filled in. In some records, for example, the port of origin column is empty. Can the model I train handle such data?
Thanks and kind regards,
Canan
Some algorithms (like decision trees) can handle missing values - sort of - by treating such values as their own class. This isn’t ideal, but maybe in your case it’s good enough. Other algorithms don’t handle missing values well at all, so you are forced to impute.
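For categorical columns like yours, "treating missing as its own class" can be sketched as below. This is only an illustration: the `<missing>` label, the helper name, and the sample records are made up, and whether a particular tree learner does this internally or needs it done beforehand depends on the implementation.

```python
from collections import Counter

MISSING = "<missing>"  # sentinel label, an arbitrary choice for this sketch

def missing_as_category(rows, columns):
    """Return copies of rows where None in the given columns becomes MISSING."""
    return [
        {**row, **{c: MISSING if row.get(c) is None else row[c] for c in columns}}
        for row in rows
    ]

# Made-up records; the column names are illustrative, not from a real table
transactions = [
    {"transaction_id": 1, "port_of_origin": "Hamburg", "port_of_destination": "Izmir"},
    {"transaction_id": 2, "port_of_origin": None, "port_of_destination": "Izmir"},
    {"transaction_id": 3, "port_of_origin": "Hamburg", "port_of_destination": None},
    {"transaction_id": 4, "port_of_origin": None, "port_of_destination": "Izmir"},
]

prepared = missing_as_category(transactions, ["port_of_origin", "port_of_destination"])

# The missing label now shows up as a regular category next to the real ports
origin_counts = Counter(row["port_of_origin"] for row in prepared)
print(origin_counts)
```

After this step a learner sees "no port of origin given" as just another value, which may itself be a useful signal when looking for unusual transactions.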
Another strategy here might be to predict the missing values in your dataset using the other values that aren’t missing, but then you are making some assumptions about the nature of the data. Since you are trying to build a classifier to identify unusual patterns, this might not be what you want.
Then again, imputation and averaging beforehand make certain assumptions too, so no strategy is perfect.
The very unsatisfying answer to “how should I handle missings” is “it depends”.
There was a thread discussing the use of the R package “Amelia” to impute missing values.
Thank you very much
Thanks