Hi, I am working on missing value imputation. There are two methods I want to test:
1. Divide my dataset into two subsets: one containing all instances with complete values, the other containing the instances with incomplete values. I then want to test whether it affects the results if I append, let's say, two rows from the incomplete subset to the complete subset, impute, then append the next two rows, impute again, and so on.
2. Create subsets that contain equal numbers of instances with missing values, and apply an imputation technique to each to see which one performs well.
Apart from the R suggestion, I would suggest trying out various combinations of KNIME's Row Splitter and/or Rule-based Row Splitter nodes (depending on whether you are dealing with missing values in a single column or in multiple columns) together with the Missing Value node. You may also need to include some looping or sampling nodes, depending on how complex you want to get.
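If it helps to pin down the logic of your first method before building it out of nodes, here is a rough sketch in Python with pandas. It is only an illustration under some assumptions: plain column-mean imputation stands in for whatever technique you actually want to test, all columns are assumed numeric, and the function names and toy data are made up for the example.

```python
import numpy as np
import pandas as pd


def split_complete_incomplete(df):
    """Return (rows without missing values, rows with at least one missing value)."""
    has_missing = df.isna().any(axis=1)
    return df[~has_missing], df[has_missing]


def impute_incrementally(df, batch_size=2):
    """Append incomplete rows to the complete pool batch by batch,
    mean-imputing the pool after each append (hypothetical helper)."""
    pool, incomplete = split_complete_incomplete(df)
    for start in range(0, len(incomplete), batch_size):
        batch = incomplete.iloc[start:start + batch_size]
        pool = pd.concat([pool, batch])
        # Column means are computed from the values currently in the pool,
        # including values imputed in earlier batches.
        pool = pool.fillna(pool.mean())
    return pool.sort_index()


def impute_all_at_once(df):
    """Baseline: mean-impute the whole table in a single pass."""
    return df.fillna(df.mean())


# Toy data, invented purely for illustration.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan, 10.0, np.nan],
    "b": [1.0, 1.0, 1.0, 5.0, np.nan, 9.0],
})

incremental = impute_incrementally(df, batch_size=2)
one_shot = impute_all_at_once(df)

print(incremental)
print(one_shot)
# Rows where the two strategies disagree in at least one column.
print((incremental != one_shot).any(axis=1))
```

On this toy table the two strategies disagree for row 4, column `b`: the incremental version fills it before row 5 (with `b = 9`) has joined the pool, so the mean it uses is lower. In KNIME terms, the split roughly corresponds to a Row Splitter, the batching to a chunk-style loop, and the `fillna` step to the Missing Value node.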
Maybe try to cobble together a simple example, using workflows on the Hub to get started, and then share your workflow in progress with us here when you run into roadblocks?