Hi, I am trying to create a rule that applies to all columns, so that wherever a cell says “missing data” it is changed to “-999”. I figured out how to do that per column with the Rule Engine, but how can I do it for all the columns in just one node?
Hi @areej31
You can do this by processing every column one by one with the Column List Loop.
The Missing Value node is configured using the flow variable “current columnname”.
missing_multiple_columns.knwf (23.1 KB)
gr. Hans
Hi @areej31 , if by “missing data” you mean that KNIME displays it as the ?, and it is actually missing data, then the Missing Value node can achieve this.
Edit: snap! @HansS … although maybe the loop is not needed if it is to be applied to all missing values
It’s not actually a missing value: the data comes in with “missing values” written in some cells, and I just need to change this for easier calculation.
Hi @areej31,
for further calculations you need your data as numeric columns, correct? But columns with entries like ‘missing values’ are shown as datatype String by default.
The easiest way to remove the strings is to convert the column to a numeric type using the String to Number node.
BR
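To illustrate the conversion idea outside of KNIME, here is a minimal Python sketch. It is not a KNIME node configuration: the sample table, `to_number`, and `convert_column` are made-up names for illustration, and it assumes that any cell which cannot be parsed as a number is treated as missing and then filled with -999.

```python
from typing import List, Optional

FILL_VALUE = -999.0  # replacement value taken from the question


def to_number(cell: Optional[str]) -> Optional[float]:
    """Parse a cell as a number; return None (missing) if it cannot be parsed."""
    if cell is None:
        return None
    try:
        return float(cell.strip())
    except ValueError:
        # Text such as "missing data" ends up here and becomes missing.
        return None


def convert_column(cells: List[Optional[str]], fill: float = FILL_VALUE) -> List[float]:
    """Convert every cell to a number, filling missing results with `fill`."""
    numbers = []
    for cell in cells:
        value = to_number(cell)
        numbers.append(fill if value is None else value)
    return numbers


# Hypothetical sample data: every column arrives as strings.
table = {
    "a": ["1.5", "missing data", "3"],
    "b": ["missing data", "7", "8.25"],
}

converted = {name: convert_column(cells) for name, cells in table.items()}
print(converted)
```

The two steps (parse, then fill) are kept separate on purpose, since that mirrors converting the column first and handling the resulting missing values afterwards.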
Hi @areej31, is all of the other data in the strings actually numeric in format? If so, @morpheus’s suggestion would turn the non-numeric “missing data” values into missing values, which could then be handled as per the earlier suggestions.
If it isn’t, it might be better to upload some sample data as there are potentially different ways to achieve this, and also some nodes are more suitable than others depending on what other column data types you have.
I don’t think you’ll find a single-node solution unless all columns are of the String datatype, in which case String Manipulation (Multi Column) could handle it with:
string(
    $$CURRENTCOLUMN$$ != null && $$CURRENTCOLUMN$$.equals("missing data")
        ? "-999"
        : $$CURRENTCOLUMN$$
)
BUT if you have any other data types, additional nodes will be required to filter those, unless you know them in advance and configure the node manually for just the String columns.
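The logic of that expression, together with the restriction to String columns, can be sketched in plain Python as follows. This is only an illustration of the idea, not KNIME code: the table layout and the helper names (`replace_marker`, `is_string_column`, `replace_in_string_columns`) are assumptions made up for this example.

```python
MARKER = "missing data"   # cell text to look for, as given in the question
REPLACEMENT = "-999"      # replacement text, kept as a string like in the expression


def replace_marker(cell):
    """Return REPLACEMENT for a non-null cell equal to MARKER, else the cell unchanged."""
    if cell is not None and cell == MARKER:
        return REPLACEMENT
    return cell


def is_string_column(cells):
    """Treat a column as a string column if every non-missing cell is a str."""
    return all(cell is None or isinstance(cell, str) for cell in cells)


def replace_in_string_columns(table):
    """Apply replace_marker to string columns only; copy other columns as they are."""
    result = {}
    for name, cells in table.items():
        if is_string_column(cells):
            result[name] = [replace_marker(cell) for cell in cells]
        else:
            result[name] = list(cells)
    return result


# Hypothetical sample data with one string column and one integer column.
sample = {
    "reading": ["12", "missing data", None],
    "count": [1, 2, 3],
}

cleaned = replace_in_string_columns(sample)
print(cleaned)
```

The type check plays the role of the extra filtering mentioned above: non-string columns are passed through unchanged rather than being fed to the string comparison.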