I have some odd values in numeric columns like 9999, 9998, 9991. They mean missing or some odd value that is intentionally labeled as a very large numeric value for recognition. It seem to be the old fashion statistical data encoding practice. Is there a short logical way to replace these specific values with null or missingvalue? I do not want to convert the Numeric column to String, do String manipulation using toNull(), and then convert it back to Numeric, if possible. It would be nice to do this in Rule engine or in Math formula nodes. But there, how do we assign Null as a replacement for a numeric value in the expression?
Hi @dursundelen , the rule engine will make something missing if you don’t give it an outcome for a particular value, so if all of your special values in “Column1” are (for example) 9990 and above then a rule engine that says:
and set to replace Column1 would
There isn’t an easy way of doing a “not equals” in the rule engine, so if the values you want to replace are more varied and not easily found by all being “less than” or “greater than”, then one option might be two Rule Engines. The first would have rules matching all the values you want to remove, setting them to a single “specifial” value, which is then interpreted by the second Rule engine in the way described above,
e.g. First rule engine
Second rule engine
This would handle if you had a variety of “special values” that weren’t just in one range
Thank you for the quick and well detailed answer. I like it. In teh meantime, I tried the following (although it is a two step process).
Used a Rule engine node to replace the odd values with “” (empty string / nothing)
$Column1$ >= 9900 => “”
$Column1$ < 0 => “”
TRUE => $age$
Choosing append a column. will create a new String column. The replaced cells will have no values in them (but not Nulls either, not yet)
Used a String to Number node for the newly created column. That will convert the String column to a numeric column with empty cells converted to Null values.
Another trick that I sometimes use to insert missings into a numeric column is by using the Math Formula node. Starting with
The following Math Formula node config
will insert a missing value:
It would be nicer if there was an explicit way to insert missing values with the Rule Engine. Feature suggestion!
Thank you, Aswin,
In your expression, the value of “0/0” is interpreted as Null. Wow!
I think it is worth mentioning is that it is possible to combine a Math Formule node and a Rule Engine node:
Use the Math Formula node to create a column that consists of only missing values (again by using 0/0), and then use a Rule Engine to insert those missing values in the column of interest.
there is NOT function in Rule Engine one can use for “not equals”. So following expression with replace option should do the trick if I got it right:
NOT $column1$ >= 9900 AND NOT $column1$ < 0 => $column1$
@Aswin request noted on existing ticket. (Internal reference: AP-9716)
Lol… @ipazin, I can’t believe I didn’t spot that… All this time I’d been wrongly thinking the NOT was simply for handling boolean columns… rather than negating other predicates… so had completely overlooked what should be obvious. I clearly need some stronger coffee…
On a related topic from yesterday (where I also started off on completely the wrong track ) , there is a use of String Manipulation (multi column) which can also be used as a type of Rule Engine for similar use cases… (ie turning specific values to “missing” across a dataset)
not bad approach but seems a bit too much code in KNIME (String Manipulation) terms
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.