Hello,
I use this dataset i have several string columns to transform.
train.csv (223.8 KB)
Example Surgery Yes/No → 1/0
I will have to do the same for all the string column
Result is “?”
Where is the problem ?
Thanks
Hello,
I use this dataset i have several string columns to transform.
train.csv (223.8 KB)
Example Surgery Yes/No → 1/0
I will have to do the same for all the string column
Thanks
Hi @Brain
Three issues:
You are using LIKE which, as per the explanation, looks for a wildcard pattern. Something that you don’t have
You don’t have a fallback default outcome (as per the explanation notes).
Your rule has capital Y, your data is all lower case.
Use:
$surgery$ = "yes" => 1
TRUE => 0
Are you sure you do not want to one hot encode instead or do your features have an order?
br
Hi @Brain , for this use case I would suggest String Manipulation (multi column) using the following syntax. (This is my go-to “multi-column Rule Engine” )
toInt(
$$CURRENTCOLUMN$$.equals("Yes")?1:0
)
Make sure to select only the required String columns, otherwise for some data types the node produces nonsensical (to the user) error messages. String Manipulation cannot cope with being presented with Longs, Binary, Date and many other data types. Including such data types results in this message during configuration.
One day I’d like to see that particular annoyance fixed, since the non-multi-column version of this node has no trouble handling them…
btw… multiple rules can be accommodated by nesting the conditions,
e.g.
toInt(
$$CURRENTCOLUMN$$.equals("Yes")?1:
$$CURRENTCOLUMN$$.equals("No")?0:
$$CURRENTCOLUMN$$.equals("Maybe")?2:-1
)
edit:
If there is a possibility that a column could containing missing value (?
) then it is important to handle this as the first condition to prevent an error:
toInt(
$$CURRENTCOLUMN$$==null?null:
$$CURRENTCOLUMN$$.equals("Yes")?1:
$$CURRENTCOLUMN$$.equals("No")?0:
$$CURRENTCOLUMN$$.equals("Maybe")?2:-1
)
Hello,
For transforming string columns like ‘Surgery’ to binary (1/0) in KNIME, you can use the ‘Rule Engine’ node with an expression like:
$Surgery$ MATCHES “yes” => 1
TRUE => 0
$Surgery$ MATCHES "yes"
: This condition checks if the ‘Surgery’ column contains the string ‘yes’.=> 1
: If the condition is true, it assigns the value 1.TRUE => 0
: If the condition is not true (i.e., ‘Surgery’ does not contain ‘yes’), it assigns the value 0.This expression will effectively convert ‘Surgery’ values of ‘yes’ to 1 and all other values to 0.
Hi @tqAkshay95 , keep in mind that MATCHES
is for matching regular expressions whereas a simple =
is more correctly used for matching exact strings. Whilst it would work in this instance, using MATCHES can lead to unexpected outcomes if your string unintentionally contains regular expression special characters such as “.”.
Also one issue the OP was trying to overcome here was working across a large number of columns, which is something Rule Engine is not well suited for, but you are correct that for individual columns, Rule Engine can work effectively.
ChatGPT response Should be flagged as per Scott’s instruction.
Thanks for the correction.
I’ll be mindful of using = for exact string matching and MATCHES for regular expressions.
Also noted about Rule Engine limitations with a large number of columns.
Hi
This is the Rule workflow. My problem is each column is not only Yes or No.
This is the result i would like to have using only one node.
RULE.knwf (93.5 KB)
train.csv (223.8 KB)
Thanks
Br
If the data is not ordinal you could try one hot encoding with one to many
It’s what i look for.
Then try the node (One to Many Node)
br
This one is more luke-warm encoding but here is a new component I’ve been playing with:
You give it a set of rules in a single table. I used Excel for this purpose as it was easier to edit. They use exact Rule Engine syntax, and I’ve copied and pasted from all of your rule engines:
The biggest limitation would be performance on very large data sets because of the amount of Column Appending it has to do, which can slow things down significantly, especially on a large number of columns, but for general use, it should work.
I would suggest editing rules in a regular Rule Engine to ensure they are syntactically correct and then copy/paste all together into an Excel spreadsheet to be read in by the workflow. They can of course be manually edited in a table editor (Table Creator
) or using a text editor and plain csv files, but I found Excel more convenient to use.
RULES using multi-column component.knwf (256.9 KB)
Hi,
Great, I tried it and it’s fantastic. What’s perfect is that it’s a problem we often encounter. THANK YOU.
your welcome @Brain . Do post a message (tagging me) if you encounter any issues with it, as it’s new and I’ve only done basic testing.
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.