Helo Knime community,
With column expression, I want to write a code, in such a away to convert all “?” and empty rows with 1
and rest with all 0s
Can anybody help me out, how to write the expression. thankyou
Helo Knime community,
With column expression, I want to write a code, in such a away to convert all “?” and empty rows with 1
and rest with all 0s
Can anybody help me out, how to write the expression. thankyou
Hi @Muhammad44 , is there any reason why you want/need to use the Column Expressions?
You can use the Rule Engine node to do this:
MISSING $PostalCode$ => 0
TRUE => 1
@bruno29a yeah because in rule engine, the code will be longer for 18 columns whereas in column expression, I have to copied the code with add expressions. I have write the code something like this, but doesnt work
if (column(“City”) == 1)
{
missingValue()
}
else
{
column(“City”)
}
@bruno29a there are 18 columns so I want to do it for all columns, and there are rows in “address and city column” which doesnt have ? but are empty, so want to change for that as well.
Could you attach a sample of your data? I’m not sure how you have blank value (are those spaces?). Is so, perhaps transform all missing values to spaces (blank values) and then replace all blank values with 0. Then replace all non-0 with 1. I’m currently investigating the most efficient way to do this while I await your sample. Thanks
Hi @Muhammad44 , the Column Expressions would not help for the fact that you have 18 columns to process. You would still need to create 1 expression for each column.
Your best bet is probably to use a Loop like what @victor_palacios is suggesting, or without a loop, using the String Manipulation (Multi Column) and the Missing Value node.
Also, I noticed in the title that you actually want the missing values to be replaced with 1, not 0. So actually my original expression should have been the other way:
MISSING $PostalCode$ => 1
TRUE => 0
So, here’s a solution that handles multiple columns and without loop:
Input data:
Results:
I processed the columns Address, City, Country, PostalCode and County. The empty cells of these columns have been replaced by 1, and the non-empty ones by 0 as per your title.
Just explaining a bit what I did:
I highlighted 4 items to look at in the String Manipulation (Multi Column):
toInt(substr(join("0",$$CURRENTCOLUMN$$), 0, 1))
(more on this further down)When you uncheck the box “Insert missing values as null”, any join() that you do with an empty cell will result in an empty result, so based on this, I join “0” to all the cells of the selected columns, and the “0” is added at the beginning. This is what this part of the expression does join("0",$$CURRENTCOLUMN$$)
, and this is what the modified data would look like:
The empty cells remain empty, and the non-empty cells now have “0” as first character. Similarly, doing a substr()
on an empty cell will result in an empty result, meaning that I can “safely” extract the first character of all cells using substr()
, which will extract the “0” of the non-empty cells, and keep the empty cells empty. This is what this expression does substr(join("0",$$CURRENTCOLUMN$$), 0, 1)
Now, you can see that I also converted the results to Int (Integer) with the toInt()
function. The reason for this is to be able to use the Missing Value node in a more general way, where I can make the node apply the change to all Int column.
This is what the data looks like before doing the Missing Value:
I can now apply the replacement of missing values with “1” by applying the modification to all Int columns like this:
Of course, you also have the option of choosing each column in the Missing Value node, in which case you would not need to convert the columns to Int:
This way, of course, takes a bit longer to set up.
Here’s the workflow: Replace missing values with 1 and non-missing with 0.knwf (9.3 KB)
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.