Column Expression node error? string length

Hello

I have come across something odd and inconsistent with the column expression node.

I am aware this might be a user bug and not a KNIME bug :wink:

I have created a example workflow, for those who wish too, to download.

Objective: get a list of strings of varying length. make each string 10 characters long by either chopping the 11th character to the nth, or filling the start of the string with X(s).

Nodes:
Table creator holds the list of strings
Column Expression Node 6: removes spaces
Column Expression Node 9: adds a column called value AND counts the length of string.
Question 1: why must I reduce the length of string by 1 to get the correct length? (In some languages I would expect to add 1 to the length because the first count is 0, but I have never had to reduce it by 1 before)
Column Expression Node 10: removes 11th character to the nth OR fills the start of the string with X(s)
Column Expression Node 11: counts the length of string.
See Question 1
Question 2 If you look at the output table of node 11 you will find lengths of either 9 or 10 even though there are 10 characters in each string. How? Why?

example row 6: value = AjaxAmster len = 9 (there are 10 characters in AjaxAmster)
example row 108: value = FCBarcelon, len = 9 (there are 10 characters in FCBarcelon)
example row 5: value = XXXXASRoma, len = 10 (there are 10 characters in XXXXASRoma)

Just one more thing…

This was cobbled together to show an example of what I have found, I know the code within the column expression nodes are not ideal.

Frank Columbo

Hi @FrankColumbo

The issue here is related to hidden “junk” characters in your input. At face value, the strings look clean but there is something hidden in it.

Instead of just replacing the space, I opt for a Regex to remove anything that is not equal to a letter by the means of regexReplace(column("team"),"[^a-zA-Z]" ,"" )


Note: Expand the Regex with characters like a dash or 0-9 if you know those are within the list football clubs.

If I then run length(column("team")) , the Column Expression return the correct length.

Although it’s cobbled, for the sake of it depending on if you know the function: you can shortcut node10 by using

if (column("len") <= 10) {
    padLeft(column("team"),10,"X" )
} else {
    left(column("team"), 10)
}

It saves you a bunch of coding.

That will get you your equalized string of length 10.

See WF:
column expression string length operation.knwf (63.1 KB)

Hope this helps!

2 Likes

amazing, awesome, thank you!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.