Use "contains" function in an "IF" statement in a Column Expression node

TigerCole · August 29, 2022, 4:01pm

Hi …

I have a data set with a column that contains restaurant chain brands and venues. It is compiled by area managers that sometimes include the brand name with the venue location with a “-” between the brand and the venue so I get a column that has values “brand - venue” instead of just “venue”.

I can fix it by using a “Row Splitter” node followed but a “Cell Splitter” and then a “Column Filter” which work perfectly well and is bulletproof, but I am starting to like using scripts in “Column Expression” nodes just because in KNIME I can and it is an opportunity to learn something at the same time.

If I use a “contains” function to test if the column row contains a “-” by itself it works perfectly and I get an additional column with true/false values, but when I include it in a loop the result is always “false”.

This is the “code” from the node…

if (contains(column("Venue"), "-" ) == "true") {
    "fix this"
}
else {
    "leave that"
}

Can this be done?

Thanks,

ArjenEX · August 29, 2022, 4:44pm

Hi @TigerCole

What loop are you using and how is it configured?

If it always returns false then the column Venue is not coming through as expected.

An (anonymized) workflow would help a lot here.

TigerCole · August 30, 2022, 7:35am

Hi @ArjenEX

I have attached an example workflow. As you can see there is no need for loop, I would like to do the job of 6 nodes in 1 node.

The top section of the workflow is how I might usually do this kind of split, but it is the bottom flow with only 1 node that I would like to explore.

Example - Column Expression to split column with CONTAINS.knwf (20.5 KB)

tC/.

ArjenEX · August 30, 2022, 9:34am

@TigerCole Thanks!

I see what you mean now. You can approach this in different ways. It depends a bit on how dynamic the Venue names are in terms of different nomenclatures that you could encounter.

An option is something like this:

if (contains(column("Venue"), "-" ) == true) {
    strip(substr(column("Venue"),indexOfChars(column("Venue"),"-") + 1))
}
else {
   column("Venue")
}

Which:

Checks if the Venue contains a dash like you have. Note: you had "true" with quotes but the contains functions returns a boolean so true is sufficient here.
Take a substring starting from the position in the string where the dash occurs. Added a +1 as offset to omit the dash itself. The substr(str, start) is created dynamically based on the indexOf.

With this, the outputs match:

If you want to approach it even more dynamically whereby multiple dashes cloud appear and you always want to pick up the last element, I would opt for an array in a Java Snippet.

Assume this dataset:

Within the Java Snippet, I use:

String[] vals=c_Venue.split("-");
String out_last = vals[vals.length -1 ];
out_Venue = out_last.strip();

This:

Creates an array based on each part of the string between the dashes.
Get the value from the last position of the array by deriving it from the length of the array.
Small cleanup with strip()

Final result:

See WF:
Column Expression or Java Snippet to split column with CONTAINS.knwf (56.7 KB)

Hope this provides some inspriation!

TigerCole · August 30, 2022, 9:46am

Thanks @ArjenEX …

Exactly what I was looking for. I do like the Java solution as well.

tC/.

system · September 6, 2022, 9:46am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.