Column value abbreviation

Prag_Mor · May 14, 2021, 9:10pm

Dear Knimer,

Please need help to get the node details in below format

Column	Expected result in Knime
swizterland-april-monday-america	Sw-Ap-Mo-Am

takbb · May 14, 2021, 9:38pm

Hi @Prag_Mor

There are a number of ways this could be achieved. Attached is a method that springs to mind.

KNIME_abbreviate.knwf (10.4 KB)

Prag_Mor · May 14, 2021, 10:12pm

Thanks for the solution. @takbb
But the below problem arises:

swizterland-april-monday-america Sw-Ap-Mo-Am
swizterland-april Sw-Ap-- (Expected value is Sw-Ap)

In column Combiner how to prevent hyphens to be added if there is no value.

Column	Expected result in Knime
swizterland-april-monday-america	Sw-Ap-Mo-Am
swizterland-april	Sw-Ap	Sw-Ap–

takbb · May 14, 2021, 10:15pm

Hi @Prag_Mor

I think maybe you are going to need to state all of your requirements … how many different variations of terms does this need to allow for?

takbb · May 14, 2021, 10:37pm

Maybe one of these solutions. It really depends on what other examples you have.KNIME_abbreviate 2.knwf (21.4 KB)

Prag_Mor · May 15, 2021, 12:08pm

Thank you @takbb
It worked

Daniel_Weikert · May 15, 2021, 2:45pm

For regex you could also try ^\w{2}|(?<=-)\w{2} and then a groupby because the extractor splits to multiple rows. The result is not capitalized however. Still fun challenge.

takbb · May 15, 2021, 4:27pm

Hi @Daniel_Weikert , is that using the Palladian Regex Extractor?

OK, yes. Nice regex and great idea. Building on that… what if we find a way to capitalize the “words” prior to the regex? I just discovered that String Manipulation’s capitalize function allows specification of a delimiter:

capitalize($$CURRENTCOLUMN$$,"-")

Then we apply your Regex, but modified slightly so we also take the leading hyphen along for the ride too.
^\w{2}|-\w{2}

(I got a bit lost using the extractor to split to rows, when I had more than one row as input data, as I then had a group of rows with no obvious (to me) way of assembling them back into their original row-groups. So I tried splitting to columns instead.

If we tell it to split to columns we get this:

Replacing missing values with zero-length (empty) strings, we can then combine them back with the Column Combiner. So that gets it down to 4 nodes from 5, without writing the logic “in code” (as I did in the three-node option2)

Thought it would be nice though to find away of avoiding having to deal with those annoying “missing values”. Doing the Regex extraction to a list (instead of to columns) looked promising because it automatically skipped missing values from the lists it generated

Only trouble was… finding a way to convert the collection back into a string! I thought that ought to be straightforward. Then I discovered what others had discovered before me… and it seemed to cause more trouble than I was trying to avoid!

My search led me here:

Which then led me here, with a node by Vernalis:

and that got it down to three nodes with no coded logic (albeit needing to use what appears to be a useful community node). Any other takers?

Yes I agree, a fun challenge!

KNIME_abbreviate 3.knwf (35.3 KB)

Daniel_Weikert · May 16, 2021, 9:38am

yes I used the palladian regex extractor.
br

system · November 14, 2021, 9:39pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.