De-duplicating words in a string

Hi,
https://forum.knime.com/t/de-duplicating-words-substrings-in-a-string/6298
I’m having an issue with de-duplicating words in a string. I’ve tried the code recommended in the topic above:
regexReplace($column_name$, "(?i)\b([a-z]+)\b(?:\s+\1\b)+", "$1")
but it did not work.

I have a string like: Cat,Dog &Cat,Dog &Cat,Dog &Cat,Dog,Cat, Cat, Mouse,Dog,Dog,Bird,Bird
I would expect: Cat,Dog &Cat,Dog,Cat,Mouse,Dog,Bird
But after I run the code it still shows: Cat,Dog &Cat,Dog &Cat,Dog &Cat,Dog,Cat, Cat, Mouse,Dog,Dog,Bird,Bird

Can anyone help me with this?

Thanks,
Winanda

Without rereading the previous thread, I would split your table row by a comma delimiter. This will give you a single row where each column contains a value, such as “Cat”, “Dog &Cat”, “Dog”, “Cat”, “Mouse”, etc.
Then I would use the pivoting node to transform your single row into a single column with all the values.
Finally, I would use the GroupBy node or the Duplicate Row Filter node to get rid of duplicates. You can then use the Unpivoting node to place them back into individual columns if you want.
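
For anyone who wants to see the logic outside of KNIME, here is a minimal Python sketch of the same split, de-duplicate, and rejoin idea. It is only an illustration, not the node configuration itself; the function name dedupe_items is made up for this example, and it assumes a comma delimiter and that stray spaces around items should be trimmed.

```python
def dedupe_items(text, sep=","):
    """Split on sep, trim whitespace, and keep the first occurrence of each item."""
    items = [part.strip() for part in text.split(sep)]
    # dict keys keep insertion order (Python 3.7+), so the first
    # occurrence of every item survives and later repeats are dropped.
    unique = list(dict.fromkeys(item for item in items if item))
    return sep.join(unique)


s = "Cat,Dog &Cat,Dog &Cat,Dog &Cat,Dog,Cat, Cat, Mouse,Dog,Dog,Bird,Bird"
print(dedupe_items(s))  # Cat,Dog &Cat,Dog,Mouse,Bird
```

Note that this removes every repeat, so the result is shorter than the expected output in the question, which still contains a second "Cat" and a second "Dog".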


Hi @Winanda

I think this is possible. First use a Cell Splitter node and split the string. For the output choose: As a set (remove duplicates). Then convert the set to a string again with the Collection to String node. See this workflow deduplicate_string.knwf (13.9 KB)
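
One caveat: a set drops every repeated item, while the expected output in the question looks like only adjacent repeats should be collapsed ("Cat" and "Dog" each appear again later). If that is what is wanted, the idea can be sketched in Python as follows. This is an illustration under that assumption, not a KNIME node setting, and the function name collapse_adjacent is hypothetical.

```python
from itertools import groupby


def collapse_adjacent(text, sep=","):
    """Split on sep, trim whitespace, and collapse runs of identical neighbours."""
    items = [part.strip() for part in text.split(sep)]
    # groupby yields one key per run of consecutive equal items,
    # so "Dog,Dog" becomes "Dog" but a later "Dog" is kept.
    return sep.join(key for key, _ in groupby(items))


s = "Cat,Dog &Cat,Dog &Cat,Dog &Cat,Dog,Cat, Cat, Mouse,Dog,Dog,Bird,Bird"
print(collapse_adjacent(s))  # Cat,Dog &Cat,Dog,Cat,Mouse,Dog,Bird
```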
gr. Hans

