Nodes to bring rows together

Hi all,

How can I handle this problem?

These rows belong together, but because the names are written in different ways, I cannot bring them together. I did some preprocessing, but there are still problems like the ones in the attached picture. The right column shows the number of occurrences; I used the GroupBy node to count how often each name occurs.

The problem is that I don't have the correct name, so I have no way to map these variants to one correct name.

Do you know which nodes can handle this problem?

Thanks and cheers,
Canan

Hi Canan,

For the current example you can use a “String Manipulation” node in which you apply this expression:

regexReplace(removeChars($column1$, " "), "thyssenkruppelectricalsteelindia.*", "thyssenkrupp electrical steel india private")

This expression removes the spaces and then replaces the matching strings with the complete form.
However, I don’t think this is a good general solution, as there may be other (different) strings as well. In that case you can use a “Column Expressions” node with an expression like the one below:

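// Strip the spaces so that the spelling variants line up
tempVar = removeChars(column("column1"), " ")
if (regexMatcher(tempVar, "thyssenkruppelectricalsteelindia.*")) "thyssenkrupp electrical steel india private"
else if (regexMatcher(tempVar, "someOtherString.*")) "some Other String"
else column("column1") // no rule matched: keep the original value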

As you can see, you can add more conditions for different strings after each “else if”.

This way the variants become identical strings, so you can apply whatever aggregation function you want to them (e.g. count).
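
For anyone who wants to see the same normalize-then-count idea outside KNIME, here is a minimal Python sketch. It is only an illustration: the `RULES` list, the `canonical_name` helper and the sample names are made up, and it uses `fullmatch` on the assumption that the pattern should cover the whole space-stripped string.

```python
import re
from collections import Counter

# Hypothetical rules: (pattern applied to the space-stripped name, canonical name)
RULES = [
    (re.compile(r"thyssenkruppelectricalsteelindia.*"),
     "thyssenkrupp electrical steel india private"),
]

def canonical_name(raw):
    """Return the canonical spelling for raw, or raw unchanged if no rule matches."""
    stripped = raw.replace(" ", "")
    for pattern, canonical in RULES:
        if pattern.fullmatch(stripped):
            return canonical
    return raw

# Made-up sample values standing in for the name column
names = [
    "thyssenkrupp electrical steel india private",
    "thyssenkrupp electrical steel india pvt",
    "thyssenkruppelectrical steel india",
    "some unrelated company",
]

# Count occurrences after normalization (the GroupBy + count step)
counts = Counter(canonical_name(n) for n in names)
```

Each new spelling family needs its own entry in `RULES`, which is the same maintenance cost as adding another “else if” branch.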

Best,
Armin

Hi @anon33357744 -

This might be a case where fuzzy matching could help. Check out this workflow:

https://hub.knime.com/knime/workflows/Examples/08_Other_Analytics_Types/01_Text_Processing/09_Fuzzy_String_Matching*vZLbH1jBCR6FXmhR
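
To give a rough feel for what fuzzy matching does, here is a small Python sketch. It is not the linked workflow, just an illustration using `difflib.SequenceMatcher` from the standard library; the `group_similar` helper, the 0.85 threshold and the sample counts are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1], ignoring spaces and letter case."""
    a_norm = a.replace(" ", "").lower()
    b_norm = b.replace(" ", "").lower()
    return SequenceMatcher(None, a_norm, b_norm).ratio()

def group_similar(name_counts, threshold=0.85):
    """Greedy grouping: the most frequent spelling becomes the representative."""
    groups = {}
    # Visit names from most to least frequent
    for name, count in sorted(name_counts.items(), key=lambda kv: -kv[1]):
        for representative in groups:
            if similarity(name, representative) >= threshold:
                groups[representative] += count  # merge into existing group
                break
        else:
            groups[name] = count  # nothing similar enough: start a new group
    return groups

# Made-up occurrence counts, like the output of a GroupBy node
sample = {
    "thyssenkrupp electrical steel india private": 12,
    "thyssenkrupp electrical steel india pvt": 3,
    "thyssenkruppelectrical steel india": 2,
    "some unrelated company": 5,
}
grouped = group_similar(sample)
```

The advantage over hand-written patterns is that no correct name has to be known in advance: the most frequent spelling is simply treated as the right one. The threshold needs tuning on real data, since a value that is too low merges different companies.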

Thank you @armingrudd,

I will try it out :slight_smile:

Best,
Canan

Thank you @ScottF for sharing this great example! :heart_eyes:
Easier and more reliable.:+1:

Cheers,
Armin

See also great solution here:
