Nodes to bring rows together

anon33357744 · April 18, 2019, 6:15am

Hi all,

How can I handle this problem?

These rows belong together, but because the names are written in different ways, I can't bring them together. I did some preprocessing, but there are still problems like the ones in the attached picture. The right column shows the number of occurrences; I used the GroupBy node to count how often each name occurs.

The problem is that I don't have the correct reference name, so I have no way to map these variants to one correct name.

Do you know which nodes can handle this problem?

Thanks and cheers,
Canan

armingrudd · April 18, 2019, 6:44am

Hi Canan,

For the current example you can use a “String Manipulation” node in which you apply this expression:

regexReplace(removeChars($column1$, " "), "thyssenkruppelectricalsteelindia.*", "thyssenkrupp electrical steel india private")

This expression removes the spaces and then replaces the matching strings with the complete form.
However, I don't think this is a good general solution, as there may be other, different strings as well. In that case you can use a “Column Expressions” node with an expression like the one below:

// compare the names with all spaces removed
tempVar = removeChars(column("column1"), " ")
if (regexMatcher(tempVar, "thyssenkruppelectricalsteelindia.*")) "thyssenkrupp electrical steel india private"
else if (regexMatcher(tempVar, "someOtherString.*")) "some Other String"
// keep names that match no rule unchanged
else column("column1")

You can handle more spellings by adding further “else if” conditions, one per group of variants.
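For readers who want to prototype the same rule-based mapping outside KNIME, here is a minimal Python sketch of the logic in the expressions above: strip spaces, try an ordered list of regex rules, and fall back to the original name. The rule table `RULES` and the function `canonical_name` are hypothetical names for illustration, and the rules are just the two patterns from the example; this is not a KNIME API.

```python
import re

# Hypothetical rule table mirroring the "if / else if" chain above.
# Rules are tried in order; the first matching pattern wins.
RULES = [
    (re.compile(r"thyssenkruppelectricalsteelindia.*"),
     "thyssenkrupp electrical steel india private"),
    (re.compile(r"someOtherString.*"), "some Other String"),
]


def canonical_name(raw: str) -> str:
    """Return the canonical name for `raw`, or `raw` itself if no rule matches."""
    compact = raw.replace(" ", "")  # drop spaces, like removeChars(..., " ")
    for pattern, canonical in RULES:
        # fullmatch with a trailing ".*" means "the compacted name starts with
        # this prefix". Matching is case-sensitive, like the expressions above.
        if pattern.fullmatch(compact):
            return canonical
    return raw  # unmatched names are kept unchanged


if __name__ == "__main__":
    # Made-up spelling variants, for illustration only.
    for name in ["thyssen krupp electrical steel india",
                 "thyssenkrupp electrical steel india pvt ltd",
                 "acme gmbh"]:
        print(f"{name!r} -> {canonical_name(name)!r}")
```

Keeping the rules in a list rather than a long if/else chain makes it easy to add a new spelling group without touching the matching code.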

This way all variants are turned into the same string, so you can then apply whichever aggregation you need (e.g. count).
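As a rough Python analogue of that last step (normalize first, then count, as the GroupBy node does with a count aggregation), the sketch below uses `collections.Counter`. The helpers `count_groups` and `squash` are hypothetical: `squash` is only a stand-in normalizer (remove spaces, lower-case), and in practice you would pass whatever mapping you built.

```python
from collections import Counter
from typing import Callable, Iterable


def count_groups(names: Iterable[str], normalize: Callable[[str], str]) -> Counter:
    """Count rows per normalized name, like a GroupBy with a count aggregation."""
    return Counter(normalize(name) for name in names)


def squash(name: str) -> str:
    """Hypothetical stand-in normalizer: drop spaces and lower-case."""
    return name.replace(" ", "").lower()


if __name__ == "__main__":
    rows = ["thyssenkrupp electrical steel", "Thyssenkrupp Electrical Steel",
            "thyssen krupp electrical steel", "acme"]
    print(count_groups(rows, squash).most_common())
    # → [('thyssenkruppelectricalsteel', 3), ('acme', 1)]
```

Counting only after normalization is the point: counting the raw strings would split the three spellings into three separate groups of one.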

Best,
Armin

ScottF · April 18, 2019, 1:08pm

Hi @anon33357744 -

This might be a case where fuzzy matching could help. Check out this workflow:

https://hub.knime.com/knime/workflows/Examples/08_Other_Analytics_Types/01_Text_Processing/09_Fuzzy_String_Matching*vZLbH1jBCR6FXmhR
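To illustrate the general idea of fuzzy matching when no reference list of correct names exists, here is a minimal Python sketch that groups names greedily by string similarity. It swaps in the standard library's `difflib.SequenceMatcher` ratio as the similarity measure; the linked workflow may use different distance measures and nodes. The function names and the 0.85 threshold are assumptions for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and collapse runs of whitespace before comparing."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names: list[str], threshold: float = 0.85) -> dict[str, str]:
    """Map each name to a representative spelling.

    A name joins its most similar existing representative if the similarity
    reaches `threshold`; otherwise it becomes a new representative itself.
    """
    representatives: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        best = max(representatives, key=lambda r: similarity(name, r), default=None)
        if best is not None and similarity(name, best) >= threshold:
            mapping[name] = best
        else:
            representatives.append(name)
            mapping[name] = name
    return mapping


if __name__ == "__main__":
    # Made-up spelling variants, for illustration only.
    names = ["thyssenkrupp electrical steel india private",
             "thyssenkrupp electrical steel india pvt",
             "thyssenkrupp electrical steel india",
             "acme gmbh"]
    for name, rep in cluster_names(names).items():
        print(f"{name!r} -> {rep!r}")
```

Because the first spelling seen becomes the representative, the result depends on row order; sorting the names by their GroupBy count first (most frequent first) is one simple way to make the most common spelling the "correct" one.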

anon33357744 · April 18, 2019, 1:36pm

Thank you @armingrudd,

I will try it out.

Best,
Canan

armingrudd · April 18, 2019, 1:52pm

Thank you @ScottF for sharing this great example!
It's easier and more reliable than my approach.

Cheers,
Armin

izaychik63 · April 18, 2019, 6:51pm