These rows belong together but because it is written in different ways, it is not possible to bring them together. i did some preprocessing but there are still Problems like the attached Picture. the Right column Shows the occurrence, here i used the Group by node to know how often one Name occurs.
The problem is just that I dont have the Right Name, so I have no way to allocate these names to one right name.
Do you know with which nodes can handle that Problem?
For the current example you can use a “String Manipulation” node in which you apply this expression:
regexReplace(removeChars($column1$, " "), "thyssenkruppelectricalsteelindia.*", "thyssenkrupp electrical steel india private")
This expression removes the spaces and then replace the similar strings with the complete form.
However I don’t think this is a good general solution as there may be some other (different) strings as well. In that case you can use a “Column Expressions” node with some expression like below:
tempVar = removeChars(column("column1"), " ") if (regexMatcher(tempVar, "thyssenkruppelectricalsteelindia.*")) "thyssenkrupp electrical steel india private" else if (regexMatcher(tempVar, "someOtherString.*")) "some Other String"
As you can see, you can add more conditions for different strings after each “else if”.
By doing this you have the same strings to which the aggregation function you want can be applied (e.g. count).