Hi all. I’m new to Knime, hope someone could help in understanding some basic features.
I’m trying to group strings in a column within a table by evaluating the string distance. See image for further details. This is a very rough idea of my goal. The general concept is to define some rules in order to deduplicate dirty elements.
In this simple example the first column should represent all possible entries. They are not really already grouped, it’s only for sake of comprehension. They arrive as input randomly.
Does someone has a very basic workflow to understand the process? Something easyer than in https://www.knime.com/blog/address-deduplication .
Thanks in advance
Here is a simple workflow example which can get you started:
The thing that is left for you is to define rules in order to deduplicate dirty elements. You can use Rule Engine for this or GroupBy node if majority always wins…