Group rows by String Distance

Hi all. I’m new to KNIME and hope someone can help me understand some basic features.
I’m trying to group the strings in a table column by their string distance. See the image for details; it is only a rough idea of my goal. The general concept is to define some rules to deduplicate dirty entries.
In this simple example, the first column represents all possible entries. They are shown grouped only for the sake of clarity; in reality they arrive in random order.
Does anyone have a very basic workflow that would help me understand the process? Something simpler than the one in https://www.knime.com/blog/address-deduplication would be ideal.
Thanks in advance

[image]

Look here


for ideas.


Hi there @morelator,

welcome to KNIME Community Forum!

Here is a simple workflow example which can get you started:
https://kni.me/w/k7K1xR2zfglBPMQn

What is left for you is to define the rules for deduplicating the dirty elements. You can use the Rule Engine node for this, or the GroupBy node if the majority always wins…
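
If it helps to see the idea outside KNIME, here is a rough Python sketch of the same two steps: group entries whose string distance is small, then let the majority spelling win within each group. It is not taken from the linked workflow. The sample names, the threshold of 2, and the helper names (`levenshtein`, `group_by_distance`, `majority_label`) are all assumptions made for illustration, and the greedy grouping is only one simple way to do it.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def group_by_distance(values, max_distance=2):
    """Greedy grouping: a value joins the first group whose first member
    is within max_distance (case-insensitive), else it starts a new group."""
    groups = []
    for value in values:
        for group in groups:
            if levenshtein(value.lower(), group[0].lower()) <= max_distance:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


def majority_label(group):
    """Most frequent spelling in the group ("majority always wins")."""
    return Counter(group).most_common(1)[0][0]


# Hypothetical dirty input, arriving in random order.
entries = ["Microsoft", "Apple", "Microsft", "Aple",
           "Microsoft", "Apple", "Micorsoft"]

for group in group_by_distance(entries):
    print(majority_label(group), "<-", group)
```

The threshold controls how aggressive the grouping is: too low and typos stay separate, too high and different entries get merged, so it needs tuning on real data.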

You can check this topic as well:

Br,
Ivan


Thanks, it seems promising. I’ll test it in production as soon as possible. Thanks a lot.

