Group rows by String Distance

morelator · August 12, 2019, 7:04am

Hi all. I’m new to KNIME and hope someone can help me understand some basic features.
I’m trying to group the strings in a table column by evaluating their string distance. See the image for details. This is a very rough idea of my goal; the general concept is to define some rules in order to deduplicate dirty elements.
In this simple example the first column represents all possible entries. They are not actually grouped already; they are shown that way only for the sake of clarity. In practice they arrive as input in random order.
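To make the idea concrete outside of KNIME, here is a minimal Python sketch of grouping by string distance. It is only an illustration under assumptions: the sample entries, the `max_distance` threshold of 2, and the greedy "compare against the first member of each group" rule are all hypothetical choices, not something taken from a KNIME workflow.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def group_by_distance(values, max_distance=2):
    """Greedy grouping: a value joins the first group whose first member
    is within max_distance (case-insensitive); otherwise it starts a new group."""
    groups = []
    for value in values:
        for group in groups:
            if levenshtein(value.lower(), group[0].lower()) <= max_distance:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


# Hypothetical dirty input, arriving in no particular order.
entries = ["Microsoft", "Microsft", "microsoft", "Google", "Gogle", "Amazon"]
groups = group_by_distance(entries)
for group in groups:
    print(group)
```

The result depends on input order and on the threshold, so a rule like this is only a starting point for real deduplication.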
Does anyone have a very basic workflow that would help me understand the process? Something easier than https://www.knime.com/blog/address-deduplication .
Thanks in advance

image

izaychik63 · August 12, 2019, 12:37pm

Look here

for ideas.

ipazin · August 12, 2019, 12:48pm

Hi there @morelator,

welcome to KNIME Community Forum!

Here is a simple workflow example which can get you started:

What is left for you is to define the rules for deduplicating the dirty elements. You can use the Rule Engine node for this, or the GroupBy node if the majority always wins…
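As a rough illustration of the "majority always wins" rule, here is a small Python sketch of the idea rather than of the KNIME nodes themselves. It assumes the rows have already been assigned a group id by some distance-based step; the `rows` data and the `majority_per_group` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (group id, raw value) pairs, e.g. the output of a
# distance-based grouping step.
rows = [
    (1, "Microsoft"), (1, "Microsft"), (1, "Microsoft"),
    (2, "Google"), (2, "Gogle"), (2, "Google"),
]


def majority_per_group(rows):
    """Return {group id: most frequent value in that group}."""
    by_group = defaultdict(list)
    for group_id, value in rows:
        by_group[group_id].append(value)
    # most_common(1) yields the value with the highest count;
    # ties go to the value that was seen first.
    return {
        group_id: Counter(values).most_common(1)[0][0]
        for group_id, values in by_group.items()
    }


canonical = majority_per_group(rows)
# Pair every raw value with the canonical spelling of its group.
cleaned = [(value, canonical[group_id]) for group_id, value in rows]
for raw, clean in cleaned:
    print(f"{raw} -> {clean}")
```

When the majority is not a reliable signal (for example, when a typo is more frequent than the correct spelling), explicit rules are the safer choice.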

You can check this topic as well:

Br,
Ivan

morelator · August 12, 2019, 1:41pm

Thanks, it seems promising. I’ll test it in production as soon as possible. Thanks a lot.

system · August 19, 2019, 1:47pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.