Hello all,
I recently came across this KNIME blog post about the new data deduplication options:
http://www.knime.org/blog/address-deduplication
The example dataset already includes a "class column" that identifies which records are duplicates of each other. In my case, building this column is the biggest problem: my data has no matching field that I could group on, which is exactly why it needs deduplicating in the first place.
For example, say my data has only a single string column with company names:
Company |
McDonalds |
Macdonalds |
Mc Donalds |
How can I create a class column for this?
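To make the goal concrete, here is a rough Python sketch of the kind of result I'm after. It is not a KNIME workflow and not the method from the blog post, just my guess at one possible approach: normalize each name, then greedily group names whose similarity (using `difflib.SequenceMatcher` from the standard library) is above a threshold. The function names, the 0.85 threshold, and the extra "Burger King" row are all made up for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def assign_classes(names, threshold=0.85):
    # Greedy grouping: each name joins the first existing class whose
    # representative is similar enough; otherwise it starts a new class.
    # The threshold is an arbitrary assumption and would need tuning.
    representatives = []  # one normalized representative per class
    classes = []
    for name in names:
        key = normalize(name)
        for class_id, rep in enumerate(representatives):
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                classes.append(class_id)
                break
        else:
            representatives.append(key)
            classes.append(len(representatives) - 1)
    return classes


# "Burger King" is a hypothetical extra row, added only to show a second class.
companies = ["McDonalds", "Macdonalds", "Mc Donalds", "Burger King"]
for company, class_id in zip(companies, assign_classes(companies)):
    print(company, class_id)
```

With this, the three McDonalds variants would share one class and the extra row would get its own. What I don't know is how to get an equivalent class column inside KNIME.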
Thanks in advance for any help!
-J