Text Similarity Issue

I am trying to solve a data-cleaning issue using the text similarity node. Let me define my problem first: I have a list of locations (the correct names). In another file, I have several columns, including the location, but the location values are misspelt. I need to replace the wrong values with the correct ones.
The workflow looks like:

Now the problem is: when I try to configure the text similarity node (using Levenshtein), the list of columns is empty (none of the columns from the locations are shown).

What do I need to do?

Hi @besherh

I can’t tell from your image whether you already have the vectors to compare inside. Maybe this link to the hub will help you. If you can’t use it directly, you can pick pieces out of it. Another option is to use the String Matcher node.

Hope this helps,
Jason

This link has a closely related discussion.

Hi @besherh and welcome to the KNIME Community Forum :slight_smile:

Here you can find a little example workflow on the KNIME Hub for the Similarity Search node, which you might find helpful.

In general there are a few things to keep in mind when using the node.

  • The top input port must include a column with the misspelled values.
  • The middle input port must include a column with the possible correct values.
  • The column with the misspelled values and the column with the possible correct values must have the same column name.
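
To make the matching idea concrete outside of KNIME, here is a minimal Python sketch of what such a lookup amounts to: for each misspelled value, pick the correct value with the smallest Levenshtein distance. This is not how the KNIME node is implemented; the function names and the sample city names are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_match(value: str, candidates: list[str]) -> str:
    """Return the candidate with the smallest case-insensitive edit distance."""
    return min(candidates, key=lambda c: levenshtein(value.lower(), c.lower()))


# Hypothetical sample data: a reference list and a misspelled column.
correct_locations = ["Berlin", "Munich", "Hamburg"]
misspelled = ["Berlni", "Munihc", "Hamburgg"]

cleaned = [closest_match(m, correct_locations) for m in misspelled]
print(cleaned)
```

In practice you would probably also want a maximum distance, so that values with no reasonable match are left alone instead of being forced onto the nearest name.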

I hope this helps you to get your workflow running.

If not, it would be great if you could share your workflow with some example data so we can help you :slight_smile:

Cheers
Kathrin

In addition, Palladian comes with the handy String Similarity node, which calculates various string similarity metrics between two strings, such as n-gram overlap, Levenshtein, and Jaro-Winkler. Just string/text in and a similarity score out.
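
To give a feel for what "string in, score out" means, here is a rough Python sketch of one common n-gram overlap measure: the Dice coefficient over character bigrams. This is only an illustration under that assumption; the exact formula Palladian uses may differ, and the function names are made up.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of lowercase character n-grams of the given text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-grams, from 0.0 to 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        # Strings shorter than n have no n-grams; fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


print(ngram_similarity("night", "nacht"))
print(ngram_similarity("Berlin", "Hamburg"))
```

Unlike Levenshtein, which counts edits, this kind of score is already normalised, so a single threshold can be applied across strings of different lengths.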

Palladian can be downloaded from this update site: https://download.nodepit.com/4.7

Example workflows and discussions are linked on NodePit.

Best regards,
Daniel