Text Similarity Issue

besherh · January 9, 2023, 6:57pm

I am trying to solve a data-cleaning issue using the text similarity node. Let me define my problem first: I have a list of locations (the correct names). In another file, I have several columns, including the location, but its values are misspelt. My goal is to replace the wrong values with the correct ones.
The workflow looks like:

Now the problem is: when I try to configure the text similarity node (using Levenshtein), the list of columns is empty (none of the columns from the locations table are shown).

What do I need to do?

j_ochoada · January 9, 2023, 9:32pm

Hi @besherh

I can’t tell from your image whether you already have the vectors to compare inside. Maybe this link to the hub will help you. If you can’t use it directly, you can pick pieces out of it. Another option is to use the String Matcher node.

Hope this helps,
Jason

izaychik63 · January 10, 2023, 1:21am

This link has a closely related discussion.

Kathrin · January 10, 2023, 10:58am

Hi @besherh and welcome to the KNIME Community Forum

Here you can find a little example workflow on the KNIME Hub for the Similarity Search node, which you might find helpful.

In general there are a few things to keep in mind when using the node.

The top input port must include a column with the misspelled values.
The middle input port must include a column with the possible correct values.
The column with the misspelled values and the column with the possible correct values must have the same column name.
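
Outside of KNIME, the idea behind this setup (one table of misspelled values, one table of correct values, and a Levenshtein distance to pick the nearest neighbour) can be sketched in a few lines of Python. This is only an illustration of the matching logic, not how the node is implemented internally; the function names and the sample location names are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest(value: str, candidates: list) -> str:
    """Return the candidate with the smallest case-insensitive edit distance."""
    return min(candidates, key=lambda c: levenshtein(value.lower(), c.lower()))


# Hypothetical sample data: reference names and misspelled values.
correct_locations = ["Berlin", "Munich", "Hamburg"]
misspelled = ["Berlln", "Munchen", "Hamburgg"]

cleaned = [closest(value, correct_locations) for value in misspelled]
print(cleaned)
```

Note that the nearest match is always returned, even for values that are not close to any reference name, so in practice you would probably also want a maximum distance above which a value is left for manual review.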

I hope this helps you to get your workflow running.

If not, it would be great if you could share your workflow with some example data so we can help you.

Cheers
Kathrin

danielesser · January 10, 2023, 11:59am

In addition, Palladian comes with the handy String Similarity node, which calculates various string similarity metrics between two strings, such as n-gram overlap, Levenshtein, and Jaro-Winkler. You just pass two strings in and get a similarity score out.
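
To give a feeling for what such a score looks like, here is a minimal Python sketch of one possible n-gram overlap measure (Jaccard overlap of character bigrams). It is an assumption made for illustration only; the exact formula used by the Palladian node may differ, and the function names are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Set of lower-cased character n-grams contained in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-grams: 0.0 (disjoint) to 1.0 (same n-gram set)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are shorter than n: fall back to a plain comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(grams_a & grams_b) / len(union)


print(ngram_similarity("Berlin", "Berlln"))
print(ngram_similarity("Berlin", "Hamburg"))
```

A score like this is normalised to the range 0 to 1, which makes it easier to apply a single threshold across names of different lengths than a raw edit distance.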

Palladian can be downloaded from this update site: https://download.nodepit.com/4.7

Example workflows and discussions are linked on NodePit.

Best regards,
Daniel

system · April 10, 2023, 11:59am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.