Fuzzy string matching: compare vendor names and show only rows with a match score of 97% or more

Hi Team,
I have a requirement to calculate fuzzy match scores as described below.

  1. I have a list of vendor names in one column.
  2. Generate a record ID for each vendor name.
  3. Compare each vendor name with the others and create two more columns, MatchScore and MatchScore_vendor_name.
  4. If two vendor names match each other 100%, then MatchScore and MatchScore_vendor_name are 100%.
  5. After generating the match scores, filter the data to show only the rows with a MatchScore of 97% or more.
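For readers who want to see the five steps outside KNIME, here is a minimal Python sketch. It is an illustration only, under several assumptions: the similarity measure is `difflib.SequenceMatcher` from the standard library (a stand-in, not a KNIME node), the comparison is case-insensitive, each record keeps only its single best match, and the vendor names and the helper names `match_score` and `best_matches` are made up.

```python
from difflib import SequenceMatcher

def match_score(a: str, b: str) -> float:
    """Similarity in percent (0-100). Assumption: case-insensitive difflib ratio."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_matches(vendors, threshold=97.0):
    # Step 2: assign a record ID to each vendor name.
    records = list(enumerate(vendors, start=1))
    rows = []
    for rec_id, name in records:
        # Step 3: compare this name against every other record.
        others = [(match_score(name, other), other)
                  for other_id, other in records if other_id != rec_id]
        if not others:
            continue
        score, matched = max(others)  # keep only the best-scoring match
        # Step 5: keep only rows at or above the threshold.
        if score >= threshold:
            rows.append({"RecordID": rec_id, "VendorName": name,
                         "MatchScore": round(score, 1),
                         "MatchScore_vendor_name": matched})
    return rows

vendors = ["Acme Corp", "ACME Corp", "Globex Inc", "Initech"]  # made-up sample names
rows = best_matches(vendors)
for row in rows:
    print(row)
```

Whether the best match alone is enough, or every pair above the threshold should be listed, depends on the expected output, so treat that choice as open.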

It would be a lot easier to help if you provide some sample data.

Use the Similarity Search node.

I tried Similarity Search, but it didn't work out. Please find attached the sample data.
Fuzzy_logic_data.xlsx (9.9 KB)

Any reference implementation would be appreciated. I have provided the input data, and the expected output is included as well.

Try this. Uses String Similarity node with Levenshtein similarity measure. To use any of the similarity nodes you have to compare two strings. I accomplished this by cross joining the vendor columns. You should study how Levenshtein similarity works to make sure it meets your needs.
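To make the idea concrete, here is a small Python sketch of a Levenshtein-based similarity applied to a cross join of the vendor column with itself. It is not the Palladian node's implementation: the normalization (1 minus distance divided by the longer string's length) is one common convention and is assumed here, and the vendor names are made up.

```python
from itertools import product

def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a: str, b: str) -> float:
    """Normalized to 0..1. Assumed normalization: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

vendors = ["Acme Corp", "Acme Corp.", "Globex Inc"]  # made-up sample names
# Cross join: pair every row with every other row, skipping a row paired with itself.
pairs = [(a, b, similarity(a, b))
         for (i, a), (j, b) in product(enumerate(vendors), repeat=2) if i != j]
for a, b, s in pairs:
    print(f"{a!r} vs {b!r}: {s:.2f}")
```

Under this normalization a single edit in a 10-character name already drops the score to 0.90, so a 97% cutoff only tolerates one edit once the longer name has at least 34 characters. That is worth checking against your data before settling on the threshold.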

Thank you for the quick reply. However, the only input we have is the Vendor Name column, which you can use in your Excel Reader; the remaining columns are derived from this input. Also, I have KNIME version 5.4.2, which doesn't offer the String Similarity node as an install option.

You need to install the Palladian extension. I don’t understand the rest of your message.

Hi rfeigel,

Please find attached the input file we need to use. I have also attached the expected output.
Fuzzy-output.xlsx (9.1 KB)
Fuzzy-Input.xlsx (9.2 KB)

Try this. It appears that you’re trying to compare the first and last eleven rows although you didn’t explain that. If not, I’m totally lost. I also can’t understand how you expect to produce the output file other than the similarities. There’s not enough information to produce the other columns. Finally, you’ll need to install the Palladian extension to use the String Similarity node.


I am sorry for the confusion. Let me try this one. And yes, I am comparing vendor names with one another to see which ones match very closely.

One more time: are you comparing the first eleven rows to the last eleven rows, or do you want to compare every vendor name to every other vendor name? If the former, my second workflow should work. If the latter, my first workflow does that.

@mshahn02 if you want to group similar names from a single column, you could try to adapt this example, where you would not compare to a ground truth.

You also might want to describe your request in more detail, as @rfeigel has suggested. If you do not want to do this in English, you could write it in a language you are familiar with and then translate it with a current LLM like ChatGPT or with DeepL.

@mshahn02 I built this workflow, which tries to sort the vendor names into groups without a ground truth.

Maybe you can check if this suits your needs. You can set the match threshold to any value between 0 and 100.
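As a rough illustration of grouping without a ground truth, here is a Python sketch with an adjustable threshold between 0 and 100. It does not reproduce the KNIME workflow: the greedy strategy (compare each name to the first member of each existing group), the `difflib` similarity measure, and the sample names are all assumptions.

```python
from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """Similarity from 0 to 100 (case-insensitive difflib ratio, a stand-in measure)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_names(names, threshold=90.0):
    groups = []  # each group is a list; its first element acts as the representative
    for name in names:
        for group in groups:
            if score(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no group was close enough: start a new one
    return groups

names = ["Acme Corp", "ACME Corp", "Acme Corp.", "Globex Inc"]  # made-up sample names
print(group_names(names, threshold=90.0))
print(group_names(names, threshold=97.0))
```

Raising the threshold splits near-duplicates into separate groups, while lowering it merges more names. Because the approach is greedy, the result can also depend on the order of the input rows.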
