Sort out / Match 2 Columns with a library

I have 2 .xlsx files that I need to match. The first file is my “Library”. It contains words (references), each tied to a Department and Sub_Dep. The second file has a “Rapid_Text” column. Some of my references appear in this “Rapid_Text”, and I would like to assign each row the corresponding code and department.
Example:

If no reference is found, the result should be “Not Found” or empty.

Library.xlsx (14.8 KB)
To be sort.xlsx (284.8 KB)

I hope someone can help me!

Thanks!

Hello D_Valle,

I hope my solution works for you. It is divided into two parts and uses two techniques.

In the first part, after some data preparation (I removed symbols, umlauts and repeated spaces, and normalized the letter case), I used string distance to find, for each row of the main dataset, the most similar row in the dictionary.
Then, in the second part, I checked whether that most similar dictionary string actually corresponds to the Rapid_Text column of the main dataset.
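
For readers who prefer code to a workflow, the two parts can be sketched roughly as below. This is only an illustration of the idea, not an export of the workflow: the library entries, the function names and the “Not Found” label are made up for the example, and Python's `difflib.SequenceMatcher` similarity ratio stands in for the string distance measure.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical library: reference word -> (Department, Sub_Dep).
LIBRARY = {
    "Bremsbelag": ("Workshop", "Brakes"),
    "Ölfilter": ("Workshop", "Engine"),
}


def normalize(text):
    """Lower-case, strip umlauts/accents, drop symbols, collapse spaces."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def best_reference(rapid_text, library):
    """Part 1: return the library reference most similar to the text."""
    cleaned = normalize(rapid_text)
    return max(
        library,
        key=lambda ref: SequenceMatcher(None, normalize(ref), cleaned).ratio(),
    )


def assign(rapid_text, library):
    """Part 2: keep the candidate only if it really occurs in the text."""
    if not library:
        return "Not Found"
    candidate = best_reference(rapid_text, library)
    if normalize(candidate) in normalize(rapid_text):
        return library[candidate]
    return "Not Found"


if __name__ == "__main__":
    for row in ["Bremsbelag vorne ersetzt!!", "Rechnung an Kunde geschickt"]:
        print(row, "->", assign(row, LIBRARY))
```

The containment check in the second part is a plain substring test, so it can also fire inside a longer word. Matching on whole tokens would be stricter, at the cost of missing compound words.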

Let me know if it can work for you!

RB


Hi @D_Valle,

Maybe this workflow can be helpful.

This is (a subset of) the result:

KNIME_project.knwf (22.2 KB)

