Hello group,
I'm working with KNIME and need to compare the references stored in two different columns of my dataset.
Here is my problem, and why I'm asking for your help…
For example, column A contains the reference AAA11122222BBB-333CCC, and column B, which I want to compare it against, contains 0AA11121222BBBB333CC.
As you can see, a few characters differ, but the two strings match at about 80%, so I would like this case to be reported as a “match”.
Can someone help me build the workflow?
Thanks in advance
Hi
There is a String Similarity node. You might want to check it out. There should be some examples on the KNIME Hub as well.
br
Hello @26AngelG ,
To achieve this result, you can use either the String Similarity node or the Java Snippet node.
I used the Java Snippet node. In the Java Snippet node, you can write logic like this:
Comparation between 2 Column.knwf (73.8 KB)
// Input fields mapped from the two columns to compare
String valueA = c_column1;
String valueB = c_column2;
// dp[i][j] = edit distance between the first i chars of valueA and the first j chars of valueB
int[][] dp = new int[valueA.length() + 1][valueB.length() + 1];
for (int i = 0; i <= valueA.length(); i++) {
    dp[i][0] = i;
}
for (int j = 0; j <= valueB.length(); j++) {
    dp[0][j] = j;
}
for (int i = 1; i <= valueA.length(); i++) {
    for (int j = 1; j <= valueB.length(); j++) {
        // Substitution costs 0 if the characters are equal, 1 otherwise
        int cost = (valueA.charAt(i - 1) == valueB.charAt(j - 1)) ? 0 : 1;
        dp[i][j] = Math.min(Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1), dp[i - 1][j - 1] + cost);
    }
}
int levenshteinDistance = dp[valueA.length()][valueB.length()];
// Rows with at least 80% similarity count as a match
double threshold = 0.8;
double maxLen = Math.max(valueA.length(), valueB.length());
double similarityScore = 1.0 - (double) levenshteinDistance / maxLen;
String matchStatus = (similarityScore >= threshold) ? "Match" : "No Match";
out_new = matchStatus;
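If you want to try the same logic outside KNIME first, here is a minimal standalone sketch of it. The class and method names (`FuzzyMatchSketch`, `levenshtein`, `similarity`, `matchStatus`) are made up for illustration and are not part of KNIME. It also makes two assumptions that the snippet above does not: a null value (for example a missing cell) counts as no match, and two empty strings count as identical, which avoids a division by zero.

```java
// Hypothetical helper class for illustration only; not part of KNIME.
final class FuzzyMatchSketch {

    /** Levenshtein distance, keeping only two rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]: 1 - distance / length of the longer string. */
    static double similarity(String a, String b) {
        if (a == null || b == null) {
            return 0.0; // assumption: a missing value never matches
        }
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // assumption: two empty strings are identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    static String matchStatus(String a, String b, double threshold) {
        return similarity(a, b) >= threshold ? "Match" : "No Match";
    }

    public static void main(String[] args) {
        String a = "AAA11122222BBB-333CCC";
        String b = "0AA11121222BBBB333CC";
        System.out.println("similarity = " + similarity(a, b));
        System.out.println(matchStatus(a, b, 0.8)); // prints "Match"
    }
}
```

The two strings from the first post can be aligned with three substitutions and one deletion, so their similarity is at least 1 - 4/21 ≈ 0.81, just above the 0.8 threshold. If you need a stricter or looser match, only the threshold has to change.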
Thanks @Daniel_Weikert!! I will check it.
UFFF, great help @tqAkshay95!! You helped me a lot… I will try it!! THANKS
Sorry @Daniel_Weikert, but I can't find the “String Similarity” node in KNIME. I searched the internet and saw that it is necessary to install NodePit… I don't know how to do that.
Could you help me, please? Thanks again
It needs to be added to the available update sites.
There is an older blog post which might be helpful:
Hey Roberta,
great, thanks for the screenshots! I can see there by looking at the third screenshot, that NodePit is currently obviously not properly installed – otherwise it would show up here:
[image]
Could you again go to this step please: File → Install KNIME Extensions…
In the installation window, could you try to untick the following checkboxes:
Show only the latest versions of available software
Group items by category
Hide items that are already installed
Then, just to try a diffe…
br
OK!! I checked it, but I don't have the latest KNIME version. Is it possible that NodePit only runs in the latest version? Thanks
takbb
April 27, 2024, 6:42am
Which version of KNIME are you using @26AngelG ?