Hello group,
I am working with KNIME and need to compare the references in two different columns of my dataset.
Here is my problem, and why I am asking for your help…
As an example, column A has the reference AAA11122222BBB-333CCC and column B, to compare against, has 0AA11121222BBBB333CC.
As you can see, a few characters have changed, but both strings match at 80%, so I would like this case to appear as a “match”.
Can someone help me build the workflow?
Thanks in advance
Hi
There is a String Similarity node. You might want to check it out. There should be some examples on the KNIME Hub as well.
br
Hello @26AngelG ,
To achieve this result, you can use either the String Similarity node or the Java Snippet node.
I used the Java Snippet node; the workflow is attached:
Comparation between 2 Column.knwf (73.8 KB)
In the Java Snippet node, you can write logic like this:
// Input columns mapped to Java Snippet fields
String valueA = c_column1;
String valueB = c_column2;
// dp[i][j] = edit distance between the first i chars of valueA and the first j chars of valueB
int[][] dp = new int[valueA.length() + 1][valueB.length() + 1];
for (int i = 0; i <= valueA.length(); i++) {
    dp[i][0] = i; // i deletions to reach the empty string
}
for (int j = 0; j <= valueB.length(); j++) {
    dp[0][j] = j; // j insertions starting from the empty string
}
for (int i = 1; i <= valueA.length(); i++) {
    for (int j = 1; j <= valueB.length(); j++) {
        // A substitution costs 0 when the characters already match
        int cost = (valueA.charAt(i - 1) == valueB.charAt(j - 1)) ? 0 : 1;
        dp[i][j] = Math.min(Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1), dp[i - 1][j - 1] + cost);
    }
}
int levenshteinDistance = dp[valueA.length()][valueB.length()];
// Normalize the distance by the longer string and compare with the 80% threshold
double threshold = 0.8;
double maxLen = Math.max(valueA.length(), valueB.length());
double similarityScore = 1.0 - (double) levenshteinDistance / maxLen;
String matchStatus = (similarityScore >= threshold) ? "Match" : "No Match";
out_new = matchStatus;
output
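To check the logic outside KNIME, here is a minimal standalone sketch of the same idea, run on the two references from the question. The class and method names (`SimilaritySketch`, `distance`, `similarity`) are hypothetical and not part of KNIME. It keeps only two rows of the table instead of the full matrix, and it assumes that two empty strings count as identical. For the example strings, four edits are needed (A→0, 2→1, -→B, and one C dropped), so the score is 1 − 4/21 ≈ 0.81, which is just above the 0.8 threshold.

```java
// Hypothetical helper class for illustration; names are not part of KNIME.
final class SimilaritySketch {

    // Levenshtein distance, keeping only the previous and current table rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j chars of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i chars of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            // Swap rows so that prev always holds the most recently filled row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1 minus the distance divided by the longer length.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // assumption: two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        String a = "AAA11122222BBB-333CCC";
        String b = "0AA11121222BBBB333CC";
        System.out.println("distance = " + distance(a, b));
        System.out.println("similarity = " + similarity(a, b));
        System.out.println(similarity(a, b) >= 0.8 ? "Match" : "No Match");
    }
}
```

Because this example sits so close to the threshold, a single extra changed character would turn it into “No Match”, so it may be worth checking the threshold against a few real rows before relying on it.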
Thanks @Daniel_Weikert! I will check it.
Great help, @tqAkshay95! You helped me a lot… I will try it! Thanks!
Sorry @Daniel_Weikert, but I can't find the “String Similarity” node in KNIME. I searched the internet and saw that it is necessary to install NodePit… I don't know how to do that.
Could you help me, please? Thanks again.
It needs to be added to the available update sites.
There is an older blog post that might be helpful:
Hey Roberta,
great, thanks for the screenshots! I can see there by looking at the third screenshot, that NodePit is currently obviously not properly installed – otherwise it would show up here:
[image]
Could you again go to this step please: File → Install KNIME Extensions…
In the installation window, could you try to untick the following checkboxes:
Show only the latest versions of available software
Group items by category
Hide items that are already installed
Then, just to try a diffe…
br
OK! I checked it, but I don't have the latest KNIME version. Is it possible that NodePit only runs on the latest version? Thanks.
takbb
April 27, 2024, 6:42am
Which version of KNIME are you using @26AngelG ?
Sorry, I didn't see this post, @takbb!
I have this version, but I cannot install NodePit (or I don't know how).
Take a look at this recent post.
Hello, I'm having a problem with a joiner where one value is complete while the other was typed by a person, so it has been shortened in a fairly random way. The values seem to match exactly up to a certain point, and then the rest is always dropped.
Example:
RCV 100 X 150000 P NAT = RCV 100 X 150000 P NATURAL PVC 30 60
I would just use substr( , , ) in the String Manipulation node, but the cut-off point is fairly random, so sometimes I will need more letters and sometimes fewer.
Is there a way to use some kind of…
@26AngelG ,
if you have found the solution, please mark it with the green tick.