Compare if two strings are equal

Hi everybody, I want to know if two strings are equal in Java. Basically, I have two columns with different strings, and I want to flag which records are not the same by appending another column.

I already tried this using the Java Snippet node, but every row comes out as different, which is not really the case.

if (c_AONE == c_ATWO) {
    out_DIFFERENCES = "IS EQUAL";
} else {
    out_DIFFERENCES = "IS DIFFERENT";
}

Another way is to use equals(), but the problem is that the strings are still reported as different.

Many thanks

There is Similarity Search. If you are looking for duplicates, then the Jaro-Winkler Distance function (see String Distances) is a good fit - of course there is also Levenshtein and co.

The multivariate way (i.e. if you are not only looking for the nearest neighbor but rather the nearest neighbors) is to analyze the string distance matrix itself by hand, or to cluster using the string distance.
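To make the "degree of difference" idea concrete, here is a minimal sketch of the Levenshtein (edit) distance in plain Java. It is not the code behind the String Distances node; the class and method names are made up for illustration.

```java
class LevenshteinSketch {
    // Edit distance: minimum number of single-character insertions,
    // deletions, or substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into an empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("DOG", "DOG"));  // 0
        System.out.println(levenshtein("CAT", "CATT")); // 1
    }
}
```

A distance of 0 means the strings are identical; anything above 0 tells you how far apart they are.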

Thank you, I will try that then, but I still want to obtain a "black and white" answer:

COLUMN1  COLUMN2  REMARKS
DOG      DOG      EQUAL
CAT      CATT     DIFFERENT
FOX      FOXX     DIFFERENT
SHEEP    SHEEP    EQUAL

Any help will be appreciated

 

Column Comparator is another option; it produces true or false instead of equal or different. My suggestions above take a degree of difference into account, so they are more complicated to use. Or you can use Rule Engine and set any output values you want.

If you want to use the Java Snippet node, this is how it works:

if (c_AONE.equals(c_ATWO)) {
    out_DIFFERENCES = "IS EQUAL";
} else {
    out_DIFFERENCES = "IS DIFFERENT";
}

Another way is to use the Rule Engine node.

Best,
Marc


Never use "=" on strings in Java! This will not work because it compares object references, not the actual string contents. Use "string".equals(s) instead.


You can also customise the output of the Column Comparator.

Thank you, excellent!!

(Actually, the single = is even worse: that is assignment, which tries to assign the value on the right to the variable on the left. What Thorsten suggested is the way to go, though you can still get false for seemingly equal strings; there are non-breaking space characters, for example, which look just like a regular space in most contexts. For Java Strings, you should use reference.equals(value) or, in case reference can be missing/null, Objects.equals(reference, value).)
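As a rough illustration of the non-breaking space pitfall, the sketch below normalises both values before a null-safe comparison. The helper names normalize and sameText are hypothetical, and replacing only U+00A0 is an assumption; real data may contain other invisible characters.

```java
import java.util.Objects;

class NormalizedCompare {
    // Hypothetical helper: turn non-breaking spaces (U+00A0) into regular
    // spaces and trim leading/trailing whitespace. Null stays null.
    static String normalize(String s) {
        if (s == null) {
            return null;
        }
        return s.replace('\u00A0', ' ').trim();
    }

    // Null-safe comparison of the normalised values.
    static boolean sameText(String a, String b) {
        return Objects.equals(normalize(a), normalize(b));
    }

    public static void main(String[] args) {
        String plain = "NEW YORK";
        String nbsp = "NEW\u00A0YORK"; // looks identical when displayed
        System.out.println(plain.equals(nbsp));    // false
        System.out.println(sameText(plain, nbsp)); // true
        System.out.println(sameText(null, plain)); // false, no NullPointerException
    }
}
```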
 

== tests for reference equality (whether they are the same object).

.equals() tests for value equality (whether they are logically "equal").

If you want to test whether two strings have the same value you will probably want to use Objects.equals().
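A small standalone sketch of the difference, outside of KNIME. The label method is a made-up name that mirrors the IS EQUAL / IS DIFFERENT output from the question.

```java
import java.util.Objects;

class EqualityDemo {
    // Null-safe value comparison, returning the labels used in the question.
    static String label(String a, String b) {
        return Objects.equals(a, b) ? "IS EQUAL" : "IS DIFFERENT";
    }

    public static void main(String[] args) {
        String a = "DOG";
        String b = new String("DOG"); // same characters, but a distinct object

        System.out.println(a == b);      // false: different references
        System.out.println(a.equals(b)); // true: same characters

        System.out.println(label(a, b));       // IS EQUAL
        System.out.println(label("CAT", "CATT")); // IS DIFFERENT
        System.out.println(label(null, a));    // IS DIFFERENT, no NullPointerException
    }
}
```

This is why the == version in the question can report every row as different: the two cell values are separate String objects even when their text matches.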

More about...Java String Comparison

Riyan


Hi,

I have exactly the same problem to tackle, but "column comparators" or "string similarity" do not do the job.

Specifically, I compare two string columns which contain city names. The problem arises when, e.g., we have "Frankfurt" and "Frankfurt am main". I would like this to be TRUE, meaning yes, these two are equal. Is there a way to do it in "rule engine"?
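One possible reading of "equal" here is that one name is the other followed by extra words. The sketch below shows that rule in plain Java (e.g. for a Java Snippet) rather than in Rule Engine syntax; looselyEqual is a hypothetical name, and the word-boundary rule is an assumption that will also match distinct places such as "Frankfurt" and "Frankfurt (Oder)".

```java
import java.util.Locale;

class PrefixMatchSketch {
    // Treats two names as equal when they match ignoring case and outer
    // whitespace, or when one is the other followed by a space and more text.
    static boolean looselyEqual(String a, String b) {
        if (a == null || b == null) {
            return false; // assumption: missing values never match
        }
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        return x.equals(y)
                || x.startsWith(y + " ")
                || y.startsWith(x + " ");
    }

    public static void main(String[] args) {
        System.out.println(looselyEqual("Frankfurt", "Frankfurt am main")); // true
        System.out.println(looselyEqual("Frank", "Frankfurt"));             // false
    }
}
```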

 

 

Hi, I was playing with some Java stuff. The problem is not 100% solved, but I know it is a great advance.

Best Regards

 

Use n-grams, where n is the minimal overlap; I use 3. Then filter on the similarity column.
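For intuition, here is a minimal sketch of an n-gram similarity in plain Java. It uses Jaccard overlap of the n-gram sets, which is an assumption; the formula used by the KNIME node may differ, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class NGramSketch {
    // All distinct substrings of length n.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard similarity: shared n-grams divided by all distinct n-grams.
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            // Both strings are shorter than n: fall back to exact equality.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        Set<String> all = new HashSet<>(ga);
        all.addAll(gb);
        return (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("SHEEP", "SHEEP", 3)); // 1.0
        System.out.println(similarity("CAT", "CATT", 3));    // 0.5
        System.out.println(similarity("Frankfurt", "Frankfurt am main", 3));
    }
}
```

Filtering then means keeping the rows whose similarity is above a threshold you choose for your data.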