Solutions to "Just KNIME It!" Challenge 24 - Season 2

Hi everyone :)

For this challenge, I separated the email domain from the rest of each email address and used the -GroupBy- node to count how many times each domain appears in the table. I then used the -Cross Joiner- node to pair every email with every other email.
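
For anyone who wants to follow the logic outside KNIME, here is a rough Python sketch of these first steps. It is not part of my workflow: the sample addresses and the `split_domain` helper are made up for illustration, and it assumes the domain is everything after the last "@".

```python
from collections import Counter
from itertools import product


def split_domain(email: str) -> str:
    """Return the part after the last '@', lower-cased (hypothetical helper)."""
    return email.rsplit("@", 1)[1].lower()


# Hypothetical sample data, not taken from the challenge dataset.
emails = ["ann@gmail.com", "bob@gmail.com", "eve@gmial.com"]

# Rough equivalent of the GroupBy step: occurrences of each domain.
domain_counts = Counter(split_domain(e) for e in emails)

# Rough equivalent of the Cross Joiner step: every email paired with every email.
# Self-pairs are included here; they have identical domains (similarity 1).
pairs = list(product(emails, repeat=2))
```

With n emails this produces n * n pairs, so the cross join grows quickly on larger tables.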

Using the -String Similarity- node, I calculated the Levenshtein similarity between each pair of domains and removed rows with a similarity equal to 1, i.e. rows where the two domains are identical.
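
As a sketch of what a Levenshtein similarity can look like in code: the version below normalises the edit distance by the length of the longer string, so that identical strings score 1. That normalisation is my assumption for illustration; the -String Similarity- node may compute its score differently.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

Under this assumed normalisation, a domain with two swapped letters such as "gmial.com" is 2 edits away from "gmail.com", giving a similarity of 1 - 2/9, roughly 0.78.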

I have then used the -Rule Engine- node to tag the email address as FRAUDULENT or NOT FRAUDULENT based on the following rule:

If the similarity is > 0.7 AND the email's domain count is < the comparison email's domain count, then the email is tagged as FRAUDULENT. Otherwise, it is tagged as NOT FRAUDULENT.
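
Written as a small Python function, the rule above looks roughly like this. The function and parameter names are hypothetical; only the 0.7 threshold and the two labels come from the rule itself.

```python
def tag_email(similarity: float, own_count: int, other_count: int,
              threshold: float = 0.7) -> str:
    """Tag one compared pair of emails according to the rule above.

    similarity  -- similarity between the two domains
    own_count   -- how often this email's domain occurs in the table
    other_count -- how often the comparison email's domain occurs
    """
    if similarity > threshold and own_count < other_count:
        return "FRAUDULENT"
    return "NOT FRAUDULENT"


# A rare domain that closely resembles a more common one is flagged.
print(tag_email(0.78, own_count=1, other_count=2))
# The more common domain in the same pair is not flagged.
print(tag_email(0.78, own_count=2, other_count=1))
```

The idea is that a look-alike domain which occurs less often than the domain it resembles is more likely to be the fake one.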

You can find my workflow on the hub:

Best wishes
Heather
