Filter Duplications based on a combination of Rows from Different Columns

Hi All,

I’ve been knocking my head against the wall trying to figure this out. How can I filter duplicates from a dataset when a row has matching fields with other rows in the dataset.

For instance, I have an ID that has firstname, lastname, email, phone. I want to compare the output of those fields with the rest of the dataset to check if there is a duplication. Table below:

ID. FirstName. LastName. Email. Phone

  1. Joe Frank joe@email.com 9999999999
    2 Aaron Smith smith@email.com 7777777777
  2. Joe Frank Frank@email.com 9999999999

The logic of the filter would be something like ‘FirstName’ = ‘FirstName’ AND ‘LastName’ = ‘LastName’ AND ‘Email’ = ‘Email’ OR ‘FirstName’ = ‘FirstName’ AND ‘LastName’ = ‘LastName’ AND ‘Phone’ = ‘Phone’

I need this because the ID does not match in the system, so have to go off of the person’s email address, to make sure the duplicate in the dataset is hightlighted, and ultimately filtered out.

Thoughts?

HI,
The node Duplicate row filter does the job described .
Or You can also group by the fields you want , the result will be a unduplicated table

1 Like

@rwhite923 this could new a job for a SQL window function in a local H2 database. But you would have to come up with a logic how to decide which mail address is the correct one in this case.