Filter Duplications based on a combination of Rows from Different Columns

rwhite923 · November 2, 2022, 10:05pm

Hi All,

I’ve been knocking my head against the wall trying to figure this out. How can I filter duplicates from a dataset when a row has matching fields with other rows in the dataset.

For instance, I have an ID that has firstname, lastname, email, phone. I want to compare the output of those fields with the rest of the dataset to check if there is a duplication. Table below:

ID. FirstName. LastName. Email. Phone

Joe Frank joe@email.com 9999999999
2 Aaron Smith smith@email.com 7777777777
Joe Frank Frank@email.com 9999999999

The logic of the filter would be something like ‘FirstName’ = ‘FirstName’ AND ‘LastName’ = ‘LastName’ AND ‘Email’ = ‘Email’ OR ‘FirstName’ = ‘FirstName’ AND ‘LastName’ = ‘LastName’ AND ‘Phone’ = ‘Phone’

I need this because the ID does not match in the system, so have to go off of the person’s email address, to make sure the duplicate in the dataset is hightlighted, and ultimately filtered out.

Thoughts?

Adrix · November 2, 2022, 10:24pm

HI,
The node Duplicate row filter does the job described .
Or You can also group by the fields you want , the result will be a unduplicated table

mlauber71 · November 4, 2022, 4:41am

@rwhite923 this could new a job for a SQL window function in a local H2 database. But you would have to come up with a logic how to decide which mail address is the correct one in this case.

system · February 2, 2023, 4:41am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.