Remove duplicate row

Hello everyone!

I need to remove duplicate rows but keep one copy of each.

See the table… there are 17 identical rows and I need to keep only one.

Hi @igorbono

That’s a job for the Duplicate Row Filter node.

Hi @ArjenEX

What condition do I need to set to keep only one row?

This is correct. The most important setting is under the Options tab where you define which columns should be evaluated for duplication.

Hi @ArjenEX

But all the data in the columns is the same. That is the difficulty: I need to keep only one row.


It looks like you have a lot of columns included there, so something must differ in each row; otherwise the node would have filtered them out :wink: Also check for “hidden” characters like trailing spaces, since those count as well.
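To illustrate the trailing-space point outside KNIME, here is a minimal Python sketch. It is not how the Duplicate Row Filter node is implemented; the sample rows and the helper name `dedupe` are made up for illustration. It only shows why rows that look identical can survive an exact comparison.

```python
def dedupe(rows, normalize=False):
    """Keep the first occurrence of each row, optionally trimming string cells first."""
    seen = set()
    kept = []
    for row in rows:
        # Build the comparison key; with normalize=True, surrounding whitespace is ignored.
        key = tuple(
            cell.strip() if normalize and isinstance(cell, str) else cell
            for cell in row
        )
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: the second row has a trailing space in the last cell.
rows = [
    ("A1", "2021-01-05", "open"),
    ("A1", "2021-01-05", "open "),
    ("A1", "2021-01-05", "open"),
]

print(len(dedupe(rows)))                  # 2: exact comparison treats "open " as different
print(len(dedupe(rows, normalize=True)))  # 1: trimming first collapses all three rows
```

If the trimmed comparison collapses the rows and the exact one does not, stray whitespace is the likely culprit, and cleaning the strings before the duplicate filter should help.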

Hi,

Like an Excel filter, you need to understand which columns define the priority or the criteria for deduplication…

A few columns can be enough: for example, “date” and “id” can be your keys for identifying duplicates. You don’t need to pass all columns; otherwise, every column will be considered when deciding whether a row is a duplicate. Is it clearer now?
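As a rough sketch of the key-column idea in plain Python (not the node's actual implementation; the rows, the `note` column, and the helper name `dedupe_by_keys` are assumptions for illustration): only the chosen columns decide whether two rows count as duplicates.

```python
def dedupe_by_keys(rows, key_columns):
    """Keep the first row seen for each combination of values in key_columns."""
    seen = set()
    kept = []
    for row in rows:
        # Only the selected columns take part in the comparison.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: rows 1 and 2 differ only by a trailing space in "note".
rows = [
    {"id": 1, "date": "2021-01-05", "note": "first entry"},
    {"id": 1, "date": "2021-01-05", "note": "first entry "},
    {"id": 2, "date": "2021-01-06", "note": "second entry"},
]

all_columns = dedupe_by_keys(rows, ["id", "date", "note"])
keys_only = dedupe_by_keys(rows, ["id", "date"])

print(len(all_columns))  # 3: every column is compared, so nothing is removed
print(len(keys_only))    # 2: only "id" and "date" are compared
```

The trade-off is that rows which differ in a non-key column are still collapsed, so the keys should be columns that really identify a record.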

Thanks,

Denis

OK @ArjenEX, I will do a deeper analysis of my table.

Thank you !

Hi @denisfi, thank you for your support.

When dealing with such large tables, one trick I can recommend is temporarily adding a Create Collection Column node and reviewing its output.

Example:

It looks like columns a, b, c and d all have the same value. But from the collection string you can see that c actually has an additional space. By looking at where the list starts to differ, you can spot the column that is preventing the duplicates from being removed.
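The same inspection idea can be sketched in Python, assuming a small made-up table with columns a to d (the data and the helper name `differing_columns` are hypothetical, and this is only loosely analogous to the collection column in KNIME). Printing each row as a quoted list makes stray spaces visible, and a small helper names the columns that differ.

```python
def differing_columns(rows, column_names):
    """Return the names of columns whose values are not identical across all rows."""
    return [
        name
        for index, name in enumerate(column_names)
        if len({row[index] for row in rows}) > 1
    ]


column_names = ["a", "b", "c", "d"]

# Hypothetical sample data: column c of the second row has a trailing space.
rows = [
    ["x", "x", "x", "x"],
    ["x", "x", "x ", "x"],
]

for row in rows:
    # repr() quotes every string, so an extra space shows up inside the quotes.
    print(repr(row))

print(differing_columns(rows, column_names))  # ['c']
```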

OK @ArjenEX, I will try this function. Thanks.