Duplicate Row Filter

Hi Team,

Hope you are well.

I am using a Joiner (left outer) and getting duplicates, so I added a Duplicate Row Filter, but the duplicates are not removed correctly.
I tried different column combinations in the filter, but the issue remains: I get either more or fewer line items than expected.
Excel 1, total no. of line items: 217803
Excel 2, total no. of line items: 154774
After the Joiner (left outer, joined on Doc. No.): 229606 (I want the same count as Excel 1, i.e. 217803)
After the Duplicate Row Filter: 215109
Different column combinations give different numbers of line items.

Could you please suggest how to remove the duplicates?

Thanks

look at the photo

Hi @chandrika_99 -

Would you be able to provide a simple example workflow with toy datasets - maybe a few dozen rows per file? In this case it would help to be able to see what your intended inputs and outputs are, and what the duplication pattern is like.
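
To make that request concrete, here is a minimal Python sketch of one common duplication pattern. It is an assumption about the data, not something confirmed in this thread. A left outer join returns one row per matching pair, so if a Doc. No. occurs more than once in the second file, the matching row from the first file is repeated. The table contents, the column names (`doc_no`, `amount`, `vendor`) and the helper `left_outer_join` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: doc_no stands in for the "Doc. No." join key.
left_rows = [
    {"doc_no": "A1", "amount": 100},
    {"doc_no": "A2", "amount": 200},
    {"doc_no": "A3", "amount": 300},
]
right_rows = [
    {"doc_no": "A1", "vendor": "X"},
    {"doc_no": "A1", "vendor": "Y"},  # second match for A1
    {"doc_no": "A2", "vendor": "Z"},
]


def left_outer_join(left, right, key):
    """Naive left outer join: one output row per (left row, matching right row)."""
    matches = {}
    for r in right:
        matches.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        # A left row without any match is kept once, with no right-hand columns.
        for r in matches.get(l[key], [None]):
            joined.append({**l, **(r or {})})
    return joined


joined = left_outer_join(left_rows, right_rows, "doc_no")

# Keys that occur more than once on the right are the ones that multiply left rows.
repeated_keys = {
    k: n for k, n in Counter(r["doc_no"] for r in right_rows).items() if n > 1
}

print(len(left_rows), len(joined))  # 3 4
print(repeated_keys)  # {'A1': 2}
```

If the real data follows this pattern, counting how often each Doc. No. appears in the second file should account for the extra 229606 - 217803 = 11803 rows after the join.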


One more idea: use SQL with row_number() if you have an ID that identifies duplicate entries.

SELECT *
FROM (
    SELECT *
         , row_number() OVER (PARTITION BY `ID` ORDER BY `date` DESC) AS `rank_id`
    FROM `default`.`data1`
) t1
-- keep only the newest entry (rank 1) in each group of rows sharing the same ID
WHERE `t1`.`rank_id` = 1
;
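
As a rough sketch of how the row_number idea could be combined with the left outer join from the original question, the Python snippet below uses the standard-library `sqlite3` module. The second table is reduced to one row per Doc. No. before joining, so the result keeps the row count of the first table. The table names (`excel1`, `excel2`), the columns, and the rule "keep the newest `posting_date`" are assumptions for illustration. The snippet also assumes SQLite 3.25 or newer, which is needed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical toy tables; doc_no stands in for the "Doc. No." join key.
conn.executescript("""
CREATE TABLE excel1 (doc_no TEXT, amount INTEGER);
CREATE TABLE excel2 (doc_no TEXT, vendor TEXT, posting_date TEXT);
INSERT INTO excel1 VALUES ('A1', 100), ('A2', 200), ('A3', 300);
INSERT INTO excel2 VALUES
    ('A1', 'X', '2023-01-01'),
    ('A1', 'Y', '2023-02-01'),
    ('A2', 'Z', '2023-01-15');
""")

# Rank the rows of excel2 within each doc_no, then join only the top-ranked row.
rows = conn.execute("""
SELECT e1.doc_no, e1.amount, e2.vendor
FROM excel1 AS e1
LEFT JOIN (
    SELECT doc_no, vendor,
           row_number() OVER (PARTITION BY doc_no ORDER BY posting_date DESC) AS rank_id
    FROM excel2
) AS e2
  ON e2.doc_no = e1.doc_no AND e2.rank_id = 1
ORDER BY e1.doc_no
""").fetchall()

print(rows)  # [('A1', 100, 'Y'), ('A2', 200, 'Z'), ('A3', 300, None)]
```

Putting `rank_id = 1` in the join condition rather than in a WHERE clause keeps unmatched rows from the first table (here `A3`), which is what a left outer join is expected to do.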

