Duplicate Row Filter is not filtering out duplicate values and labels everything as unique.

zacharyseman · October 5, 2020, 1:00am

I am trying to remove duplicates by concatenating a CSV file and a text file. After the concatenation I apply a Duplicate Row Filter node in case the workflow is run twice by accident.
However, the Duplicate Row Filter node will not mark anything as a duplicate; it only marks rows as unique.

elsamuel · October 5, 2020, 3:22am

Hi @zacharyseman, welcome to the forum.

I am trying to remove duplicates by concatenating a CSV file and a text file.

How does concatenation remove duplicates?

However, the Duplicate Row Filter node will not mark anything as a duplicate; it only marks rows as unique.

Are there actually any duplicate rows present?

It would be helpful if you provided your workflow and accompanying data that demonstrates the problem you’ve been encountering. As it stands, I don’t think there are enough details for us to start thinking about what might be going on.

mlauber71 · October 5, 2020, 5:50am

You might benefit from this article and example workflow

Also, if you are de-duplicating string columns, make sure they contain no trailing blanks or hidden characters. These can make strings that look identical compare as different.
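To illustrate the point about hidden characters, here is a minimal Python sketch (not how KNIME implements the node) showing that exact-match de-duplication treats visually identical strings as different, and one possible cleanup step. The helper `normalize` and the sample values are hypothetical.

```python
import unicodedata

def normalize(value: str) -> str:
    """Hypothetical cleanup applied before comparing strings."""
    # NFKC maps compatibility characters such as the non-breaking space to a plain space
    value = unicodedata.normalize("NFKC", value)
    # Drop zero-width space and byte-order-mark characters, which print as nothing
    value = value.replace("\u200b", "").replace("\ufeff", "")
    # Remove leading/trailing whitespace
    return value.strip()

rows = ["ACME Corp", "ACME Corp ", "\ufeffACME Corp", "ACME\u00a0Corp"]

# Exact comparison: all four strings differ, so none is treated as a duplicate
raw_unique = list(dict.fromkeys(rows))

# After normalization they collapse to a single value
clean_unique = list(dict.fromkeys(normalize(r) for r in rows))

print(len(raw_unique), len(clean_unique))
```

In a KNIME workflow the equivalent would be cleaning the string columns (for example trimming whitespace) before the de-duplication step.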

zacharyseman · October 6, 2020, 12:07am

Hello @elsamuel, thank you for your response.

I am not trying to remove duplicates with the Concatenate node.
I apologize for the confusion: I use the Concatenate node only to combine the data into one table.

After the Concatenate node I apply the Duplicate Row Filter node.
I filter on all columns except the last one.

When I run the same process twice, meaning I process the same two files and write the results twice, the node does not flag the data from the second run as duplicates, even though it is exactly the same.
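The expected behavior of filtering on all columns except the last one can be sketched in Python as below. This is an illustration only, not the KNIME node: the function `mark_duplicates` is hypothetical, and the assumption that the last column holds something run-specific (here a made-up run label) is for the example.

```python
def mark_duplicates(rows):
    """Label each row 'unique' or 'duplicate', comparing every column except the last."""
    seen = set()
    labels = []
    for row in rows:
        key = tuple(row[:-1])  # ignore the last column (assumed to be run-specific)
        labels.append("duplicate" if key in seen else "unique")
        seen.add(key)
    return labels

# Hypothetical data: the same two records processed in two separate runs
first_run = [("00123", "2020-10-05", 10.0, "run-1"), ("00456", "2020-10-05", 7.5, "run-1")]
second_run = [("00123", "2020-10-05", 10.0, "run-2"), ("00456", "2020-10-05", 7.5, "run-2")]

print(mark_duplicates(first_run + second_run))
# ['unique', 'unique', 'duplicate', 'duplicate']
```

If every row still comes back as unique, the compared columns must differ in some way that is not visible in the table view, for example in type or in hidden characters.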

zacharyseman · October 6, 2020, 12:13am

Hello @mlauber71

I have also tried using only the GroupBy node to perform the de-duplication. However, if someone runs the data twice and I aggregate by date, it picks up the same date, because the date flow variable controls the whole workflow.

The Duplicate Row Filter node itself does not seem to work.
I have a file in which I copied and pasted the exact same row several times, and the Duplicate Row Filter node marks all of them as unique.

zacharyseman · October 6, 2020, 12:50am

I have found the issue: the CSV Reader was converting one column from string to a number, which dropped the leading zeroes and made the comparison fail.
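A minimal Python sketch of this failure mode follows. It assumes a reader that guesses types by turning digit-only fields into integers. That is a simplification for illustration and not the CSV Reader's actual type-detection logic, and `read_rows` and the sample data are hypothetical.

```python
import csv
import io

def read_rows(text, delimiter=",", guess_types=False):
    """Hypothetical reader: optionally converts digit-only fields to int."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # skip the header row
    rows = []
    for row in reader:
        if guess_types:
            # "00123".isdigit() is True, so int() silently drops the leading zeroes
            row = [int(v) if v.isdigit() else v for v in row]
        rows.append(tuple(row))
    return rows

csv_text = "id,amount\n00123,10\n"
txt_text = "id\tamount\n00123\t10\n"

guessed = read_rows(csv_text, guess_types=True)  # [(123, 10)]
as_text = read_rows(txt_text, delimiter="\t")    # [('00123', '10')]
print(guessed == as_text)  # False: 123 does not equal '00123'

# Fix: read the column as a string in both sources so identical rows compare equal
fixed = read_rows(csv_text)
print(fixed == as_text)    # True
```

The general remedy is the same in KNIME: make sure the key columns have the same type (here, string) in both inputs before concatenating and de-duplicating.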

system · October 13, 2020, 12:50am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.