Search and remove duplicates

Hey ■■■■!
Please help me with detecting duplicates. Did I select the columns to check for duplicates correctly?



Hi @Vira_Maykova, welcome to the KNIME community.

Can you explain what you are doing and what is or isn’t working? This will help others to help you, rather than leaving them to guess.

Remember that we cannot see your data or know your intentions unless you enlighten us, so there is no way anybody can tell you if you are doing something wrong or right, from your post.

If you want to find duplicates, the Duplicate Row Filter is usually a good node for the job. It can be used both to find duplicates and to remove them. Have you tried using that? If not, take a look, and feel free to post back with more detail if you are still having problems.


I grouped by the columns Customer ID, Transaction Date and Product SKU, and aggregated the quantity with ‘count’ in order to remove duplicates. After using Group by with this grouping, the number of rows decreased. I do not understand how to check whether I have removed all duplicates, or maybe unnecessary ones?

Hi @Vira_Maykova, to remove duplicates I would definitely suggest using the Duplicate Row Filter node. That node can also be configured to report duplicate rows rather than remove them.

However, if you have grouped by Customer ID, Transaction Date and Product SKU, then by definition your data will now be unique on Customer ID, Transaction Date and Product SKU, because it is now aggregated by that grouping. If there were duplicates previously, they will have been removed by this grouping and your row count will have reduced.
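To make that point concrete outside KNIME, here is a minimal pandas sketch of the same idea. The sample rows and the `row_count` column name are made up for illustration; only the three key column names and the quantity column come from this thread. It shows that grouping on the keys leaves exactly one row per key combination, and that the drop in row count equals the number of repeated rows.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the thread.
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 2],
    "Transaction Date": ["2024-01-05", "2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "Product SKU": ["A", "A", "B", "B", "B"],
    "Quantity": [3, 3, 1, 1, 2],
})
keys = ["Customer ID", "Transaction Date", "Product SKU"]

# Analogue of grouping on the three keys with a count aggregation:
# one output row per key combination, plus how many input rows it covered.
grouped = df.groupby(keys, as_index=False).agg(row_count=("Quantity", "count"))

# After grouping, no key combination can appear twice.
keys_unique = not grouped.duplicated(subset=keys).any()

# The drop in row count should match the number of repeated rows
# (every occurrence of a key combination after its first one).
rows_removed = len(df) - len(grouped)
repeats = int(df.duplicated(subset=keys).sum())

print(grouped)
print("keys unique:", keys_unique, "| rows removed:", rows_removed, "| repeats:", repeats)
```

Any group with a `row_count` above 1 had duplicates on the key columns. Whether those rows were true duplicates or separate transactions that only share the same keys depends on the other columns, which the grouping no longer shows.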

Can you explain what you mean by “unnecessary ones”?

As mentioned above, without seeing your data and knowing what you actually want as an output, it is difficult to state what you should or shouldn’t be doing.

You may be interested in this post, which explains how the Duplicate Row Filter can be used to find the duplicates, and then a row filter can be used to actually remove them…
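The same "flag first, filter second" pattern can be sketched in pandas. This is only an analogue, not how the KNIME nodes are implemented; the sample rows and the flag column names `in_duplicate_group` and `is_repeat` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the thread.
df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 2],
    "Transaction Date": ["2024-01-05", "2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "Product SKU": ["A", "A", "B", "B", "B"],
    "Quantity": [3, 3, 1, 1, 2],
})
keys = ["Customer ID", "Transaction Date", "Product SKU"]

# Step 1: flag rows instead of removing them.
flagged = df.copy()
# True for every row whose key combination occurs more than once.
flagged["in_duplicate_group"] = flagged.duplicated(subset=keys, keep=False)
# True only for the second and later occurrences of a key combination.
flagged["is_repeat"] = flagged.duplicated(subset=keys, keep="first")

# Rows to inspect by eye before deciding whether they are real duplicates.
to_review = flagged[flagged["in_duplicate_group"]]

# Step 2: filter out the repeats, keeping the first row of each group.
deduplicated = flagged.loc[~flagged["is_repeat"], df.columns]

print(to_review)
print(deduplicated)
```

Reviewing the flagged rows before filtering is what lets you answer "did I remove too much?": if the flagged rows differ in other columns, such as the quantity, they may be separate transactions rather than duplicates.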

