REMOVE DUPLICATE AND GET CORRECT DATA IN KNIME

Hi @trafalgarlaw, I’ve been watching the thread but haven’t had much time to devote to it. I’m trying to determine whether the different cases you have fit any simple patterns, while also considering whether those patterns would still hold if you had additional rows in any of the scenarios.

At the moment I’m tending to think that a Group Loop (grouped on IDENTIFIER) with an inner Recursive Loop, which attempts to reduce each group down to a single row by applying a set of rules, might be the way forward.

I’d favour the kind of approach I think @mlauber71 is heading towards, involving ranking and sorting to determine the “winner” in each case, but I haven’t got there with this one yet, as the different rows in each group need to be played off against each other.

The basic “rules” appear to be:

  1. If there is a D and an R with the same Identifier and Code in the same month, then the earliest D and the earliest R for that Identifier and Code in that month are both removed.

  2. If there is a D and an R with the same Identifier and Code in different months, then the earliest D and the earliest R for that Identifier and Code are both removed.

I believe that these two rules, if applied repeatedly in that order until no further rows can be removed, will resolve all of the cases you have listed. I’ve yet to see if that’s right though… so still a work in progress :wink:
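
To make the idea concrete outside of KNIME, here is a rough Python sketch of how I read those two rules. It is untested against your real data and rests on assumptions: the field names `identifier`, `code`, `type` and `date` are made up for illustration (your columns will be named differently), `type` is assumed to hold `"D"` or `"R"`, and “same month” is taken to mean same calendar year and month. The `while` loop stands in for the Recursive Loop, and grouping by key stands in for the Group Loop.

```python
from collections import defaultdict
from datetime import date


def _remove_earliest_pair(rows, key):
    """Find the first group (by `key`) holding both a D and an R,
    remove its earliest D and earliest R in place, and report success."""
    groups = defaultdict(lambda: {"D": [], "R": []})
    for row in rows:
        if row["type"] in ("D", "R"):
            groups[key(row)][row["type"]].append(row)
    for members in groups.values():
        if members["D"] and members["R"]:
            for flag in ("D", "R"):
                earliest = min(members[flag], key=lambda r: r["date"])
                rows.remove(earliest)
            return True
    return False


def same_month_key(row):
    # Rule 1 grouping: Identifier + Code + calendar month (assumed meaning)
    return (row["identifier"], row["code"], row["date"].year, row["date"].month)


def any_month_key(row):
    # Rule 2 grouping: Identifier + Code only. Once rule 1 finds nothing,
    # any remaining D/R pair in a group must fall in different months.
    return (row["identifier"], row["code"])


def reduce_rows(rows):
    """Apply rule 1, then rule 2, restarting after every removal,
    until neither rule can remove anything. The input is not modified."""
    remaining = list(rows)
    while True:
        if _remove_earliest_pair(remaining, same_month_key):
            continue
        if _remove_earliest_pair(remaining, any_month_key):
            continue
        return remaining


if __name__ == "__main__":
    # Made-up sample rows, not taken from the thread's data
    sample = [
        {"identifier": "A", "code": "X", "type": "D", "date": date(2023, 1, 5)},
        {"identifier": "A", "code": "X", "type": "R", "date": date(2023, 1, 20)},
        {"identifier": "A", "code": "X", "type": "D", "date": date(2023, 2, 3)},
    ]
    for row in reduce_rows(sample):
        print(row)
```

Restarting from rule 1 after every removal is my reading of “in that order”; if rule 2 should only run once rule 1 is permanently exhausted, the result is the same here, because removing rows can never create a new same-month pair.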
