Enhancement -- Duplicate Row Filter - Add additional conditions

Snowy · June 30, 2021, 3:45pm

Hi
I think it would be great if the Duplicate Row Filter could take multiple conditions, rather than just one, when choosing which row to keep from the duplicates. Allow me a quick use case to illustrate my point…

If I have the following table that I want to de-dupe based on “Product ID”:

I can use the Duplicate Row Filter and select how I want the selection returned. E.g.:

Which would yield the highlighted row selected:

However, would it be possible to enhance the Duplicate Row Filter to allow another selection if duplicates still exist after the first row selection has been applied?
For example, if I choose my row selection to be Maximum of Effective Date, there are two rows with the same maximum effective date. It would be great to then choose Maximum of Recent Sales from the remaining rows, which would yield this:

And ideally the user could continue to add conditions to select the right row from the duplicates.

takbb · June 30, 2021, 3:59pm

Hi @Snowy ,
If I understand your use case correctly, I believe you can get the result you are looking for by first sorting the table into the order of precedence you want (sorting by multiple columns to reach your level of refinement) and then simply using “First” as the row to keep in case of duplication.

I can see where you are coming from with this though, as specifying the precedence rules within the one node makes sense from a “documentation” point of view, and makes the intention explicit, but just wanted to check if this does at least achieve the result you are after.
BR
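
To make the "sort, then keep First" idea concrete, here is a minimal sketch in plain Python rather than KNIME nodes. The rows are made-up sample data (the table from the original post is not reproduced here), and the function name `dedupe_sorted_first` is hypothetical. It sorts by Effective Date and then Recent Sales, both descending, and keeps the first row per Product ID.

```python
from datetime import date

# Hypothetical sample rows; column names follow the post, values are invented.
rows = [
    {"Product ID": "A", "Effective Date": date(2021, 1, 1), "Recent Sales": 10},
    {"Product ID": "A", "Effective Date": date(2021, 6, 1), "Recent Sales": 5},
    {"Product ID": "A", "Effective Date": date(2021, 6, 1), "Recent Sales": 8},
    {"Product ID": "B", "Effective Date": date(2021, 3, 1), "Recent Sales": 2},
]

def dedupe_sorted_first(rows, key, precedence):
    """Sort by the precedence columns (all descending), then keep the first row per key."""
    ordered = sorted(
        rows,
        key=lambda r: tuple(r[col] for col in precedence),
        reverse=True,
    )
    kept = {}
    for row in ordered:
        # setdefault stores a row only the first time a key value is seen
        kept.setdefault(row[key], row)
    return list(kept.values())

result = dedupe_sorted_first(rows, "Product ID", ["Effective Date", "Recent Sales"])
for row in result:
    print(row)
```

For product A, the two rows tied on the latest Effective Date are separated by Recent Sales, so the row with sales of 8 is kept. Adding further tie-breakers only means appending more column names to the precedence list.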

mlauber71 · June 30, 2021, 4:49pm

@Snowy I know that this suggestion does not strictly stay within KNIME, but you could think about using H2 and window functions for more advanced sorting and de-duplication:
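
As a rough sketch of the window-function approach, the example below uses Python's built-in `sqlite3` module as a stand-in for H2 (an assumption for illustration only; SQLite needs version 3.25 or later for window functions). The table name `products`, the column names, and the sample values are all hypothetical. `ROW_NUMBER()` ranks the rows within each product by the precedence columns, and only rank 1 is kept.

```python
import sqlite3

# In-memory SQLite database used here in place of H2 (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id TEXT, effective_date TEXT, recent_sales INTEGER)"
)
# Hypothetical sample rows; ISO date strings sort correctly as text.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("A", "2021-01-01", 10),
        ("A", "2021-06-01", 5),
        ("A", "2021-06-01", 8),
        ("B", "2021-03-01", 2),
    ],
)

# Rank rows per product: latest date first, then highest sales as tie-breaker.
query = """
SELECT product_id, effective_date, recent_sales
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY product_id
               ORDER BY effective_date DESC, recent_sales DESC
           ) AS rn
    FROM products
)
WHERE rn = 1
ORDER BY product_id
"""
deduped = conn.execute(query).fetchall()
print(deduped)
```

Further tie-breakers can be added as extra columns in the `ORDER BY` of the window definition.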