How to select row with most recent date (unique_id has multiple occurrences)

rutgerverhaar · June 24, 2021, 3:10pm

Hi All,

I have the below file:

As you can see in the screenshot, the unique_id appears twice because the rmad field has been updated. I need logic that tells KNIME to keep, for each unique_id, only the row with the most recent rmad date. So in this case I would keep only the row with RMAD at 2021-06-15.

Would anyone know how to do this?

Kind regards,

Rutger

Daniel_Weikert · June 24, 2021, 3:21pm

You could use a GroupBy node on the unique_id and then choose the maximum of RMAD (a String to Date&Time node before the grouping could be helpful).
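For readers who think in code, the same group-and-take-the-maximum logic can be sketched in pandas. This is only an illustration of the idea, not what the KNIME nodes run internally; the sample values and the `df` and `latest` names are assumptions, and only the column names and the 2021-06-15 date come from the thread.

```python
import pandas as pd

# Hypothetical sample data: column names follow the thread, values are illustrative.
df = pd.DataFrame({
    "unique_id": ["A1", "A1", "B2"],
    "rmad": ["2021-06-01", "2021-06-15", "2021-05-20"],
})

# Rough analogue of String to Date&Time: parse the strings so that
# the maximum compares dates rather than text.
df["rmad"] = pd.to_datetime(df["rmad"], format="%Y-%m-%d")

# Rough analogue of GroupBy on unique_id with a maximum aggregation on rmad.
latest = df.groupby("unique_id", as_index=False)["rmad"].max()
print(latest)
```

The result has one row per unique_id and contains only the grouping column and the aggregated date.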
br

rutgerverhaar · June 24, 2021, 3:35pm

Hi @Daniel_Weikert,

Thanks for the reply!

I converted RMAD to Date&Time. However, when I run the GroupBy, the table structure changes and the output only holds the aggregated columns I included in the config. What I need is for the table format to stay the same, keeping only the row with the max date.

Kind regards,

Rutger

rutgerverhaar · June 24, 2021, 3:39pm

I can obviously add all the fields in the group settings… Thanks for the help @Daniel_Weikert

Snowy · June 24, 2021, 4:25pm

The GroupBy certainly works, but the Duplicate Row Filter node also contains this functionality. E.g.:

If you ever present the workflow, non-technical users will probably find the Duplicate Row Filter node easier to follow than the GroupBy node.
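As a rough code analogue of the duplicate-filter approach, the pandas sketch below keeps one whole row per unique_id, the one with the latest rmad, so every other column survives unchanged. It is an illustration under assumptions, not the node's implementation: the `status` column, the sample values, and the `latest_rows` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; "status" stands in for any extra column in the table.
df = pd.DataFrame({
    "unique_id": ["A1", "A1", "B2"],
    "status": ["open", "closed", "open"],
    "rmad": pd.to_datetime(["2021-06-01", "2021-06-15", "2021-05-20"]),
})

# Sort so the most recent rmad comes first, keep the first row seen for
# each unique_id, then restore a stable order by unique_id.
latest_rows = (
    df.sort_values("rmad", ascending=False)
      .drop_duplicates(subset="unique_id", keep="first")
      .sort_values("unique_id")
      .reset_index(drop=True)
)
print(latest_rows)
```

Because whole rows are kept rather than aggregated, the table structure stays the same as the input.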

ipazin · June 24, 2021, 4:42pm

Hello @rutgerverhaar,

As @Snowy pointed out, this is exactly the use case for the Duplicate Row Filter node.

Br,
Ivan
