Detect and delete duplicate rows based on the latest date variable, not on row ID

Dear all,
I’m looking for a way to deal with duplicate rows by deleting the duplicate that has the older date.

Example:
Input table:
Standards | link | update date
iso 155 | link1 | 2017-07
iso 155 | link1 | 2017-01
iso 100 | link2 | 2012-03
iso 100 | link2 | 2010-01
iso 199 | link3 | 2010-04

The output needs to look like this:

Standards | link | update date
iso 155 | link1 | 2017-07
iso 100 | link2 | 2012-03
iso 199 | link3 | 2010-04

Best regards.

Hi @Mokrani
You can do this with the GroupBy node: add the columns that stay the same to the Group Column(s) list, and use the aggregation method Maximum on the update date column.
best,
Gabriel
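
For readers who want to see the logic outside of KNIME, here is a minimal Python sketch of what this GroupBy configuration computes. It is not KNIME code and not taken from the attached workflow; the function name `group_by_max_date` and the tuple layout are assumptions made for illustration. It groups on the columns that stay the same (Standards and link) and keeps the maximum update date per group.

```python
# Rows from the example input table: (standard, link, update date as "YYYY-MM").
rows = [
    ("iso 155", "link1", "2017-07"),
    ("iso 155", "link1", "2017-01"),
    ("iso 100", "link2", "2012-03"),
    ("iso 100", "link2", "2010-01"),
    ("iso 199", "link3", "2010-04"),
]


def group_by_max_date(rows):
    """Group on (standard, link) and keep the maximum date of each group.

    Hypothetical helper that mimics "group columns + Maximum aggregation".
    """
    latest = {}
    for standard, link, date in rows:
        key = (standard, link)
        # "YYYY-MM" strings sort chronologically, so string comparison works.
        if key not in latest or date > latest[key]:
            latest[key] = date
    # Dicts keep insertion order, so groups appear in first-seen order.
    return [(standard, link, date) for (standard, link), date in latest.items()]


result = group_by_max_date(rows)
for row in result:
    print(" | ".join(row))
```

Because the link is part of the group key here, two rows with the same standard but different links end up in different groups and are both kept.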

Could you send me an example so I can understand better? I'm quite new to KNIME.
Thanks

Here you go:

GroupBy demo.knwf (6.0 KB)

best,
Gabriel


Thank you!!
But I have a problem with rows that contain different links for the same standard.

Example:
Input table:
Standards | link | update date
iso 155 | link1 | 2017-07
iso 155 | link1555 | 2017-01
iso 100 | link2 | 2012-03
iso 100 | link2 | 2011-03

The output needs to look like this:

Standards | link | update date
iso 155 | link1 | 2017-07
iso 100 | link2 | 2012-03

But with your workflow I get:
Standards | link | update date
iso 155 | link1 | 2017-07
iso 155 | link1555 | 2017-01
iso 100 | link2 | 2012-03

Could you help with that?

I found a solution: adding an aggregation that gets the last URL (I noticed that KNIME can sort the date strings automatically).

Thanks again.
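
As an illustration of the intended result, here is a minimal Python sketch that groups on Standards only and keeps the whole row with the latest update date. Note that this is a slightly different technique from the "last URL" aggregation described above: it picks the link from the row holding the maximum date, so it does not depend on row order. It is not KNIME code, and the function name `keep_latest_per_standard` is hypothetical.

```python
# Rows from the second example: (standard, link, update date as "YYYY-MM").
rows = [
    ("iso 155", "link1", "2017-07"),
    ("iso 155", "link1555", "2017-01"),
    ("iso 100", "link2", "2012-03"),
    ("iso 100", "link2", "2011-03"),
]


def keep_latest_per_standard(rows):
    """Keep, for each standard, the full row with the most recent date.

    Hypothetical helper; assumes dates are "YYYY-MM" strings, which
    compare chronologically as plain strings.
    """
    latest = {}
    for row in rows:
        standard, _link, date = row
        # Replace the stored row only when this one is strictly newer.
        if standard not in latest or date > latest[standard][2]:
            latest[standard] = row
    # Dicts keep insertion order, so standards appear in first-seen order.
    return list(latest.values())


deduplicated = keep_latest_per_standard(rows)
for row in deduplicated:
    print(" | ".join(row))
```

Grouping on the standard alone is what lets rows with different links collapse into one; the link is carried along from the newest row instead of being part of the key.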
