De-Duplication and Merging of Rows

Hi All,

I'm just starting to learn Knime.  I would like to use Knime to first identify the duplicate records in my table based on email address.  Then determine when record of the duplicate will be the surviving record using the created date of the records.  

The last step which I'm don't know how to do is create a merging logic.  The column "Count" tell me how many time an email exist in my table.  If it is 1, then it becsomes the surviving record that I will keep and the rest will be deleted.  Before deletion, I would like to merge Phone and Address to the Surviving record if and only if the surviving record phone or address is blank. I do not overwrite existing data on the surviving record.

For example, record 2 phone number should be 111-111-2222 and the address is 11 Second St. after the merging is completed.

Is there a node or a Java snippet that could help me do this?  Your help is appreciated.

-Palo

My Source Table
Record Id Email Phone Address Created Date Count Label
1 test1@gmail.com 111-111-1111 11 First St. 7/1/2016 1 Surviving
2 test2@gmail.com     7/22/2016 1 Surviving
3 test2@gmail.com   11 Second St. 7/20/2016 2 Duplicate
4 test2@gmail.com 111-111-2222   7/01/2016 3 Duplicate
5 test3@gmail.com   11 Third St 8/01/2016 1 Surviving
6 test3@gmail.com 111-111-1111   7/02/2016 2 Duplicate

 

Hi Palo,

You may wish to try a GroupBy, with groups set to "Email" and "Label", and aggregations set to

  • Maximum on "Count" (and possibly also on "Created Date")
  • Unique concatenate on "Phone" and "Address"

This should give you the desired result.

Cheers
E