Remove duplicate lines

Hi,

I have a dataset that contains 3 columns:

Source Target Weight
1 2 2
1 3 1
3 2 2
2 1 2

These represent connections in a network. Since the direction of a connection is irrelevant, I consider rows 1 and 4 duplicates. How can I find all such duplicates and remove them from the table?

Thanks all in advance

Jorge

Hi,

Implement a rule or a Java snippet that maps the Source and Target columns to a single identifier. Use the following rule:

if (source < target) {
    identifier = source + "-" + target;
} else {
    identifier = target + "-" + source;
}

So, for your table the following identifiers will be created:

1-2
1-3
2-3
1-2

Equal identifiers now denote duplicate rows per your definition. You can then use a GroupBy node, grouping on the identifier column, to eliminate the duplicates.
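For illustration, the identifier rule and the duplicate removal can be sketched in plain Java outside of any workflow. This is only a minimal sketch: the class name `EdgeDedup`, the method names, and the `{source, target, weight}` array layout are assumptions made for this example, and it keeps the first row seen for each identifier.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class EdgeDedup {
    // Builds the direction-independent identifier: smaller value first.
    static String identifier(int source, int target) {
        return source < target ? source + "-" + target : target + "-" + source;
    }

    // Keeps the first row seen for each identifier.
    // Each row is assumed to be {source, target, weight}.
    static List<int[]> removeDuplicates(List<int[]> rows) {
        Map<String, int[]> firstByKey = new LinkedHashMap<>();
        for (int[] row : rows) {
            firstByKey.putIfAbsent(identifier(row[0], row[1]), row);
        }
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // The four rows from the question.
        List<int[]> rows = List.of(
            new int[] {1, 2, 2},
            new int[] {1, 3, 1},
            new int[] {3, 2, 2},
            new int[] {2, 1, 2});
        for (int[] row : removeDuplicates(rows)) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

With the table from the question, this prints the first three rows and drops the fourth, since `2 1` maps to the same identifier `1-2` as `1 2`.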

Philipp

Hi, thanks for the prompt answer. That solution is fine for numbers, but my data also includes text.

However, since the data came from a list, I created IDs and used your code.

Thanks once again

JC

Hey Jorge,

You can connect a Column Aggregator node to your table and choose your Source and Target columns as the aggregation columns. As the aggregation method, choose List (sorted), which works with both strings and numbers. Then add a GroupBy node and use the newly created column as the group column. To keep your "Weight" column, use First as its aggregation method.
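The same idea, written as a rough plain-Java sketch for text values: a sorted two-element list serves as the group key, and the first weight per key is kept. The class `UndirectedEdges`, its methods, and the sample names are made up for this illustration, and the ordering used here (`String.compareTo`) is an assumption that may differ from how the node sorts.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class UndirectedEdges {
    // Sorted two-element key, so (a, b) and (b, a) produce the same key.
    static List<String> sortedKey(String source, String target) {
        return source.compareTo(target) <= 0
            ? List.of(source, target)
            : List.of(target, source);
    }

    // Groups rows by the sorted key and keeps the first weight per group.
    // Each row is assumed to be {source, target, weight}.
    static Map<List<String>, String> groupKeepFirst(List<String[]> rows) {
        Map<List<String>, String> weightByKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            weightByKey.putIfAbsent(sortedKey(row[0], row[1]), row[2]);
        }
        return weightByKey;
    }

    public static void main(String[] args) {
        // Made-up sample rows with text identifiers.
        List<String[]> rows = List.of(
            new String[] {"alice", "bob", "2"},
            new String[] {"alice", "carol", "1"},
            new String[] {"bob", "alice", "2"});
        groupKeepFirst(rows).forEach((key, weight) ->
            System.out.println(key.get(0) + " " + key.get(1) + " " + weight));
    }
}
```

Note that keeping the first weight only gives a well-defined result if duplicate rows carry the same weight, as rows 1 and 4 do in the question.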

Best,
Marc