Create pairs from a list or set

mpreusse · January 20, 2022, 12:18pm

I have a Set column that contains a list of IDs:

[Screenshot 2022-01-20 at 13.13.31]

My goal is to create a table with all pairs of IDs in each set:

set1_id1   set1_id2
set1_id1   set1_id3
set2_id1   set2_id2
set2_id1   set2_id3
...
setN_idN setN_idN

Operation in pseudo code:

for row in table:
    for a in row.set:
        for b in row.set:    # pair a with each element of the same set
            emit (a, b)

Is there a way to achieve this with built-in nodes, or do I have to write a Java snippet? It would also be great if someone could share a few ideas on how to do that in a snippet.
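For the snippet route, the goal stated above (all pairs of IDs within each set) can be sketched as follows. This is plain Python rather than a KNIME Java Snippet, and the function name `ordered_pairs` and the sample data are made up for illustration.

```python
from itertools import permutations


def ordered_pairs(sets):
    """Return (a, b) for every two different IDs of each set, in both orders."""
    rows = []
    for ids in sets.values():
        # permutations(ids, 2) yields each ordered pair taken from two
        # different positions of the list, so (a, b) and (b, a) both appear.
        rows.extend(permutations(ids, 2))
    return rows


# Hypothetical sample data, not taken from the attachment in this thread.
sets = {"set1": ["id1", "id2", "id3"], "set2": ["id4", "id5"]}
pairs = ordered_pairs(sets)
```

Pairs are only formed inside one set, never across sets. If each pair should appear once regardless of order, `itertools.combinations(ids, 2)` can be swapped in for `permutations`.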

aworker · January 20, 2022, 12:30pm

Hi @mpreusse

Could you please post the data shown in your screenshot in text format? I’ll take it from there and provide a solution.

Best

Ael

mpreusse · January 20, 2022, 12:46pm

Thank you @aworker!

Here is a CSV file that contains a comma-separated list of IDs. The first column contains an identifier for the list of IDs.

ids.txt (84.1 KB)

aworker · January 20, 2022, 1:02pm

Hi @mpreusse

My pleasure. Would this do the trick?

For instance, from this list:

to this result:

20220120 Pikairos Create pairs from a list or set.knwf (1004.2 KB)

Hope it helps.

Best wishes

Ael

mpreusse · January 20, 2022, 1:29pm

Absolutely fantastic, thank you @aworker.

Step by step, I am learning to think in tables and joins.

Is it possible to filter the output table for unique pairs? I.e. delete 5-23 if we have 23-5?

aworker · January 20, 2022, 1:52pm

Yes indeed, it is quite easy.

The trick here is to filter out the “doublets” based on alphabetical order. Since every pair is duplicated, one of the two copies is in reverse alphabetical order, so you can filter out the one that is alphabetically “bigger”, for instance:

11364    34       (is removed)
34       11364    (but this one is kept)

Then you need to make sure that only one instance of each “self-pair” (for instance 34 34) is kept too, because normally there would be two. This is done using the Duplicate Row Filter node.
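The two filtering steps can be sketched outside KNIME as well. The Python below is only an illustration. The exact rule used in the attached workflow is not shown in this thread, so the direction of the comparison (`a >= b`) is an assumption chosen to match the example rows above, and `unique_pairs` and the sample IDs are hypothetical.

```python
from itertools import product


def unique_pairs(pairs):
    """Keep one row per unordered pair, comparing the IDs as strings."""
    kept = set()
    for a, b in pairs:
        # Assumed rule: keep the copy whose first ID is not alphabetically
        # smaller than the second; the mirrored copy (a < b) is dropped.
        if a >= b:
            # Using a set collapses repeated rows such as two ("34", "34"),
            # which is the job of the Duplicate Row Filter in the workflow.
            kept.add((a, b))
    return sorted(kept)


ids = ["34", "11364", "5"]                 # hypothetical IDs, stored as strings
all_pairs = list(product(ids, repeat=2))   # every ordered pair, self-pairs included
result = unique_pairs(all_pairs)
```

With three IDs this leaves six rows: three self-pairs and one row for each of the three unordered pairs.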

20220120 Pikairos Create pairs from a list or set without doublets.knwf (3.0 MB)

Since you are working with gene sequence IDs, I guess your aim is to build an undirected relational graph, and for that you need just one edge between two related genes. But I’m just guessing at what you may eventually want to implement.

Thanks for your kind comments and for having validated the answer!

Best wishes,

Ael

nxfxcom · January 20, 2022, 3:49pm

Thank you, the only challenge is that my IDs are not numbers, so I can’t do the Rule-based Row Filter on => any ideas?

mpreusse · January 20, 2022, 4:56pm

Thanks @aworker!

That really is a beautiful solution.

aworker · January 20, 2022, 5:05pm

Hi @nxfxcom

They are not numbers here either: they are strings, and a “>” comparison works fine with strings.
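As a quick illustration of why no numeric IDs are needed, here is how string comparison behaves in Python. This is a sketch of the general idea (character-by-character ordering) and not the KNIME rule engine itself, whose details may differ. The gene-like IDs are made up.

```python
# Strings are compared character by character, so IDs do not have to be numbers.
numeric_looking = "34" > "11364"   # '3' sorts after '1', so "34" is the "bigger" string
gene_like = "geneB" > "geneA"      # hypothetical non-numeric IDs compare the same way

# The same two values compared as integers give the opposite answer.
as_numbers = 34 > 11364
```

The ordering differs from the numeric one, but that does not matter for removing doublets: any consistent ordering keeps exactly one copy of each mirrored pair.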

Best

Ael
