Create pairs from a list or set

I have a Set column that contains a list of IDs:

[Screenshot 2022-01-20 at 13.13.31]

My goal is to create a table with all pairs of IDs in each set:

set1_id1   set1_id2
set1_id1   set1_id3
set2_id1   set2_id2
set2_id1   set2_id3
...
setN_idN setN_idN

Operation in pseudo code:

for row in table:
    ids = list(row.set)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            emit (ids[i], ids[j])

Is there a way to achieve this with built-in nodes, or do I have to write a Java snippet? It would also be great if someone could share a few ideas on how to do that in a snippet.
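For reference, here is how the operation could look outside KNIME, as a minimal Python sketch rather than a Java Snippet node. The function name `pairs_per_set` and the example data are made up for illustration; the only assumption about the input is that each row holds a set identifier and a comma-separated string of IDs.

```python
from itertools import combinations


def pairs_per_set(rows):
    """Yield (set_id, first, second) for every unordered pair within each row.

    `rows` maps a set identifier to a comma-separated string of IDs
    (a hypothetical stand-in for the table described above).
    """
    for set_id, id_string in rows.items():
        # Split the comma-separated list and drop empty entries.
        ids = [part.strip() for part in id_string.split(",") if part.strip()]
        # combinations(ids, 2) emits each pair once, in input order,
        # and never pairs an element with itself.
        for first, second in combinations(ids, 2):
            yield set_id, first, second


example = {"set1": "a, b, c", "set2": "x, y"}
result = list(pairs_per_set(example))
```

Using `combinations` instead of two nested index loops avoids both mirrored pairs (b-a after a-b) and self-pairs, so no filtering is needed afterwards.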

Hi @mpreusse

Could you please post the data shown in your screenshot in text format? I’ll take it from there and provide a solution.

Best

Ael

Thank you @aworker!

Here is a CSV file that contains a comma-separated list of IDs. The first column contains an identifier for the list of IDs.

ids.txt (84.1 KB)


Hi @mpreusse

My pleasure. Would this do the trick?

For instance, from this list:

to this result:

20220120 Pikairos Create pairs from a list or set.knwf (1004.2 KB)

Hope it helps.

Best wishes

Ael


Absolutely fantastic, thank you @aworker.

Step by step I am learning to think in tables and joins :slight_smile:

Is it possible to filter the output table for unique pairs? I.e. delete 5-23 if we have 23-5?


Yes indeed, it is quite easy :wink:

The trick here is to filter out the mirrored pairs (“doublets”) based on alphabetical order. Since every pair appears twice, once in each order, one of the two copies is in inverse alphabetical order, so you can filter out the one that is alphabetically “bigger”. For instance:

  11364     34       (is removed)
     34     11364    (but this one is kept)

Then you need to make sure that only one instance of each “self-pair” (for instance 34 34) is kept as well, because normally there would be two. This is done using the Duplicate Row Filter node.
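The same two steps can be sketched in a few lines of Python, just to show the logic; this is not what the KNIME nodes do internally. The function name `drop_mirrored_pairs` and the IDs are made up, and which orientation of a pair survives is an arbitrary choice (here the one whose first element is not bigger than the second).

```python
def drop_mirrored_pairs(pairs):
    """Keep one orientation of each pair and one copy of each self-pair."""
    # Step 1 (comparable to a row filter on "first > second"):
    # drop every row whose first ID sorts after its second ID.
    kept = [(first, second) for first, second in pairs if first <= second]
    # Step 2 (comparable to a duplicate row filter):
    # remove repeated rows while preserving the original order.
    return list(dict.fromkeys(kept))


# Hypothetical input: every pair in both orders, self-pairs possibly repeated.
all_pairs = [
    ("g1", "g2"),
    ("g2", "g1"),
    ("g1", "g1"),
    ("g1", "g1"),
    ("g2", "g2"),
]
unique = drop_mirrored_pairs(all_pairs)
```

The comparison alone cannot remove repeated self-pairs, because both copies are identical and pass the filter, which is why the second step is needed.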

20220120 Pikairos Create pairs from a list or set without doublets.knwf (3.0 MB)

Since you are working with gene sequence IDs, I guess your aim is to build an undirected relational graph, for which you need only one edge between two related genes. But I’m just guessing at what you may eventually want to implement :thinking:

Thanks for your kind comments and for having validated the answer :wink:

Best wishes,

Ael


Thank you. The only challenge is that my IDs are not numbers, so I can’t apply the Rule-based Row Filter to them. Any ideas?

Thanks @aworker!

That really is a beautiful solution.


Hi @nxfxcom

The IDs in my example are not numbers either: they are strings, and a ">" comparison works fine with strings.
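To illustrate the point, here is a small Python sketch (Python rather than the KNIME node, and with made-up IDs). Strings are compared character by character, so the result can differ from a numeric comparison of the same digits, but any consistent ordering is enough to pick one pair out of each mirrored couple.

```python
def first_is_bigger(first, second):
    """Return True when `first` sorts after `second`."""
    return first > second


# As strings, "3" is compared with "1" first, so "34" sorts after "11364".
as_text = first_is_bigger("34", "11364")
# As numbers, 34 is smaller than 11364.
as_number = first_is_bigger(34, 11364)
# Non-numeric IDs (hypothetical names) compare the same way as any other text.
gene_ids = first_is_bigger("geneB", "geneA")
```

Here `as_text` is `True` while `as_number` is `False`; `gene_ids` is `True` because "B" sorts after "A".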

Best

Ael
