Split into pairs

Hi,

I would like to split a comma-separated string into string pairs, for example:

a single cell with "a,b,c,d" should be split into two columns containing the pairs

a b

a c

a d

b c

b d

c d

ideally without a-a, b-b, and so on. I have a lengthy solution using Cell Splitter, Ungroup and a Java row splitter that sorts out the x-x pairs. However, doing this on more than a million rows is time-consuming. Can anyone help me with a faster solution (Java snippet?).

Thanks in advance

Hi,

I made you an example workflow. However, I think there are multiple ways to solve this problem.

Best, Iris

Many thanks Iris,

your solution, however, doesn't seem to work for more than one row, and looping through 1 million rows would not be fast enough. I attach my current workflow to illustrate what I want to achieve. It seems complicated to me.

Regards Jerry

I would think a java snippet with something like the following:


// Your custom imports:
import java.util.List;
import java.util.ArrayList;


// Enter your code here:

String[] vals=c_column1.split(",");
List<String> p1 = new ArrayList<>();
List<String> p2 = new ArrayList<>();

for(int i=0; i < vals.length - 1; i++){
    for(int j=i+1; j < vals.length; j++){
        p1.add(vals[i]);
        p2.add(vals[j]);
    }
}

out_p1 = p1.toArray(new String[0]);
out_p2 = p2.toArray(new String[0]);
 


// expression end

where c_column1 refers to the input column, and p1 and p2 are output columns (written via out_p1 and out_p2), both of which are StringCell arrays. This should do what you want if you follow it with an Ungroup node with both collection columns selected.
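To check the loop logic outside KNIME, here is a minimal standalone sketch of the same nested loop. The class name PairDemo and the method pairs are made up for illustration and are not part of the Java Snippet node; in the node itself you would keep the c_column1 / out_p1 / out_p2 fields as above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone class, only for trying the pairing logic outside KNIME.
public class PairDemo {

    // Returns every pair (vals[i], vals[j]) with i < j, so x-x pairs of the
    // same position never occur and each combination appears once.
    public static List<String[]> pairs(String cell) {
        String[] vals = cell.split(",");
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < vals.length - 1; i++) {
            for (int j = i + 1; j < vals.length; j++) {
                out.add(new String[] {vals[i], vals[j]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: a b, a c, a d, b c, b d, c d (one pair per line)
        for (String[] p : pairs("a,b,c,d")) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

A cell with n values gives n*(n-1)/2 pairs, so the number of rows after the Ungroup node grows quickly for long lists.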

If you want to remove duplicates (so that "a,b,c,a,d" gives only a-b, a-c, a-d, b-c, b-d, c-d), then you could modify it to, e.g.:

// Your custom imports:
import java.util.List;
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.Arrays;

// Enter your code here:

List<String> vals=new ArrayList<>(new TreeSet<>(Arrays.asList(c_column1.split(","))));
List<String> p1 = new ArrayList<>();
List<String> p2 = new ArrayList<>();

for(int i=0; i < vals.size() - 1; i++){
    for(int j=i+1; j < vals.size(); j++){
        p1.add(vals.get(i));
        p2.add(vals.get(j));
    }
}

out_p1 = p1.toArray(new String[0]);
out_p2 = p2.toArray(new String[0]);
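One thing to be aware of: the TreeSet also sorts the values, so the pairs come out in alphabetical order rather than in the order of the original string. If the original order matters, a LinkedHashSet might be an alternative. The following is only a standalone sketch under that assumption; the class name UniquePairDemo and the method uniquePairs are made up for illustration, and the trimming of spaces and skipping of empty entries are extra assumptions about the data, not something the question asked for.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone class, only for trying the logic outside KNIME.
public class UniquePairDemo {

    // Removes duplicate values while keeping first-seen order, then pairs them up.
    public static List<String[]> uniquePairs(String cell) {
        Set<String> seen = new LinkedHashSet<>();
        for (String v : cell.split(",")) {
            String t = v.trim();      // assumption: surrounding spaces are not significant
            if (!t.isEmpty()) {       // assumption: empty entries should be skipped
                seen.add(t);
            }
        }
        List<String> vals = new ArrayList<>(seen);
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < vals.size() - 1; i++) {
            for (int j = i + 1; j < vals.size(); j++) {
                out.add(new String[] {vals.get(i), vals.get(j)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: b a, b c, a c (one pair per line)
        for (String[] p : uniquePairs("b, a,c,b")) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```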

Does that do what you wanted?

Steve

 

Super. Works great.

Many, many thanks

Jerry
