I would like to split a comma-separated string into string pairs, for example:
a single cell with "a,b,c,d" should be split into two columns containing the pairs
a b
a c
a d
b c
b d
c d
ideally without a-a, b-b, and so on. I have a lengthy solution using Cell Splitter, Ungroup and a Java row splitter that sorts out the x-x pairs. However, doing this on more than a million rows is time-consuming. Can anyone help me with a faster solution (a Java snippet, perhaps)?
Your solution, however, doesn't seem to work for more than one row, and looping through 1 million rows would not be fast enough. I attach my current workflow to illustrate what I want to achieve. It seems complicated to me.
I would think a Java snippet with something like the following:
// Your custom imports:
import java.util.List;
import java.util.ArrayList;
// Enter your code here:
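// Split the cell content into its individual values
String[] vals = c_column1.split(",");
List<String> p1 = new ArrayList<>();
List<String> p2 = new ArrayList<>();
// Pair each value only with the values after it, so x-x pairs never occur
for (int i = 0; i < vals.length - 1; i++) {
    for (int j = i + 1; j < vals.length; j++) {
        p1.add(vals[i]);
        p2.add(vals[j]);
    }
}
// Write both lists to the collection output columns
out_p1 = p1.toArray(new String[0]);
out_p2 = p2.toArray(new String[0]);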
// expression end
where c_column1 refers to the input column, and p1 and p2 are output columns (set through out_p1 and out_p2), both of them String arrays, should do what you want, if you follow it with an Ungroup node with both collection columns selected.
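To check the pairing logic outside KNIME, the same nested loop can be run as a plain Java program. This is only a sketch: the class name PairDemo and the method pairs are made up for illustration and are not part of the snippet node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone harness; PairDemo and pairs() are illustrative names.
class PairDemo {
    // Returns every pair (vals[i], vals[j]) with i < j from a comma-separated string.
    static List<String[]> pairs(String cell) {
        String[] vals = cell.split(",");
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < vals.length - 1; i++) {
            for (int j = i + 1; j < vals.length; j++) {
                out.add(new String[] {vals[i], vals[j]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the six pairs from the question: a b, a c, a d, b c, b d, c d
        for (String[] p : pairs("a,b,c,d")) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

A cell with n values yields n(n-1)/2 pairs, so the output grows quickly for long lists.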
If you want to remove duplicates (so that "a,b,c,a,d" only gives a-b, a-c, a-d, b-c, b-d, c-d), then you could modify it to e.g.
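One possible way to do this, as a rough sketch and not necessarily the modification meant above, is to collect the values in a LinkedHashSet before pairing them, which drops repeated values while keeping their first-seen order. The names UniquePairDemo and uniquePairs are hypothetical, and the code assumes the values contain no surrounding whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone harness; names are illustrative only.
class UniquePairDemo {
    // Pairs each distinct value with every distinct value that follows it.
    static List<String[]> uniquePairs(String cell) {
        // LinkedHashSet removes repeated values and preserves insertion order
        Set<String> distinct = new LinkedHashSet<>();
        for (String v : cell.split(",")) {
            distinct.add(v);
        }
        String[] vals = distinct.toArray(new String[0]);
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < vals.length - 1; i++) {
            for (int j = i + 1; j < vals.length; j++) {
                out.add(new String[] {vals[i], vals[j]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] p : uniquePairs("a,b,c,a,d")) {
            System.out.println(p[0] + "-" + p[1]);
        }
    }
}
```

In the Java snippet itself, the equivalent change would be to build vals from such a set instead of directly from split(","), leaving the two loops as they are.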