I am trying to use the String Similarity node to compare multiple columns. I’ve created a loop that compares one column against all other columns in the table, and this runs fine. However, I want to set up an outer loop so that I can perform the same comparison for more than one column, i.e. first column A vs A, B, C, D etc., then column B vs A, B, C, D etc., then column C vs A, B, C, D etc., and so on.
Below is what I have so far but I can’t figure out how I can introduce a loop with a second variable which I can feed into the String Similarity node to do as I have explained above - but maybe I am approaching it wrong. Any help would be appreciated.
I think something like this should do what you want:
You need to make sure that the flow variable output of the outer loop is connected to the input of the inner loop in some way. Otherwise you will get errors about loops not being fully nested, or very strange behaviour if it does run.
Thanks for the reply.
I did try that, but I’m finding that even though I seem to have 2 variables coming out of the output port of the second Variable Loop Start, when I try to set up the String Similarity node it only has 1 column header available.
Since I want the column variable from the outer loop to be column 1 and the column variable from the inner loop to be column 2, I am still not sure how to make them both available - or maybe I just need to approach it a different way.
That’s a bit strange - I would have expected to see a duplicate variable from the outer loop with something like (#1) as a suffix.
I was expecting the same. In the end, I managed to make it work by combining the Table Row To Variable Loop Start with the Column List Loop Start. This resulted in two different variables which I could use as my inputs into the String Similarity node.
Maybe there was a better way but this worked.
To avoid a messy workflow and a double loop, you can try the following:
- After the Transpose node, use a Cross Joiner where both input ports come from the Transpose node. This will create all possible combinations of columns to compare. To avoid comparing a column to itself, you can use a Rule-based Row Filter to remove such rows.
- Then use the Table Row To Variable Loop Start node, which will give you two flow variables holding the names of the columns to compare.
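For readers who think more easily in code, the two steps above can be sketched in plain Python. This is only an analogue of the logic, not KNIME itself: the sample `table`, the helper names, and the use of `difflib.SequenceMatcher` as the similarity score are all assumptions made for illustration, and the score is not the metric the String Similarity node uses.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical sample table: column name -> list of cell values.
table = {
    "A": ["apple", "banana"],
    "B": ["apple", "bandana"],
    "C": ["grape", "banana"],
}


def column_pairs(columns):
    """Cross join the column names with themselves, then drop self-pairs."""
    return [(left, right)
            for left, right in product(columns, repeat=2)
            if left != right]


def similarity(a, b):
    """Stand-in score in [0, 1]; an assumption, not KNIME's metric."""
    return SequenceMatcher(None, a, b).ratio()


def compare_all(table):
    """Single loop over the pairs; each pair plays the role of two flow variables."""
    results = {}
    for left, right in column_pairs(list(table)):
        results[(left, right)] = [
            similarity(x, y) for x, y in zip(table[left], table[right])
        ]
    return results


if __name__ == "__main__":
    for pair, scores in compare_all(table).items():
        print(pair, [round(s, 2) for s in scores])
```

Because the pairs are built up front, a single loop is enough, which is the same reason the Cross Joiner approach avoids the nested loops.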
(I used the String Manipulation node inside the loop just as an example.) To have the flow variable from the outer loop available inside the inner loop when both loops are of the same type, you can rename the outer loop's flow variable so that the two names are not identical. For example, you can use the String Manipulation (Variable) node to do so.
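The renaming idea can be illustrated with a deliberately simplified Python model. It assumes flow variables behave like entries in a dictionary where a same-named variable from the inner loop hides the outer one; that is an assumption for illustration and not a description of how KNIME implements flow variables. The names `run_nested`, `column_name`, and `outer_column_name` are hypothetical.

```python
def run_nested(columns, rename_outer):
    """Return the (outer, inner) column names visible inside the inner loop."""
    seen = []
    for outer in columns:
        # The outer loop exposes the current column under a fixed name.
        variables = {"column_name": outer}
        if rename_outer:
            # Copy the outer value under a different name before the inner loop.
            variables["outer_column_name"] = variables["column_name"]
        for inner in columns:
            scope = dict(variables)
            # The inner loop writes the same name and hides the outer value.
            scope["column_name"] = inner
            seen.append((scope.get("outer_column_name"), scope["column_name"]))
    return seen


if __name__ == "__main__":
    print(run_nested(["A", "B"], rename_outer=False))
    print(run_nested(["A", "B"], rename_outer=True))
```

In this model, only the inner column name is visible without the rename step, while both names are available once the outer value has been copied to a distinct name.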
Welcome to the KNIME Community, and I hope this helps!
Thank you Ivan - even though I was able to make it work using the double loop, this is much cleaner and is exactly what I was trying to accomplish in the first place.
Really appreciate the guidance!