Comparing all values of a column to all values of another

Hello,

Is there a way to compare two columns from different datasets so that each row is compared against every row of the other dataset, and not just one row against one?

Thank you for your help.

Regards
Sofia

Hi,

You can use a cross join to create a table with all n x m row combinations.
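For readers who want to see what a cross join produces outside KNIME, here is a minimal Python sketch. The sample rows and the names `pairs`, `likes` and `cross` are made up for illustration; this is not how the KNIME node is implemented, only the same idea in a few lines.

```python
from itertools import product

# Hypothetical sample rows, invented for this illustration.
pairs = [("apple", "orange"), ("carrot", "apple"), ("carrot", "peas")]
likes = [("Peter", "apple", "group1"), ("Tom", "orange", "group2")]

# Cross join: every row of `pairs` is combined with every row of `likes`.
cross = [left + right for left, right in product(pairs, likes)]

print(len(cross))  # 3 * 2 = 6 combined rows
for row in cross:
    print(row)
```

The result always has n x m rows, which is why this approach grows quickly with larger tables.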

– Philipp


Hi there @Shaller,

welcome to KNIME Community!

Sure, there are ways, but can you tell us a bit more, such as what you are comparing and what the goal of this comparison is?

Br,
Ivan

You are right. More information will make it easier to help solve my problem.

I have an input file with the following structure (example rows):

Category  | specific | category  | specific
Fruit     | apple    | fruit     | orange
Vegetable | carrot   | fruit     | apple
vegetable | carrot   | vegetable | peas

The other table contains information such as:

Name  | Category   | specific | group
Peter | Fruit      | apples   | group1
Peter | vegetables | peas     | group1
Sarah | Fruit      | apples   | group1
Sarah | vegetables | peas     | group1
Tom   | fruit      | orange   | group2
Tom   | vegetables | carrots  | group3
Peter | vegetables | carrots  | group3

The children are grouped according to the food they eat. I want to take my first list and compare it to the second, e.g. check: if a group of kids likes apples, does it also like oranges?

There is a small hiccup in my data. In the category group it says "fruits, apple", etc.
However, this could always be fixed with a split of some kind.
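As a rough illustration of such a split, here is a small Python sketch. The helper name `split_category` is hypothetical, and it assumes the combined value always uses a comma as the separator. Within KNIME, a no-code node such as Cell Splitter would likely be the more fitting choice.

```python
def split_category(value):
    """Split a combined 'category, specific' string into two trimmed parts."""
    # partition() splits at the first comma only; if there is no comma,
    # the second part is an empty string.
    category, _, specific = value.partition(",")
    return category.strip(), specific.strip()

print(split_category("fruits, apple"))  # ('fruits', 'apple')
```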

In my opinion there are two ways to go about it:

First, I could loop over all rows per group, potentially splitting the groups into separate tables. This process would have to be repeated many times.

Alternatively, I could convert every group into a row that contains lists of names and lists of fruit and vegetables. However, I haven’t figured out how to do this in KNIME.

I just struggle to find the correct nodes and more importantly node sequence.

I want non-“programmers” to be able to use and edit my workflow, so I want to use as little code (snippets) as possible.

Thank you for your help.


Thank you for your suggestion.
This approach works and I used it. However, I fear that with a larger data set it may become very time-intensive, so I am still open to alternative ideas.


Hi there @Shaller,

Cross joining will certainly take time on a larger data set. You can try the streaming functionality to speed it up:
https://www.knime.com/blog/streaming-data-in-knime

To convert every group into a row, you can use the GroupBy node with an appropriate aggregation method, such as List or Set.
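To make the group-per-row idea concrete, here is a minimal Python sketch of the same logic. It is only an illustration under assumptions: the rows are taken from the tables posted above, but the spelling is normalised by hand (e.g. "apples" becomes "apple"), the category column is dropped, and the names `foods_by_group` and `result` are made up.

```python
from collections import defaultdict

# Second table from the thread as (name, specific, group).
# Assumption: spelling normalised by hand so both tables use the same words.
likes = [
    ("Peter", "apple", "group1"),
    ("Peter", "peas", "group1"),
    ("Sarah", "apple", "group1"),
    ("Sarah", "peas", "group1"),
    ("Tom", "orange", "group2"),
    ("Tom", "carrot", "group3"),
    ("Peter", "carrot", "group3"),
]

# First table reduced to its two "specific" columns.
pairs = [("apple", "orange"), ("carrot", "apple"), ("carrot", "peas")]

# GroupBy-style aggregation: one set of foods per group.
foods_by_group = defaultdict(set)
for _name, food, group in likes:
    foods_by_group[group].add(food)

# For each pair: which groups like the first item, and do they also like the second?
result = {
    (first, second): {
        group: second in foods
        for group, foods in foods_by_group.items()
        if first in foods
    }
    for first, second in pairs
}

for pair, groups in result.items():
    print(pair, groups)
```

With this layout, the work scales with the number of pairs times the number of groups, instead of with every row combination of the two tables.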

Would you mind sharing a workflow example with the approach that works? I can take a look at it.

Br,
Ivan


I am currently using the cross join, as it was the easiest solution.
However, you are right, Ivan, that it will become slow with large tables (which I do have).
Therefore, I will look into your suggestion.

Thank you for your help everyone.

Regards,
Sofia


Unfortunately, I cannot share my workflow and data, as they are confidential.

Hi @Shaller,

To check the concept, dummy data in a workflow example is good enough :wink:

Br,
Ivan
