# Comparing all values of a column to all from another

Hello,

Is there a way to compare two columns from different datasets by comparing each row to all others and not just one row against one?

Regards
Sofia

Hi,

you can use a cross join to create a table with all n x m row combinations.
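Outside of KNIME, the idea behind a cross join can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the KNIME node itself; the sample values and the names `left`, `right`, `pairs`, and `matches` are made up for the example.

```python
from itertools import product

# Hypothetical sample values standing in for the two columns to compare.
left = ["apple", "carrot", "peas"]
right = ["apple", "orange"]

# Cross join: every value of `left` is paired with every value of `right`,
# giving len(left) * len(right) combinations.
pairs = list(product(left, right))

# Any row-vs-row comparison can now be applied per pair; here plain equality.
matches = [(a, b) for a, b in pairs if a == b]

print(len(pairs))
print(matches)
```

The number of pairs grows as n x m, so the result gets large quickly when both inputs are large.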

– Philipp

Hi there @Shaller,

welcome to KNIME Community!

Sure, there are ways, but can you tell us a bit more: what are you comparing, and what is the goal of this comparison?

Br,
Ivan

You are right. More information will make it easier to help solve my problem.

I have an input file with the following data structure and example rows:

| Category | specific | category | specific |
| --- | --- | --- | --- |
| Fruit | apple | fruit | orange |
| Vegetable | carrot | fruit | apple |
| vegetable | carrot | vegetable | peas |

The other table contains information such as:

| Name | Category | specific | group |
| --- | --- | --- | --- |
| Peter | Fruit | apples | group1 |
| Peter | vegetables | peas | group1 |
| Sarah | Fruit | apples | group1 |
| Sarah | vegetables | peas | group1 |
| Tom | fruit | orange | group2 |
| Tom | vegetables | carrots | group3 |
| Peter | vegetables | carrots | group3 |

The children are grouped according to the food they eat. I want to take my first list and compare it to the second, e.g. check: if a group of kids likes apples, does it also like oranges?

There is a small hiccup in my data. In the category group it says: fruits, apple, etc.
However, this could always be fixed with a split of some kind.

In my opinion there are two ways to go about it:

First, I could loop over all rows per group, potentially splitting the groups into separate tables. This process would have to be repeated many times.

Alternatively, I could convert every group into a row that contains lists of names and lists of fruit and vegetables. However, I haven't figured out how to do this in KNIME.

I just struggle to find the correct nodes and, more importantly, the right node sequence.

I want non-"programmers" to be able to use and edit my workflow, so I want to use as little code (snippets) as possible.

This approach works and I have used it. However, I fear that it may become very time-intensive with a larger data set, so I am still open to alternative ideas.

Hi there @Shaller,

cross joining will certainly take time on a larger data set. You can try the streaming functionality to speed it up:
https://www.knime.com/blog/streaming-data-in-knime

To convert every group into a row, you can use the GroupBy node with an appropriate aggregation method.
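To make the "one row per group" idea concrete, here is a minimal plain-Python sketch of the same aggregation. It is not what the GroupBy node does internally, only an illustration under assumptions: the rows are taken from the example table above (name, specific, group), and `names_by_group`, `foods_by_group`, and `group_likes_both` are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Example rows from the thread: (name, specific, group).
rows = [
    ("Peter", "apples", "group1"),
    ("Peter", "peas", "group1"),
    ("Sarah", "apples", "group1"),
    ("Sarah", "peas", "group1"),
    ("Tom", "orange", "group2"),
    ("Tom", "carrots", "group3"),
    ("Peter", "carrots", "group3"),
]

# Aggregate to one entry per group: a set of names and a set of foods.
names_by_group = defaultdict(set)
foods_by_group = defaultdict(set)
for name, food, group in rows:
    names_by_group[group].add(name)
    foods_by_group[group].add(food)


def group_likes_both(group, first, second):
    """Return True if both foods occur in the given group's food set."""
    foods = foods_by_group[group]
    return first in foods and second in foods


print(sorted(foods_by_group["group1"]))
print(group_likes_both("group1", "apples", "peas"))
```

Once each group is a single row with a collection of values, a question like "does a group that likes apples also like peas?" becomes a membership check per group rather than a comparison of every row against every other row.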

Mind sharing a workflow example with the approach that works? I can check it.

Br,
Ivan

I am currently using the cross join, as it was the easiest solution.
However, you are right, Ivan, that it will become slow with large tables (which I do have).
Therefore I will look into your suggestion.

Thank you for your help everyone.

Regards,
Sofia

Unfortunately, I cannot share my workflow and data, as they are confidential.

Hi @Shaller,

to check the concept, dummy data in an example workflow is good enough.

Br,
Ivan
