Compare List of Ingredients

Hi,
I’m struggling a bit with getting a list comparison done.
I found the following post quite helpful, but in the end I couldn’t apply it to my problem: Comparing Lists
I need to compare every list against every other list (n:n), not just against one “master”.

As an example, I typed up in Excel what I’m trying to achieve for a large data set.

Any recommendations on how to start?

Thanks a lot
Chris

@Chris12345 could you give us more details and maybe a complete dataset that would represent your challenge?

How is “subset” defined? Would the order of the columns be of significance or just the values?


@mlauber71 thanks a lot for asking for more detail:

Attached is the workflow up to the point where I got stuck: the join is not working (I guess because of the two open loops before it).

KNIME_project3.knwf (20.3 KB)

1. Each row represents the recipe for one apple cake.
Each column contains one ingredient required by that recipe.

2. I would like to answer these questions:
    A) Which recipes have the same composition? (Order doesn’t matter.)
    B) Which recipe is a subset of another recipe, i.e. it contains only ingredients that the other recipe also has, but not all of them? (Order doesn’t matter.)

For both questions I need to find out which apple cake recipes are related to each other.
E.g.
Recipe 20000000000 is equal to 20000000003
and is a subset of recipe 20000000001 (as 20000000001 contains one more ingredient, 1000011205).
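Outside of KNIME, the two questions above can be sketched in plain Python with sets. This is only a minimal illustration, not the attached workflow: the recipe IDs and ingredient 1000011205 come from the example above, while the other two ingredient IDs and the function name `compare_recipes` are made up for the sketch.

```python
from itertools import permutations

# Hypothetical data: the recipe IDs and ingredient "1000011205" are from the
# example above; "1000011201" and "1000011202" are invented placeholders.
recipes = {
    20000000000: ["1000011201", "1000011202"],
    20000000001: ["1000011202", "1000011201", "1000011205"],
    20000000003: ["1000011202", "1000011201"],
}

def compare_recipes(recipes):
    """Compare every recipe with every other recipe (n:n), ignoring order."""
    # Turning each ingredient list into a set makes the order irrelevant.
    sets = {rid: frozenset(ings) for rid, ings in recipes.items()}
    relations = []
    for a, b in permutations(sets, 2):
        if sets[a] == sets[b]:
            relations.append((a, "equal", b))
        elif sets[a] < sets[b]:  # strict subset: all of a is in b, b has more
            relations.append((a, "subset of", b))
    return relations

for relation in compare_recipes(recipes):
    print(relation)
```

Each pair is visited in both directions, so an "equal" relation appears twice (once per direction), and a subset relation appears only in the direction in which it holds.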


Hi @Chris12345

The following workflow is a possible solution among others:

In this solution, recipes are compared in two steps: first build one set of materials per recipe (-Column Aggregator- node), then compare the sets of materials for all pairs of recipes (-Cross Joiner- node).

One way to test whether one set of materials is included in another is to check whether the string of the first set is LIKE a wildcard pattern built from the second set, where the second set is the one that “is included”. For instance:

"A B C" is LIKE "*A*C*". The following workflow uses this trick to check for inclusion of recipes :wink:
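For readers who want to try the wildcard trick outside of KNIME, here is a rough Python sketch. It uses `fnmatch.fnmatchcase` from the standard library as a stand-in for the LIKE operator, which is an assumption about comparable `*` semantics; the helper names `to_set_string`, `to_pattern` and `includes` are invented for this illustration.

```python
from fnmatch import fnmatchcase

def to_set_string(ingredients):
    # Sort and de-duplicate so that every recipe lists its materials
    # in the same order, e.g. ["C", "A", "B"] -> "A B C".
    return " ".join(sorted(set(ingredients)))

def to_pattern(ingredients):
    # Wrap every material in wildcards, e.g. ["C", "A"] -> "*A*C*".
    return "*" + "*".join(sorted(set(ingredients))) + "*"

def includes(container, candidate):
    """True if every material of `candidate` shows up, in order, in `container`."""
    return fnmatchcase(to_set_string(container), to_pattern(candidate))

print(includes(["C", "A", "B"], ["C", "A"]))  # "A B C" against "*A*C*"
print(includes(["A", "C"], ["A", "B", "C"]))  # "A C" against "*A*B*C*"
```

The trick relies on both sides being sorted the same way. It is also a plain substring match, so a material named "AB" would match a pattern built from "A"; with fixed-length IDs like the ones in this thread that should not come up, but it is worth keeping in mind.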

The rules included in the -Rule Engine- node:

$ID$ = $ID (#1)$ => "Set and Set (#1) are the same"
$Set$ LIKE $Set (#1)$ AND $Set Size$ = $Set Size (#1)$ => "Set (#1) Subset are equivalent"
$Set$ LIKE $Set (#1)$ => "Set (#1) is a Subset of Set"
TRUE => "There is Not subset"
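The Rule Engine node reads its rules from top to bottom and the first rule that matches decides the output, which is why the more specific rules come first. A small Python sketch of that first-match-wins logic, assuming that the `Set (#1)` column holds a wildcard pattern such as "*A*C*" (the thread does not show the column contents) and using `fnmatchcase` as a stand-in for LIKE; the function name `classify` and its parameters are hypothetical:

```python
from fnmatch import fnmatchcase

def classify(id_a, set_a, size_a, id_b, pattern_b, size_b):
    """Mimic the four rules above; the first matching rule wins."""
    # Assumption: pattern_b is the wildcard form of the second set.
    rules = [
        (lambda: id_a == id_b,
         "Set and Set (#1) are the same"),
        (lambda: fnmatchcase(set_a, pattern_b) and size_a == size_b,
         "Set (#1) Subset are equivalent"),
        (lambda: fnmatchcase(set_a, pattern_b),
         "Set (#1) is a Subset of Set"),
        (lambda: True,
         "There is Not subset"),
    ]
    for condition, label in rules:
        if condition():
            return label

print(classify(1, "A B C", 3, 2, "*A*C*", 2))
```

If the third rule were placed before the second, equivalent recipes would be reported as plain subsets, because the looser LIKE check would match first.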

An example of the result:

Hope it is clear enough. Otherwise, do not hesitate to reach out for further explanations :slight_smile:

Best
Ael


@aworker Thanks a lot!
Solved my issue right away, nice solution!

