Hello,
I am new to KNIME, so I hope my question is not trivial (or has not already been answered 100 times).
CASE: I’ve got a number of sets (e.g. 20) of digits [0-9], and the digits are unique within each set. See the example file Test_input.xlsx (9.2 KB). NOTE: it does not matter whether the sets are column-oriented or row-oriented… it can be done both ways.
I would love to understand how I can count/analyze which:
single value (across all 20 sets) occurs most often? A full count or % split would be even better.
pairs (a pair must exist within a particular set, but the count should be done over the whole table of all 20 sets) occur most often?
triplets (meaning full sets) occur most often?
NOTE: this is a simplified example; my final data table is obviously much bigger.
Thank you for any help or comment!
Borys
hi Sanket,
Thank you very much for your time and the answer!
It compiles :)… and works to some extent. However, I probably described the problem poorly, so the result is not necessarily what I expected. I am very sorry about that… let me go into more detail (backed up with examples).
The lowest “GroupBy” branch should return the number of occurrences of each digit, i.e. the same results as the Excel formula “CountIf” (equal to a particular digit).
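To make the “CountIf”-style count concrete outside of KNIME, here is a minimal Python sketch. The `sets` list is made-up toy data, not the contents of Test_input.xlsx, and the percentage is assumed to mean each digit's share of all digit occurrences.

```python
from collections import Counter

# Hypothetical toy data (NOT the real Test_input.xlsx):
# each tuple is one set of unique digits.
sets = [(0, 1, 2), (0, 1, 3), (1, 2, 4), (0, 3, 5)]

# Count how often each digit appears across all sets (like CountIf per digit).
digit_counts = Counter(d for s in sets for d in s)
total = sum(digit_counts.values())

# Print digits from most to least frequent, with their share of all values.
for digit, n in digit_counts.most_common():
    print(f"{digit}: {n} ({n / total:.1%})")
```

If the percentage should instead be the share of sets that contain the digit, divide by `len(sets)` rather than `total`.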
“Middle”: pairs are aggregated very well with the 3 Column Aggs, that’s a great idea!
However, pairs should also be counted across all the sets, so
{0,1} => 3 occurrences (set.2 & set.7 & set.16; the set names are listed only for clarity and do not have to be included in the KNIME solution)
{0,2} => 1 occurrence (set.7)
{0,3} => 1 occurrence (set.8)
{0,4} => 0 occurrences
and analogously for whole triple-sets
{0,1,2} => 1 occurrence (set.7)
{0,1,3} => 0 occurrences
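The counting rule described above (a pair or triple must sit inside one set, but is tallied over the whole table) could be sketched in Python roughly as follows. The data is a made-up toy example, so the numbers differ from the set.2/set.7/set.16 counts quoted above, and `count_subsets` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data (NOT the real Test_input.xlsx).
sets = [(0, 1, 2), (0, 1, 3), (1, 2, 4), (0, 3, 5)]

def count_subsets(sets, size):
    """Count every `size`-element subset that occurs inside at least one set."""
    counts = Counter()
    for s in sets:
        # Sorting makes (1, 0) and (0, 1) land on the same key.
        counts.update(combinations(sorted(s), size))
    return counts

pair_counts = count_subsets(sets, 2)
triple_counts = count_subsets(sets, 3)

print(pair_counts.most_common(3))
print(pair_counts[(0, 4)])  # a pair that never occurs is reported as 0
print(triple_counts[(0, 1, 2)])
```

Pairs that never occur are simply absent from the `Counter`, but looking them up still returns 0, which matches the “0 occurrences” rows above.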
I hope that is clearer now… maybe that is what is already being done and I just cannot see it (LOL)?
thanks again!
Borys
Hello @BorBla ,
Thank you for explaining the problem in a better way.
To solve this, we used a very different approach, as @armingrudd told me that this falls under the category of item set mining. So, you first need to install this extension to be able to use the Association Rule Learner (Borgelt) node, as shown in the screenshot below.
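For readers who want to see what item set mining computes, here is a small Apriori-style sketch in Python. It is only an illustration under stated assumptions: it is not the algorithm or the code used by the Association Rule Learner (Borgelt) node, the data is a made-up toy example, and `frequent_itemsets` and `min_support` are hypothetical names.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([i]) for t in transactions for i in t}
    size = 1
    result = {}
    while candidates:
        # Support = number of sets that contain the whole candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build candidates one item larger; keep one only if every
        # smaller subset of it was itself frequent (Apriori pruning).
        size += 1
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
    return result

# Hypothetical toy data (NOT the real Test_input.xlsx).
sets = [(0, 1, 2), (0, 1, 3), (1, 2, 4), (0, 3, 5)]
found = frequent_itemsets(sets, min_support=2)
for itemset, n in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Setting `min_support=1` lists every item set that occurs at least once, which corresponds to the full single/pair/triplet counts asked for in the original question.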
Sanket,
thank you very much, and credit to @armingrudd as well. This works perfectly well; now I need to transfer that logic to my use case… but you’ve done ~99% of it with this workflow, BIG FAT THANKS!
There is still so much for me to learn (I had not heard of itemset mining at all… and it’s so useful!)