How to count occurrences (singles / pairs / triples / etc.)

BorBla · November 22, 2023, 12:15pm

Hello,
I am new to KNIME. I hope my question is not trivial (or has not already been answered 100 times before).

CASE: I’ve got a number (e.g. 20) of sets of digits [0-9], and the digits are unique within each set. See the example Test_input.xlsx. (NOTE: it does not matter whether the sets are column-oriented or row-oriented… it can be done both ways.)
Test_input.xlsx (9.2 KB)

I would love to understand how I can count/analyze which:

- single value (across all 20 sets) occurs most often? A full count or % split would be even better.
- pairs (each pair must exist within a single set, but the count should run over the whole table of all 20 sets) occur most often?
- triplets (meaning the full set) occur most often?

NOTE: this is a simplified example; my final data table is obviously much bigger.
Thank you for any help or comment!
Borys

sanket_2012 · November 29, 2023, 4:10pm

Hello @BorBla ,
Welcome to the KNIME Community!

I have tried to solve this without using any loop nodes, and I am unsure whether it is the most efficient way to solve it, but I hope this helps.

Dummy.knwf (87.8 KB)

Let me know if it works for you.

Thanks,
Sanket

BorBla · November 30, 2023, 1:15pm

Hi Sanket,
Thank you very much for your time and the answer!
It compiles :)… and works to some extent. However, I probably described the problem poorly, so the result is not necessarily what I expected. I am sorry about that. Let me go into more detail (backed up with examples).

The lowest “GroupBy” branch should return the number of occurrences of each digit, giving the same results as:
- the Excel formula “CountIf” (equal to a particular digit)
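
For anyone who wants to cross-check the per-digit counts outside KNIME, here is a minimal Python sketch of the same "CountIf"-style tally with a percentage split. The `sets` list and the `count_digits` helper are made up for illustration; they are not the contents of Test_input.xlsx.

```python
from collections import Counter

# Hypothetical sample data (NOT the contents of Test_input.xlsx):
# each tuple is one set of unique digits.
sets = [(0, 1, 2), (0, 1, 5), (3, 4, 9), (0, 3, 7)]


def count_digits(rows):
    """Return {digit: (count, percent share)} for every digit 0-9."""
    counts = Counter(d for row in rows for d in row)
    total = sum(counts.values())
    # Counter returns 0 for digits that never occur, like COUNTIF would.
    return {d: (counts[d], 100.0 * counts[d] / total) for d in range(10)}


digit_stats = count_digits(sets)
for digit, (n, pct) in digit_stats.items():
    print(f"{digit}: {n} occurrences ({pct:.1f}%)")
```

The percentage here is the share of all digit cells in the table; dividing by `len(rows)` instead would give the share of sets that contain the digit.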

“Middle”: Pairs are aggregated very well with 3 Column Aggs - that’s a great idea!
However, pairs should also be counted across all the sets, so:
{0,1} => 3 occurrences (set.2 & set.7 & set.16; listed only for clarity, this does not have to be included in the KNIME solution)
{0,2} => 1 occurrence (set.7)
{0,3} => 1 occurrence (set.8)
{0,4} => 0 occurrences

and analogously for whole triple-sets:
{0,1,2} => 1 occurrence (set.7)
{0,1,3} => 0 occurrences
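
The counting rule described above (a pair or triple must sit inside one set, but is tallied over the whole table) can be sketched in Python with `itertools.combinations`. This is only an illustration under assumptions: the `sets` data and the `count_combinations` helper are hypothetical, and the numbers come from this made-up table, not from Test_input.xlsx.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data (NOT the contents of Test_input.xlsx).
sets = [(0, 1, 2), (0, 1, 5), (3, 4, 9), (0, 3, 7), (1, 0, 8)]


def count_combinations(rows, size):
    """Count how many rows contain each combination of `size` digits."""
    counts = Counter()
    for row in rows:
        # Sorting makes (1, 0) and (0, 1) count as the same pair.
        for combo in combinations(sorted(row), size):
            counts[combo] += 1
    return counts


pair_counts = count_combinations(sets, 2)
triple_counts = count_combinations(sets, 3)

print(pair_counts[(0, 1)])         # pair present in three of the sample sets
print(pair_counts[(0, 4)])         # never together in one set, so 0
print(triple_counts[(0, 1, 2)])    # one full set matches
print(pair_counts.most_common(3))  # most frequent pairs first
```

Because a `Counter` returns 0 for missing keys, combinations that never occur (such as {0,4} in this sample) are reported as 0 without being listed explicitly.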

I hope that is more understandable now… maybe that is what is already done and I just cannot see it (LOL)?
Thanks again!
Borys

sanket_2012 · December 5, 2023, 9:08am

Hello @BorBla ,
Thank you for explaining the problem in more detail.
To solve this, we used a very different approach, as @armingrudd told me that this falls under the category of itemset mining. You first need to install this extension to be able to use the Association Rule Learner (Borgelt) node, as shown in the screenshot below.

I am also attaching the workflow below.
Dummy.knwf (102.3 KB)
If you have any questions, let us know.
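
For readers unfamiliar with the term, the core idea of itemset mining can be sketched in a few lines of Python: count every sub-itemset of every set and keep those that reach a minimum support. This is a brute-force illustration only, not the algorithm used by the Association Rule Learner (Borgelt) node; the `frequent_itemsets` helper, its parameters, and the sample `transactions` are assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=2, max_size=None):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Brute force: enumerates all subsets, so only suitable for small sets.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        limit = len(items) if max_size is None else min(max_size, len(items))
        for k in range(1, limit + 1):
            counts.update(combinations(items, k))
    return {s: n for s, n in counts.items() if n >= min_support}


# Hypothetical sample data (NOT the contents of Test_input.xlsx).
transactions = [(0, 1, 2), (0, 1, 5), (3, 4, 9), (0, 3, 7), (1, 0, 8)]

result = frequent_itemsets(transactions, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (-kv[1], kv[0])):
    # Relative support = share of transactions containing the itemset.
    print(itemset, n, f"{n / len(transactions):.0%}")
```

Setting `min_support=1` lists every itemset that occurs at all, which covers singles, pairs and full triples in one pass.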

Thanks,
Sanket

BorBla · December 7, 2023, 7:10pm

Sanket,
Thank you very much, and credit to @armingrudd as well. This works perfectly well. Now I need to transfer that logic to my use case… but you’ve done ~99% of it with this workflow, BIG FAT THANKS!
There is still so much for me to learn (I had not heard of itemset mining at all… and it’s so useful!)

system · December 14, 2023, 7:11pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.