Help on Item Set Finder

lexitus · March 11, 2015, 11:49am

Hello all,

I recently began working with KNIME and so far I am very excited about the possibilities. I have a customer's transaction record set and I am trying to get the most out of it. So far I have done an RFM analysis, which was quite an easy task with the functions of the GroupBy node. I was even able to calculate all variables in one node.

I also used the Item Set Finder (Borgelt) to analyse the baskets. To do so, I reduced the data set to two variables, the transaction ID and the EAN code. Then I filtered out the null values, did a GroupBy on the transaction IDs, connected the result to the Item Set Finder (Borgelt) and used the Apriori algorithm. Everything worked very well, but I didn't find enough information to be sure I got everything right. So I hope somebody will be so kind as to answer my questions :)

1. There are many different algorithms and I am not sure which one to choose. Are there big differences in the results, and if yes, which one is the best choice for discovery mode (e.g. if you don't know very much about the data yet)?

2. The description for the Minimum Support parameter is "The minimum support", but what is the minimum support and how do I find the ideal value?

3. Looking at the result I see two variables of interest: ItemSetSupport and RelativeItemSetSupport. I suppose ItemSetSupport is the number of times the items in the variable ItemSet have been found together. Is RelativeItemSetSupport% the percentage of how often the two items have been bought together, based on the overall count of both items sold?

Any help would be greatly appreciated.

Best Regards,

Alex

tobias.koetter · March 12, 2015, 10:18am

Hi Alex,

1. A short summary of the algorithms is available in the node description, which you can open via the question mark button in the node dialog. For a more detailed description of the algorithms and of finding frequent item sets in general, I can recommend the presentation by Christian Borgelt, who is an expert and the author of the algorithms used in the node. Which algorithm to choose depends on the size and the type of your input data. You get some short hints as mouse-over text on the algorithms in the dialog.
2. The minimum support defines the number of transactions an item set must appear in to be considered frequent. You can set the support either as an absolute number or as a percentage of the total number of transactions. The support is a very sensitive parameter, in that the number of frequent item sets increases dramatically with a lower threshold. So you should start with a high threshold and gradually decrease it.
3. Yes, that is correct. For a detailed description of the item set support, have a look at the Apriori description.
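
To make the support numbers more concrete, here is a minimal sketch in plain Python. It is not the Borgelt implementation used by the node, just a brute-force count on made-up data: the `rows` list and the helper names `build_baskets` and `frequent_item_sets` are assumptions for illustration. It mirrors the preparation described in the question (drop missing EANs, group by transaction ID) and then reports the absolute support and the support relative to the total number of transactions for a decreasing minimum support.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (transaction id, EAN) rows; None stands for a missing EAN.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "A"), (3, "C"),
    (4, "B"), (4, None),
    (5, "A"), (5, "B"), (5, "C"),
]


def build_baskets(rows):
    """Drop rows with a missing item and group the items by transaction id."""
    baskets = defaultdict(set)
    for tid, item in rows:
        if item is not None:
            baskets[tid].add(item)
    return list(baskets.values())


def frequent_item_sets(baskets, min_support, max_size=3):
    """Brute-force count (not Apriori): map each item set that occurs in at
    least min_support baskets to its absolute support."""
    counts = defaultdict(int)
    for basket in baskets:
        for size in range(1, min(max_size, len(basket)) + 1):
            for combo in combinations(sorted(basket), size):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


baskets = build_baskets(rows)
n = len(baskets)  # total number of transactions

# Start with a high threshold and lower it step by step.
for min_support in (4, 3, 2):
    result = frequent_item_sets(baskets, min_support)
    print(f"min support {min_support} ({100 * min_support / n:.0f}%): "
          f"{len(result)} item sets")
    for item_set, support in sorted(result.items()):
        print(f"  {item_set}: support={support}, "
              f"relative={100 * support / n:.0f}%")
```

On this toy data the number of frequent item sets grows from 2 to 5 to 7 as the threshold drops from 4 to 3 to 2 transactions, which is the sensitivity mentioned in point 2. On real data the growth is usually much steeper.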

Bye,

Tobias

lexitus · March 12, 2015, 3:00pm

Hi Tobias

This helped a lot!

Thank you very much!

Best, Alex