I am new to KNIME and am trying to understand and rebuild several example workflows.
One of my first workflows is the ‘Market Basket Analysis’.
Very interesting, but I’d like to start more simply.
My data contains 2 columns: distinct orderIDs | ProductIDs
In a first step I want to display the most common ProductID combinations (ranked by number of orders).
So I have just a ‘transaction-table’, no ‘item-table’.
If you only need to rank the current table on the “Number of orderIDs” column, you can use the “Rank” node to do that for you (which then gives you the ranking of “Product combinations” based on the “Number of orderIDs” column). Or you can simply use the “Sorter” node to sort the data on “Number of orderIDs” and get the combinations ordered by quantity.
You can also rank based on each single item. To do so, first you need to convert “Product combinations” column to a collection column. To do this, use “Cell Splitter” node to create the collection column (checking “as set” option). After that, use “Ungroup” node to have each item of the collection in a new row and in a single column. Then you can use “GroupBy” node to sum the quantity based on each item.
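For readers who want to check the logic outside KNIME, here is a minimal Python sketch of the per-item ranking described above. It is not KNIME code: the sample table, the comma delimiter, and the variable names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical aggregated table: (product combination, number of orderIDs).
# The comma delimiter is an assumption; use whatever separates your items.
table = [
    ("A,B,C", 5),
    ("A,B", 3),
    ("B,C", 2),
]

item_totals = Counter()
for combination, n_orders in table:
    # Like "Cell Splitter" with "as set": split into distinct items.
    items = {part.strip() for part in combination.split(",")}
    # Like "Ungroup" + "GroupBy": one entry per item, summing the order counts.
    for item in items:
        item_totals[item] += n_orders

# Items ranked by the total number of orders they appear in.
ranking = item_totals.most_common()
for item, total in ranking:
    print(item, total)
```

Each item's total is the sum of the order counts of every combination that contains it.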
Thank you for your answer, but that’s not what I want to achieve.
As far as I understand, your suggestion helps me find out
- which product combinations are ordered the most, based on the whole order (i.e. which exact order occurs most often)
- which products are ordered the most
I want to find out which productIDs are bought together most frequently across all these orderIDs.
So you need to find item set frequency. For that, again, transform your products column to a collection and then use “Item Set Finder” node to find the frequency of sets (I used 2 as the “Minimum set size” in the configuration). Then use a “Rank” node to rank on relative item support.
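To make the idea concrete, here is a rough Python sketch of what counting item sets of size 2 amounts to. It only illustrates the logic and does not reproduce the “Item Set Finder” node itself; the sample rows and names are made up, and relative support is assumed to mean the share of orders that contain the pair.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical transaction table: one (orderID, productID) row per purchased item.
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "B"), (3, "C"),
    (4, "A"), (4, "B"),
]

# Collect the products of each order into a set (the "collection" step).
baskets = defaultdict(set)
for order_id, product_id in rows:
    baskets[order_id].add(product_id)

# Count every 2-item combination that occurs within an order.
pair_counts = Counter()
for products in baskets.values():
    for pair in combinations(sorted(products), 2):
        pair_counts[pair] += 1

# Rank pairs by count and attach the assumed relative support
# (orders containing the pair / all orders).
n_orders = len(baskets)
ranked = [(pair, count, count / n_orders)
          for pair, count in pair_counts.most_common()]
for pair, count, support in ranked:
    print(pair, count, round(support, 2))
```

Sorting the products before building the pairs makes sure that (A, B) and (B, A) are counted as the same pair. Note that enumerating all pairs grows quadratically with basket size, so this is only meant for small examples.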
If you are looking for a simpler way, which might not be as powerful for big baskets, you might also try this:
First, use a GroupBy node where you group by order and choose List as the aggregation for the product column.
Then add a second GroupBy node where you group by the product list and use Count as the aggregation.
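As a rough illustration of the two GroupBy steps, here is a small Python sketch. The sample rows and names are hypothetical, and sorting each product list is my own addition so that the same products in a different row order count as the same basket.

```python
from collections import Counter, defaultdict

# Hypothetical transaction table: one (orderID, productID) row per purchased item.
rows = [
    (1, "A"), (1, "B"),
    (2, "B"), (2, "A"),
    (3, "A"), (3, "B"), (3, "C"),
]

# First GroupBy: group by order and aggregate the products as a list.
products_by_order = defaultdict(list)
for order_id, product_id in rows:
    products_by_order[order_id].append(product_id)

# Second GroupBy: group by the product list and count the orders.
# Sorting is an assumption added here so that row order does not matter.
basket_counts = Counter(
    tuple(sorted(products)) for products in products_by_order.values()
)

for basket, count in basket_counts.most_common():
    print(basket, count)
```

This counts identical whole baskets only, so a pair that appears inside larger orders is not counted separately.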