Often bought with from Sales File

ChrisHill · June 21, 2023, 8:23pm

Hi, thanks everyone for the great suggestions! I eventually decided to use Python (in a KNIME Python Script node) instead of regular KNIME nodes, since this was much easier for me.

The code below returns, for each article, up to six ArticleNrs (possibly fewer) that have been bought together with it. Only articles bought together at least twice are included, to get rid of outliers and articles with few sales.
The part up to and including df.dot builds a co-occurrence matrix: rows and columns are ArticleNrs, and each cell says how often the two articles have been sold together.
The lambda takes each row, keeps only the values of at least 2, sorts them in descending order and takes the column names (i.e., ArticleNrs) of the first seven as a list. The first one is discarded, since it is normally the initial ArticleNr itself. Then the series is converted to a data frame, the new column is named SoldWith, and the ArticleNr index is turned back into a regular column, so that the ArticleNr is available again in KNIME later.
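
To see what the matrix step produces, here is a small sketch that can be run outside KNIME. The toy `sales` table and the names `counts` and `together` are made up for illustration; only the column names ReceiptNr and ArticleNr come from the real data.

```python
import pandas as pd

# Hypothetical toy sales file: one row per article on a receipt.
sales = pd.DataFrame({
    "ReceiptNr": [1, 1, 2, 2, 2, 3, 3],
    "ArticleNr": ["A", "B", "A", "B", "C", "A", "C"],
})

# Article x receipt matrix: how often each article appears on each receipt
# (0 where the article is not on the receipt).
counts = (
    sales.groupby(["ReceiptNr", "ArticleNr"]).size().reset_index(name="count")
    .pivot(index="ArticleNr", columns="ReceiptNr", values="count")
    .fillna(0)
)

# Article x article matrix: cell (i, j) sums count_i * count_j over all receipts.
together = counts.dot(counts.T)
print(together)
```

In this toy data A and B share two receipts, so the cell (A, B) is 2, while B and C share only one receipt and would be filtered out by the "at least twice" rule. The diagonal cell (A, A) is 3, the number of receipts that contain A.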

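import knime.scripting.io as knio

# Read the sales table from the first input port
df = knio.input_tables[0].to_pandas()
# Count how often each article appears on each receipt
df = df.groupby(['ReceiptNr', 'ArticleNr']).size().reset_index(name='count')
# Article x receipt matrix; 0 where the article is not on the receipt
df = df.pivot(index='ArticleNr', columns='ReceiptNr', values='count').fillna(0)
# Article x article matrix of how often two articles were sold together
df = df.dot(df.T)
# Per row: keep counts >= 2, sort descending, take 7 and drop the first entry
top_6 = df.apply(lambda x: x[x >= 2].sort_values(ascending=False).head(7).index.tolist()[1:], axis=1)
# Series -> data frame with ArticleNr as a regular column again
top_6 = top_6.to_frame()
top_6.columns = ['SoldWith']
top_6.reset_index(inplace=True)
knio.output_tables[0] = knio.Table.from_pandas(top_6)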
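
A possible variation, as a sketch only: instead of relying on the article itself sorting first, the ranking step could remove it by label before filtering. This is not the code from the post; the function name `often_bought_with` and the parameters `top_n` and `min_count` are hypothetical, and the small `together` matrix is toy data standing in for the result of df.dot(df.T).

```python
import pandas as pd

def often_bought_with(together: pd.DataFrame, top_n: int = 6, min_count: int = 2) -> pd.DataFrame:
    """Per article, list up to top_n other articles sold together at least min_count times."""
    records = []
    for article, row in together.iterrows():
        others = row.drop(article)               # remove the article itself by label
        others = others[others >= min_count]     # keep only frequent pairs
        partners = others.sort_values(ascending=False).head(top_n).index.tolist()
        records.append((article, partners))
    return pd.DataFrame(records, columns=["ArticleNr", "SoldWith"])

# Toy article x article matrix (hypothetical values).
together = pd.DataFrame(
    [[3, 2, 2],
     [2, 2, 1],
     [2, 1, 2]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

result = often_bought_with(together)
print(result)
```

The reason for dropping by label: in row B of the toy matrix, B with itself and B with A both have the value 2, and with a tie like that I would not rely on which entry ends up first after sorting. The same can happen if the article's own count is below 2 and gets filtered out, in which case the first remaining entry is a real partner.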