Music Streaming Recommendation System (LastFM)

jacobo123 · June 3, 2020, 11:53am

Hi!
I am doing a university project using the KNIME recommendation system, and I have a question about the Calculate Association Rules node.

Within the node there are different operation nodes:

First, for the upper calculation:
I assume:
The rule learner node is in charge of matching, at the highest probability, the songs that a user listens to.
Question:
How does the Split Collection Column node work in order to split the artists into different rows?
How are they presented together in the table with regard to the Consequent and Split Value columns?

For the lower calculation:
On the GroupBy node
I assume:
The (Count artist ID) is used to “actually know how many times an artist has been played”

Question:
Why is (First artist ID) needed?
Does it take the first sample of the artist?

In the complete structure of the system
In the Math Formula node
What does #Itemsetsupport represent?
And why does the rule quality depend on it?

I hope to find help with this specific issue.
Any information is welcome.
Best regards

ScottF · June 9, 2020, 7:21pm

Hi @jacobo123

Welcome to the forum and sorry for the delayed reply. Let me see if I can answer your questions.

The Split Collection Column node isn’t doing anything fancy in this case - it’s basically just converting individual list items to strings for later reporting. I don’t know if you had a chance to open the BIRT report associated with this workflow, but the basic idea is “Users who listen to [Antecedent] also like to listen to [Consequent]”, sorted by the artists with strongest associations.
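To make that concrete, here is a rough Python sketch of the idea (not the KNIME node itself). It assumes the collection items end up as separate string columns; the rule rows, the `split_value_N` column names, and the confidence values are all made up for illustration.

```python
# Hypothetical rule rows: the antecedent is a collection (list) of artists,
# the consequent is a single artist. All names and numbers are illustrative.
rules = [
    {"antecedent": ["Radiohead", "Muse"], "consequent": "Coldplay", "confidence": 0.62},
    {"antecedent": ["Daft Punk"], "consequent": "Justice", "confidence": 0.71},
]


def split_collection(row, prefix="split_value"):
    """Spread the list items into numbered string columns, one per item."""
    flat = {key: value for key, value in row.items() if key != "antecedent"}
    for index, item in enumerate(row["antecedent"], start=1):
        flat[f"{prefix}_{index}"] = str(item)
    return flat


def describe(row):
    """Render the report sentence for one rule."""
    listened = " and ".join(row["antecedent"])
    return f"Users who listen to {listened} also like to listen to {row['consequent']}"


# Strongest associations first, as in the report.
ranked = sorted(rules, key=lambda r: r["confidence"], reverse=True)
flat_rows = [split_collection(r) for r in ranked]
sentences = [describe(r) for r in ranked]
```

Each flattened row keeps its consequent next to the split values, which is how the two end up side by side in the table.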

The First aggregation in the GroupBy node is used to grab pictures and artistIDs - since we know those values are static and we want to use them later in a generated report, the first one is fine to select. We could have just as easily used Last aggregation here.
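As a small illustration in plain Python (a sketch, not how GroupBy is implemented): group by artist, count the rows, and keep the first ID and picture seen for each group. The column names and records below are assumptions for the example.

```python
# Hypothetical listening records; the column names are assumptions.
plays = [
    {"artist": "Radiohead", "artist_id": 7, "picture": "radiohead.png"},
    {"artist": "Muse", "artist_id": 12, "picture": "muse.png"},
    {"artist": "Radiohead", "artist_id": 7, "picture": "radiohead.png"},
]


def group_by_artist(rows):
    """Count rows per artist and keep the first ID and picture encountered."""
    groups = {}
    for row in rows:
        if row["artist"] not in groups:
            # "First" aggregation: only the first row of a group sets these.
            groups[row["artist"]] = {
                "first_artist_id": row["artist_id"],
                "first_picture": row["picture"],
                "count": 0,
            }
        # "Count" aggregation: every row of the group is tallied.
        groups[row["artist"]]["count"] += 1
    return groups


summary = group_by_artist(plays)
```

Because the ID and picture are the same on every row for a given artist, taking the first or the last row gives the same result; only the count depends on how many rows the group has.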

As presented in the Borgelt documentation (linked in the node description) on the Item Support:

the absolute support (or simply the support) of the item set S is the number of transactions in T that contain S.

About rule quality you can also read more here: Apriori Documentation
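For a quick feel of the numbers, here is a minimal sketch of absolute support on a toy set of transactions. The confidence function uses the standard textbook definition, support(A ∪ C) / support(A); I am assuming that is the kind of quality measure meant here, and it is not necessarily the exact expression in the workflow's Math Formula node. The transactions are invented.

```python
# Toy transactions: each set holds the artists one user listened to.
transactions = [
    {"Radiohead", "Muse", "Coldplay"},
    {"Radiohead", "Coldplay"},
    {"Muse", "Daft Punk"},
    {"Radiohead", "Muse"},
]


def absolute_support(item_set, transactions):
    """Number of transactions that contain every item of item_set."""
    return sum(1 for t in transactions if item_set <= t)


def confidence(antecedent, consequent, transactions):
    """Standard rule confidence: support(A | C) / support(A)."""
    both = absolute_support(antecedent | consequent, transactions)
    base = absolute_support(antecedent, transactions)
    return both / base if base else 0.0


support_radiohead = absolute_support({"Radiohead"}, transactions)
support_pair = absolute_support({"Radiohead", "Coldplay"}, transactions)
rule_confidence = confidence({"Radiohead"}, {"Coldplay"}, transactions)
```

Any quality measure built on these counts changes when the item set support changes, which is why the rule quality depends on it.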

Hope that helps!

jacobo123 · June 10, 2020, 6:25pm

Thanks for the information.

It does help!

Best regards

jacobo123 · July 10, 2020, 2:11pm

Greetings, @ScottF!
I would like to ask you two remaining questions so I can finish my university report about the KNIME recommendation system. It would be great if you could help me out.

First Question:
Once the whole process is finished with the recommendation engine, how does the music platform use that information? Does the music company look at the results in KNIME and make decisions, or is there program code that links their work with KNIME?
I do not understand this next step on the companies' side, and it is important to show it in the report, so I hope you can give me a hint.

Second Question:
Since KNIME is open source software and costs only arise when KNIME Server is obtained, how does the costing work for the music platform?
What exactly are they buying, and, also important, how does the costing work in this specific case of a music platform?

Thanks in advance for the help.

Best regards.

ScottF · July 10, 2020, 2:26pm

Hi @jacobo123 -

First off, I should say that this workflow is really just an example of how you might implement an association rule algorithm on the publicly available Last.fm dataset - it’s not a workflow that is used in production.

That said, in a case like this it’s likely that in a production context, the REST API available in KNIME Server would be used. That is, the workflow could be called using JSON input of users and artists, the workflow would run, and the resulting associations would be provided as JSON output. This could all be implemented as part of a web page to get recommendations “on the fly”. We have examples of how this type of deployment works on the Hub (although they are not specific to this Last.fm example and are more general in nature).
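Purely to illustrate the JSON-in, JSON-out idea, a client on the web page side might look something like the sketch below. The endpoint URL, the payload fields, and the response shape are all hypothetical placeholders, not the actual KNIME Server REST API; check the server documentation for the real paths and formats.

```python
import json
import urllib.request

# Hypothetical endpoint; NOT a real KNIME Server path.
ENDPOINT = "https://knime-server.example.com/hypothetical/recommendations"


def build_request(user_id, artists):
    """Build a POST request carrying the user and artists as JSON (assumed shape)."""
    payload = json.dumps({"user": user_id, "artists": artists}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_recommendations(body):
    """Extract recommended artists from an assumed JSON response shape."""
    return [rule["consequent"] for rule in json.loads(body)["associations"]]


request = build_request("user_42", ["Radiohead", "Muse"])
# Sending is left out so the sketch runs offline; in practice it would be
# something like: urllib.request.urlopen(request).read()
sample_response = '{"associations": [{"consequent": "Coldplay"}]}'
recommended = parse_recommendations(sample_response)
```

The music platform's own code would own this client side; the workflow itself stays on the server and is only reached through the request.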

As far as costs go in this hypothetical example, the workflow would be developed using KNIME AP (free) and deployed to KNIME Server, which is where the costs would factor in. KNIME Server can be licensed on an annual basis, or using a BYOL pay-as-you-go approach on a cloud server.

Does that help?

system · January 9, 2021, 2:30am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.