H-P Point (repetition of words)

My team and I are trying to measure the impact of word repetition on the virality of songs in the music industry.

We have learned about the H-P point approach from this article.

However, we are having trouble implementing this approach in KNIME.

We have a dataset that contains three columns (artist_name, song_title, song_lyrics).
After creating a Bag of Words, our idea is to split the dataset into sub-datasets, one per song, each containing the song title, the artist, and the absolute term frequency (TF abs) of each word.

At that point, with the rows of each sub-dataset sorted by descending TF so that the row ID acts as the rank, we could check whether the TF value in a row is equal to its row ID.
However, we would need some guidance on this, as we are not sure it is possible in KNIME, and we believe it would require significant computational effort.
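
To make the rank check concrete, here is a minimal sketch in plain Python (not a KNIME node). It assumes the h-point is the rank r at which the frequency f(r) equals r in the descending rank-frequency list, and that when no rank matches exactly, the point is obtained by linear interpolation between the two neighbouring ranks. The function name `h_point` and the fallback for lists that never cross the line f(r) = r are illustrative choices, not something taken from the article.

```python
from typing import Sequence


def h_point(frequencies: Sequence[int]) -> float:
    """Estimate the h-point of one song from its absolute term frequencies.

    Assumption: the h-point is where rank equals frequency in the
    descending rank-frequency list; if no rank matches exactly, the
    crossing with the line f(r) = r is found by linear interpolation.
    """
    # Drop zero counts and sort so that rank 1 is the most frequent word.
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)

    for rank, freq in enumerate(freqs, start=1):
        if freq == rank:
            return float(rank)
        if freq < rank:
            # Crossing lies between the previous rank and this one.
            # This branch cannot trigger at rank 1 because freq >= 1 there.
            prev_rank, prev_freq = rank - 1, freqs[rank - 2]
            return (prev_freq * rank - freq * prev_rank) / (
                rank - prev_rank + prev_freq - freq
            )

    # Illustrative fallback: every frequency exceeds its rank (or the list
    # is empty), so return the number of distinct words.
    return float(len(freqs))


if __name__ == "__main__":
    print(h_point([10, 7, 5, 4, 4, 2, 1]))  # exact match at rank 4
    print(h_point([10, 8, 6, 3, 2]))        # interpolated between ranks 3 and 4
```

Sorting dominates the cost, so the work per song is O(n log n) in the number of distinct words, which should stay manageable even for a large lyrics dataset.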

We also noticed that the GroupBy node can aggregate the TF values as a list, but we don’t know how to manipulate that list afterwards. Could this be another option?
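
As an illustration of the list-based idea, the sketch below (plain Python, not a KNIME workflow) collects the TF values of each song into one list and then scans each list by rank. The sample rows, the function names, and the simplified integer criterion (the first rank r with f(r) <= r) are all hypothetical and only meant to show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical rows as they might look after the Bag of Words step:
# (artist_name, song_title, term, absolute term frequency)
rows = [
    ("Artist A", "Song 1", "love", 6),
    ("Artist A", "Song 1", "you", 4),
    ("Artist A", "Song 1", "baby", 3),
    ("Artist A", "Song 1", "night", 1),
    ("Artist B", "Song 2", "yeah", 9),
    ("Artist B", "Song 2", "go", 2),
    ("Artist B", "Song 2", "now", 1),
]


def group_tf_by_song(rows):
    """Collect the TF values of each (artist, title) pair into one list."""
    grouped = defaultdict(list)
    for artist, title, _term, tf in rows:
        grouped[(artist, title)].append(tf)
    return dict(grouped)


def first_rank_at_or_below(tf_list):
    """Return the first 1-based rank r with f(r) <= r, or None if there is none.

    The list is sorted in descending order first, so rank 1 is the most
    frequent word. This is a simplified integer criterion, not an
    interpolated h-point.
    """
    for rank, tf in enumerate(sorted(tf_list, reverse=True), start=1):
        if tf <= rank:
            return rank
    return None


tf_lists = group_tf_by_song(rows)
crossing = {song: first_rank_at_or_below(tfs) for song, tfs in tf_lists.items()}

for song, rank in crossing.items():
    print(song, rank)
```

Grouping first keeps everything in one table with one row per song, which avoids splitting the data into a separate sub-dataset for every song.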

Hi @Rachele_Cecere and welcome to the KNIME Community.

This sounds like a very interesting project.

Could you share what you have done so far, along with some sample data? That would also make it easier for you to point out exactly where in the workflow you need help.

Hi @bruno29a !
Thank you for the warm welcome to the KNIME Community and for your fast reply!

Unfortunately, I have been quite busy this past week and couldn’t get back to you sooner.

With the help of our professor, we managed to solve the problem with the following workflow:

Also, if you would like to take a closer look at our project, here is the link to our public space :smile: cloonic/Group 9 Public Space – KNIME Hub
