Music data analysis from .csv file

Hi everyone, 

I am going to analyze a large dataset of 2 million rows and 7 columns (variables) (this is the dataset). Since this dataset is too large to open with MS Excel, I opened it with KNIME.

One of my first questions is how to sum the listenings (streams) for each region (Italy, France, UK, etc.). I thought of two ways to proceed:

1) Split the original .csv file by region, obtaining 53 new .csv files (the number of regions in the original dataset). Then proceed with the analysis in Excel.

or, if I want to work on the original spreadsheet with KNIME:

2) Sum the streams per region (country), so that the stream count resets every time the region changes.

So, I don't know how to proceed or which operators in KNIME are able to do this (and I don't want to split the dataset manually!). I would be grateful if someone could help me with this work (which is part of my final dissertation).

Thank you very much

Kind regards


Hi Alberto! 

Thanks for using KNIME! As you already said, you can use KNIME to analyze the whole dataset (without bothering with splitting the data). See my solution for your question (sum of streams per region) attached. Just import the workflow into your KNIME workspace. Before running the workflow, please open the configuration dialog of the CSV Reader node and enter the path to the data file data.csv that you downloaded from Kaggle (we are not allowed to embed Kaggle datasets in public workflows, sorry for the inconvenience). The output table of the GroupBy node contains the sum of streams aggregated by region.
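
If you ever want to cross-check the GroupBy result outside KNIME, the same aggregation can be sketched in a few lines of Python. This is only a rough sketch and not part of the attached workflow. The column names "Region" and "Streams" are assumptions, so adjust them to the headers in your file, and the function name is made up for illustration. It reads the file one row at a time, so the 2 million rows never need to fit in a spreadsheet.

```python
import csv
from collections import defaultdict


def sum_streams_by_region(csv_path, region_col="Region", streams_col="Streams"):
    """Return a dict mapping each region to its total stream count.

    The column names are assumptions; change them to match the CSV header.
    """
    totals = defaultdict(int)
    with open(csv_path, newline="", encoding="utf-8") as f:
        # DictReader uses the first line of the file as column names.
        for row in csv.DictReader(f):
            # Each row adds its stream count to the running total of its region.
            totals[row[region_col]] += int(row[streams_col])
    return dict(totals)
```

Calling sum_streams_by_region("data.csv") should then give one total per region, which you can compare with the output table of the GroupBy node.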

I hope this helps. If you have further questions, do not hesitate to ask.
Good luck with your dissertation!

Kind regards