I am reading my data from a CSV file . Some of my columns contain cells that store data in the form of lists or JSON file formats. I was wondering if there were any options in KNIME that could explode JSON and also list cells into several rows. As my CSV file is very big(35G), I am looking fo an efficient way as the file size would increase after the explosion.
Examples with columns containing cells with List / JSON: (Green colur is what I expected).
My data store weekly sales information for each store and I attached 3 rows of my data. In your first example, where does the Day information come from?
Column G stores the sale amount for the seven days as a list data type. I have two options for day information.
Option 1: I have opening and closing week dates stored in cells D &E.
Option 2: All of my lists start on Sundays, so I can create new rows for all rows beginning with Sunday and ending with Saturday.
In both examples, what happens if there’s more than one row in the initial table?
For the first example, each row should be duplicated by seven times.
Can you upload examples of csv files (a couple dozen rows is fine) that you’re trying to process along with some more details of your desired output?
In the first example, I have only 3 rows, and the desired output would be 21 rows as all of the three rows should be duplicated for 7 days and assign the sale amount to each day.
For the JSON format, the key and values are different for each sample. so I need to repeat the rows based on that.
Let’s first deal with the list columns. In the following screenshots I’ve filtered out a few columns just to make it easier to follow along.
Since you’re loading a text/csv file, KNIME doesn’t know that the list columns (e.g. visits_by_day) are lists. KNIME reads this as a string column, as shown by the S icon in the column header:
After renaming the column so that it has the original name, I use the Cell Splitter node to split the column using the comma as the delimiter, and format the output as a list:
The idea is to go row by row and use the start and end date information to generate a series of dates (as rows), from which a day number and day name can be extracted. At the end, this series of dates gets appended to the main stream. I’m assuming the values in the main stream are in chronological order.
I’ve updated the previous workflow on the KNIME Hub