Combing Large data files and missing value interpolation

Hello! I am new to KNIME and I am having trouble trying to figure out where to even start. My goal is to combine large data sets and then interpolate missing values based off of times and locaiton. My data is current layed out into hourly .csv files and I would like to combine them by day. 


655 11.8 203.76 5/31/2017 0:00:00
680 14.02 198.08 5/31/2017 0:00:00
626 25.49 194.68 5/31/2017 0:00:00
632 27.38 210.92 5/31/2017 0:00:00

Its animal number, X location, Y location, date, time. I do not have columnIDs on the tables. 

I would also like to interpolate missing X and Y location for seconds that are not in the data set for each animal for each day. I have 50 days worth of this data, so there is no way one can do this by hand. I think KNIME is the platform that can do this, but I am just not sure where to start. Any advice would be amazing! Thank you!



Hi and welcome to the KNIME community,


just to check whether I understood your problem correctly. So at the moment you have a lot of cvs files (probably 50*24) each containing the data of one hour for different animals and your goal is to have in the end 50 tables each containing the data of one day for different animals where missing values for the X and Y location are imputed through a interpolation? Is this correct? If yes you could solve it in the following way.


1. Reading the data:

- If all your csv files are in one folder you can use the List File Node to get a table which includes the paths of the csv files.

- You can than use a loop to read and concatenate all the csv files. Therefore you can use the Table Row to Variable Loop Start node in combination with the File Reader node, where the file location is controlled by the flow variable “URL", which is created by the Table Row to Variable Loop Start node. To close the loop you can use a Loop End node. 


You can find a similar example on the example server under 06_Control Structures / 04_Loops/ 02_Example_For_Reading_a_List


2. Sorting the data: 

In a next step you could sort the data using the Sorter node and sort by date, time and animal. 

It might be helpful to transform your date and time into to the internal Knime date&time format and sort afterwards. You therefore need the String Manipulation node to join the columns date and time into one column. You can than use the String to Date&Time node convert the column format.


3. Imputing Missing Values

To impute the missing values in the X and Y location you can use the Missing Value node, choosing the linear interpolation. This only works if the data typ of you column is either double or integer, otherwise you have to use the  String To Number node in advance. 

Note, that in the second tab of the configuration window of the missing value node you can define how the missing values in each column should be handled.


4. Write data into different files

To write the data of each day into a single file you can use again a loop. To start the loop I would use a Group Loop Start node and group by date. Next I would use a Create File Name node, where the Base file name is controlled by the flow variable. To write the table you can use for example a CSV Writer node where the output location is controlled by the flow variable, which was created with the Create File Name node. You can close the Loop with the Variable Loop End node.


For the writing part you can find a similar example on the example server under 06_Control Structures / 04_Loops/ 05_Write_each_row_in_one_fil