Datetime index for time series data in new Python scripting nodes

The new Python scripting nodes (as of KNIME 4.7, with Apache Arrow) and the columnar table backend work like a charm, since there is virtually no conversion overhead between KNIME tables and pandas in the scripting nodes. This matters when you have a large number of rows (7 million in my current case). However, as my data is time series data, I still need to parse the date column and set it as the index to get a proper pandas DataFrame with a datetime index. This again takes time and seems inefficient. Am I missing something?

I set my datetime index with the following, but it takes a long time (longer than the actual operation, e.g. a resampling). My table contains one column of type “Local Date Time” and several columns of doubles.

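import knime.scripting.io as knio
import pandas as pd

# Convert the first input table to a pandas DataFrame
df = knio.input_tables[0].to_pandas()
# Parse the 'Date' column and use it as the DataFrame index
df.set_index(pd.to_datetime(df['Date']), inplace=True)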
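
One thing that may be worth checking is whether the column really needs parsing at all. The sketch below is plain pandas, with a small stand-in DataFrame instead of the knio input table so that it runs anywhere. It only calls pd.to_datetime() when the column is not already a datetime dtype. The helper name ensure_datetime_index and its fmt parameter are made up for illustration, and whether a “Local Date Time” column arrives from to_pandas() as datetime64 is an assumption to verify with df.dtypes in your own workflow.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def ensure_datetime_index(df, column, fmt=None):
    """Return a copy of df indexed by `column`, parsing it only if needed.

    Hypothetical helper: `fmt` is an optional strptime-style format string
    passed to pd.to_datetime when the column still holds strings.
    """
    if not is_datetime64_any_dtype(df[column]):
        # Column is not datetime yet (e.g. strings), so parse it once.
        df = df.assign(**{column: pd.to_datetime(df[column], format=fmt)})
    # A datetime column used as index yields a DatetimeIndex directly.
    return df.set_index(column)


# Stand-in for knio.input_tables[0].to_pandas()
demo = pd.DataFrame(
    {
        "Date": ["2023-01-01T00:00:00", "2023-01-01T00:30:00"],
        "value": [1.0, 2.0],
    }
)
indexed = ensure_datetime_index(demo, "Date", fmt="%Y-%m-%dT%H:%M:%S")
print(type(indexed.index).__name__)
```

If the dtype check shows the column is already datetime64, the parsing step is skipped and only set_index() remains.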

According to the KNIME Python API, the function from_pandas() has a RowIDs parameter, but I can’t get it to work as expected.

Alternatively, using a RowID node to replace the default RowIDs (“Row0”, and so on) with my datetime column works in a way, but the index of the resulting DataFrame is not of type datetime. Is this a limitation or am I missing something?

I am happy for any recommendations / experiences - thank you!

Hello @mbloechle,

Can you share what goes wrong when using the from_pandas() function?
Also, there is no specific data type for row IDs; all row IDs are of type “String”.

BR,
Ali

Hi Ali, sorry for the mix-up. When reading the table and creating the pandas DataFrame I use to_pandas(), of course (and not from_pandas(), as I wrote). Still, with to_pandas() I need to parse and set my time series index first. There is no quicker way around it, correct?

Also, if KNIME RowIDs are always strings, I always have to create the datetime index from strings, so the RowID node does not help me in this regard. Thank you!
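
For the RowID route, a minimal sketch of turning a string index into a DatetimeIndex could look like the following. It is plain pandas with a stand-in DataFrame, and the ISO-like layout of the RowID strings is an assumption, so the format string would need to match whatever the RowID node actually produces. Passing an explicit format usually spares pandas from guessing the layout, which tends to help on large tables.

```python
import pandas as pd

# Stand-in for a table whose RowIDs were set from the date-time column,
# so the pandas index arrives as plain strings (assumed layout).
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=[
        "2023-01-01T00:00:00",
        "2023-01-01T00:30:00",
        "2023-01-01T01:00:00",
    ],
)

# Parse the string index once, with an explicit format instead of inference.
df.index = pd.to_datetime(df.index, format="%Y-%m-%dT%H:%M:%S")

# With a DatetimeIndex in place, time-based operations such as resampling work.
hourly = df.resample("60min").sum()
print(hourly)
```

This still parses strings, so it does not avoid the conversion cost; it only makes the parsing step explicit and predictable.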