The new Python scripting nodes (as of KNIME 4.7, with Apache Arrow) and the columnar table backend work like a charm, since there is virtually no conversion overhead to and from pandas in the scripting nodes. This matters when you have a large number of rows (7 million in my current case). However, as my data is time series data, I still need to parse the datetime column and set it as the index to get a proper pandas data frame with a DatetimeIndex. This step again takes time and seems inefficient - am I missing something?
I set my datetime index with the following code, but it takes a long time (longer than the actual operation, e.g. a resampling). My table contains one “Local Date Time” column and several columns with doubles.
import knime.scripting.io as knio
import pandas as pd

# Read the first input table and index it by the datetime column
df = knio.input_tables[0].to_pandas()
df.set_index(pd.to_datetime(df['Local Date Time']), inplace=True)
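For reference, here is a small pure-pandas sketch of the pattern I mean, with synthetic data standing in for my table (column and variable names are just illustrative). Passing an explicit `format` to `pd.to_datetime` avoids per-row format inference, which in my experience is where much of the parsing time goes on large tables:

```python
import pandas as pd

# Synthetic stand-in for the KNIME table: one string datetime
# column plus a double column.
df = pd.DataFrame({
    "Local Date Time": pd.date_range("2023-01-01", periods=6, freq="h")
                         .strftime("%Y-%m-%dT%H:%M:%S"),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Explicit format string -> no format inference per row.
idx = pd.to_datetime(df["Local Date Time"], format="%Y-%m-%dT%H:%M:%S")
df = df.set_index(idx)

# The kind of downstream operation I mentioned (a resampling):
two_hourly_mean = df["value"].resample("2h").mean()
```

This is only meant to illustrate the workload; the real question is whether the parsing round trip is needed at all.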
According to the KNIME Python API, the function from_pandas() has a RowIDs parameter, but I can’t get it to work as expected.
Alternatively, using a RowID node to replace the default index (“Row0”, “Row1”, …) with my datetime column works in a way, but the resulting dataframe index is then not of type datetime. Is this a limitation, or am I missing something?
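To make the RowID situation concrete, here is a minimal sketch (synthetic data, illustrative format string) of what I end up doing: the RowIDs arrive as strings, so I convert the index in place, which is another parsing pass:

```python
import pandas as pd

# Stand-in for a table whose RowIDs were set from the datetime column:
# the index arrives as plain strings, not datetimes.
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=["2023-01-01T00:00:00", "2023-01-01T01:00:00"],
)

# Convert the string index to a DatetimeIndex in place.
df.index = pd.to_datetime(df.index, format="%Y-%m-%dT%H:%M:%S")
```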
I would be happy about any recommendations or shared experiences - thank you!