Issues parsing pandas DataFrame datetime columns (with NaT values) to KNIME table

Hello everyone,

I believe there is an unresolved issue with converting pandas DataFrame objects into KNIME tables. I ran into it in a Python Script node while using a library my company uses to fetch data from a REST API. I received this error message:

Original error message:

ERROR Python Script        0:58:2     Execute failed: NaTType does not support tzname

At first, I tested the exact same input in all other Python installations on my system and on my Linux server via SSH. Since all of these attempts worked fine, I concluded that KNIME must be at fault. I also ran the entire Python Script node line by line to see what triggers the error, but found nothing that way. The desired output is generated fine within the node's editor, but the node fails when it is actually executed.

This had me baffled for a while before I discovered the KNIME Error log. It turns out that this error was raised by “\py\org\knime\python\typeextension\builtin\datetime2\DateTimeSerializer.py”.

Error log:

2021-07-24 19:01:03,564 : ERROR : KNIME-Worker-24-Python Script 0:58:2 :  : Node : Python Script : 0:58:2 : Execute failed: NaTType does not support tzname
org.knime.python2.kernel.PythonIOException: NaTType does not support tzname
	at org.knime.python2.util.PythonUtils$Misc.executeCancelableUnwrapExecutionException(PythonUtils.java:310)
	at org.knime.python2.util.PythonUtils$Misc.executeCancelable(PythonUtils.java:283)
	at org.knime.python2.kernel.PythonKernel.waitForFutureCancelable(PythonKernel.java:1719)
	at org.knime.python2.kernel.PythonKernel.getDataTable(PythonKernel.java:1030)
	at org.knime.python2.ports.DataTableOutputPort.execute(DataTableOutputPort.java:78)
	at org.knime.python2.nodes.script2.PythonScriptNodeModel2.execute(PythonScriptNodeModel2.java:159)
	at org.knime.core.node.NodeModel.executeModel(NodeModel.java:556)
	at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1259)
	at org.knime.core.node.Node.execute(Node.java:1039)
	at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
	at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
	at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
	at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
	at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:365)
	at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:219)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
	at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: NaTType does not support tzname
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
	at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
	at org.knime.python2.kernel.messaging.DefaultTaskFactory$DefaultTask.runInternal(DefaultTaskFactory.java:256)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: Traceback (most recent call last):
  File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\messaging\RequestHandlers.py", line 96, in _handle_custom_message
    response = self._respond(message, response_message_id, workspace)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\messaging\RequestHandlers.py", line 218, in _respond
    data_bytes = workspace.serializer.data_frame_to_bytes(data_frame_chunk, start)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\Serializer.py", line 198, in data_frame_to_bytes
    table = FromPandasTable(data_frame, self, start_row_number)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\DataTables.py", line 83, in __init__
    serializer.serialize_objects_to_bytes(self._data_frame, self._column_serializers)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\Serializer.py", line 124, in serialize_objects_to_bytes
    data_frame.iat[i, col_idx] = serializer.serialize(value)
  File "C:\Program Files\KNIME\plugins\org.knime.python.typeextensions_4.4.0.v202104131404\py\org\knime\python\typeextension\builtin\datetime2\DateTimeSerializer.py", line 53, in serialize
    if object_value.tzname():
  File "pandas\_libs\tslibs\nattype.pyx", line 72, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support tzname

In context, the error makes sense: objects of type NaTType do not support .tzname() and raise a ValueError when it is called.
I checked the values in all datetime-related columns of my dataframe and found the following:

Example dataframe column:

Name: date_update, dtype: datetime64[ns]
0    2021-07-24 16:03:48
1    2021-07-23 17:04:41
2    2021-07-23 17:04:34
3                    NaT
4    2021-07-23 15:45:20
5    2021-07-23 14:48:12
6    2021-07-23 14:42:09
7    2021-07-23 14:35:39
8    2021-07-23 13:03:25
9    2021-07-23 12:34:38
10   2021-07-23 11:05:12
11                   NaT
12   2021-07-23 07:19:40
13   2021-07-23 07:12:26
14   2021-07-22 07:44:25
15   2021-07-21 20:16:08
16   2021-07-21 18:17:46
17                   NaT
18   2021-07-21 14:51:42
Name: date_start, dtype: datetime64[ns]
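
For anyone who wants to check this outside KNIME, here is a minimal sketch of the call that fails in the traceback above. The helper name call_tzname is made up for illustration; the only assumption taken from the traceback is that the serializer calls .tzname() on each cell value.

```python
import pandas as pd
from datetime import datetime


def call_tzname(value):
    """Call value.tzname() and report whether it worked (hypothetical helper)."""
    try:
        return True, value.tzname()
    except ValueError as exc:
        # pd.NaT raises ValueError here, as in the traceback above
        return False, str(exc)


# A regular timezone-naive timestamp supports tzname() and returns None
regular = call_tzname(pd.Timestamp(datetime(2021, 7, 24, 16, 3, 48)))

# The missing-value marker NaT does not support tzname()
missing = call_tzname(pd.NaT)

print(regular)
print(missing)
```

So a single NaT anywhere in a datetime column should be enough to trigger the failure, regardless of the other values.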

Below is a minimal example that reproduces the issue. Again, nothing is wrong with the code or the Python version.

A python script node:

# Example code to replicate issue
import pandas as pd
from datetime import datetime

example_data = [
	{'name':'Alice',
	 'dob':datetime(year=1995,month=6,day=20)},
	{'name':'Bob',
	 'dob':datetime(year=1997,month=3,day=7)},
	{'name':'Charlie',
	 'dob':None},
	 ]
	 
df = pd.DataFrame.from_records(example_data)
# produces
# df:
#       name        dob
# 0    Alice 1995-06-20
# 1      Bob 1997-03-07
# 2  Charlie        NaT

# dob, dtype: datetime64[ns]

output_table_1 = df

As a workaround, I tried the fillna() method, but I struggled to find a suitable fill value. Since a datetime column cannot hold mixed types, I settled for a fallback datetime that is very different from all other datetimes in the affected columns.
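
A rough sketch of that fallback approach, for reference. The sentinel value 1900-01-01 and the names SENTINEL, filled and was_missing are arbitrary choices for illustration, not something prescribed by KNIME or pandas.

```python
import pandas as pd
from datetime import datetime

# Arbitrary placeholder date (an assumption); it must not occur in the real data
SENTINEL = pd.Timestamp("1900-01-01")

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "dob": pd.to_datetime([datetime(1995, 6, 20), datetime(1997, 3, 7), None]),
})

# Replace NaT with the sentinel so every cell is a real timestamp
filled = df.copy()
filled["dob"] = filled["dob"].fillna(SENTINEL)

# Downstream steps have to recognise the sentinel to recover the missing rows
was_missing = filled["dob"] == SENTINEL
print(filled)
```

The column keeps its datetime dtype this way, but every downstream step has to know about the sentinel, which is why it feels like a poor fix.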

My question, then: will the KNIME dev team implement the ability to convert NaT and other 'nan' values from pandas dataframes to KNIME tables?

And if not, could you recommend a better way to deal with this than setting all NaT values to an arbitrary datetime?

Regards,
J S

Hi @strny , welcome to the forum and thanks for the very detailed investigation, this is much appreciated!

You are right, a NaT (not a time) value is currently not supported in KNIME, as opposed to NaNs for floats. I will issue a feature request for that.

Meanwhile, I would opt for missing values (red question marks) instead of an arbitrary date as the 'flag' for NaTs, and update your example to the following:

# Example code with the NaT workaround applied
import pandas as pd
from datetime import datetime
import numpy as np

example_data = [
	{'name':'Alice',
	 'dob':datetime(year=1995,month=6,day=20),
	 'num':np.nan},
	{'name':'Bob',
	 'dob':datetime(year=1997,month=3,day=7),
	 'num':float('nan')},
	{'name':'Charlie',
	 'dob':None,
	 'num':1.23},
	{'name':'Daisy',
	 'dob':float('nan'),
	 'num':None},
	 ]
	 
df = pd.DataFrame.from_records(example_data)

# NaT -> None -> missing value ('?') in KNIME
df.dob = df.dob.astype(object).where(df.dob.notnull(), None)

output_table = df
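
To see what the conversion line does, here is a small sketch that can be run in plain Python. The variable names as_objects, converted and kinds are illustrative only, and the statement that None shows up as a missing value in KNIME rests on this thread rather than on anything checked in this snippet.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "dob": pd.to_datetime([datetime(1995, 6, 20), datetime(1997, 3, 7), None]),
})

# Step 1: object dtype lets the column hold arbitrary Python objects
as_objects = df["dob"].astype(object)

# Step 2: keep values where the original is not null, put None elsewhere
converted = as_objects.where(df["dob"].notnull(), None)

# Inspect what each cell contains after the conversion
kinds = [type(value).__name__ for value in converted]
print(converted.dtype, kinds)
```

The column ends up with dtype object, holding Timestamp objects plus plain None, so no NaT is left for the serializer to call .tzname() on.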

Hope that helps, and thanks again, best
Lukas

Hello @LukasS ,

Thank you! Worked like a charm.

For anyone who faces the same issue and happens to find this thread, I am including code that applies what you described to all datetime columns in a given dataframe.

Dear reader,
You can put this after your own code block, replacing 'Data' with the name of your dataframe.

# FIX ERROR IN CASE NaT
for dtype, column in zip(Data.dtypes, Data.columns):
    if str(dtype) == 'datetime64[ns]':
        Data[column] = Data[column].astype(object).where(Data[column].notnull(), None)

This is how the data shows up in the KNIME table:
[Screenshot 2021-07-26 145354]
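
The loop above only matches columns whose dtype prints exactly as 'datetime64[ns]'. As a possible variation, here is a sketch that wraps the same idea in a function and uses pandas' is_datetime64_any_dtype, which also matches timezone-aware columns. The function name nat_to_none and the example frame are made up for illustration, and only the plain datetime64[ns] case has been confirmed to work in KNIME in this thread; the timezone-aware case is an untested assumption.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def nat_to_none(frame):
    """Return a copy of frame with NaT in datetime columns replaced by None."""
    result = frame.copy()
    for column in result.columns:
        if is_datetime64_any_dtype(result[column]):
            as_objects = result[column].astype(object)
            # Keep non-null values, use None where the original was NaT
            result[column] = as_objects.where(result[column].notnull(), None)
    return result


# Hypothetical example frame with a naive and a timezone-aware column
example = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "dob": pd.to_datetime(["1995-06-20", None]),
    "seen": pd.to_datetime(["2021-07-24 16:03:48", None]).tz_localize("UTC"),
})

fixed = nat_to_none(example)
print(fixed.dtypes)
```

In a Python Script node this would be used as output_table = nat_to_none(Data). Working on a copy leaves the original dataframe and its datetime dtypes untouched for any later calculations.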

Best,
Jakub
