Hello everyone,
I believe there is an unresolved issue with converting pandas DataFrame objects into KNIME tables. I ran into it in a Python Script node while using a library my company uses to fetch data from a REST API. I received this error message:
Original error message:
ERROR Python Script 0:58:2 Execute failed: NaTType does not support tzname
At first, I tested the exact same input in every other Python installation on my system and on my Linux server via SSH. Since all of these attempts worked fine, I assumed that KNIME must be at fault. I also executed the entire Python Script node line by line to see what triggers the error, but found nothing that way. The desired output is generated fine within the node, but the node fails when it is actually executed.
This had me baffled for a while before I discovered the KNIME Error log. It turns out that this error was raised by “\py\org\knime\python\typeextension\builtin\datetime2\DateTimeSerializer.py”.
Error log:
2021-07-24 19:01:03,564 : ERROR : KNIME-Worker-24-Python Script 0:58:2 : : Node : Python Script : 0:58:2 : Execute failed: NaTType does not support tzname
org.knime.python2.kernel.PythonIOException: NaTType does not support tzname
at org.knime.python2.util.PythonUtils$Misc.executeCancelableUnwrapExecutionException(PythonUtils.java:310)
at org.knime.python2.util.PythonUtils$Misc.executeCancelable(PythonUtils.java:283)
at org.knime.python2.kernel.PythonKernel.waitForFutureCancelable(PythonKernel.java:1719)
at org.knime.python2.kernel.PythonKernel.getDataTable(PythonKernel.java:1030)
at org.knime.python2.ports.DataTableOutputPort.execute(DataTableOutputPort.java:78)
at org.knime.python2.nodes.script2.PythonScriptNodeModel2.execute(PythonScriptNodeModel2.java:159)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:556)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1259)
at org.knime.core.node.Node.execute(Node.java:1039)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:365)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: org.knime.python2.kernel.PythonIOException: NaTType does not support tzname
at org.knime.python2.kernel.messaging.AbstractTaskHandler.handleFailureMessage(AbstractTaskHandler.java:146)
at org.knime.python2.kernel.messaging.AbstractTaskHandler.handle(AbstractTaskHandler.java:92)
at org.knime.python2.kernel.messaging.DefaultTaskFactory$DefaultTask.runInternal(DefaultTaskFactory.java:256)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: Traceback (most recent call last):
File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\messaging\RequestHandlers.py", line 96, in _handle_custom_message
response = self._respond(message, response_message_id, workspace)
File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\messaging\RequestHandlers.py", line 218, in _respond
data_bytes = workspace.serializer.data_frame_to_bytes(data_frame_chunk, start)
File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\Serializer.py", line 198, in data_frame_to_bytes
table = FromPandasTable(data_frame, self, start_row_number)
File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\DataTables.py", line 83, in __init__
serializer.serialize_objects_to_bytes(self._data_frame, self._column_serializers)
File "C:\Program Files\KNIME\plugins\org.knime.python2_4.4.0.v202106201233\py\Serializer.py", line 124, in serialize_objects_to_bytes
data_frame.iat[i, col_idx] = serializer.serialize(value)
File "C:\Program Files\KNIME\plugins\org.knime.python.typeextensions_4.4.0.v202104131404\py\org\knime\python\typeextension\builtin\datetime2\DateTimeSerializer.py", line 53, in serialize
if object_value.tzname():
File "pandas\_libs\tslibs\nattype.pyx", line 72, in pandas._libs.tslibs.nattype._make_error_func.f
ValueError: NaTType does not support tzname
In context, the error makes sense: NaT (an object of type NaTType) does not support .tzname(), so the call in DateTimeSerializer.py raises a ValueError as soon as it reaches a missing value.
I looked at the values in all datetime-related columns of my dataframe and found the following:
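To illustrate the point outside of KNIME, here is a minimal sketch that calls .tzname() on a regular timestamp and on pd.NaT. The helper name tzname_or_error is made up for this example and is not part of pandas or KNIME.

```python
import pandas as pd
from datetime import datetime


def tzname_or_error(value):
    """Return value.tzname(), or a short error string if the call raises ValueError.

    Hypothetical helper for this post, not part of pandas or KNIME.
    """
    try:
        return value.tzname()
    except ValueError as exc:
        return f"ValueError: {exc}"


# A timezone-naive timestamp supports tzname() and simply returns None.
regular = tzname_or_error(pd.Timestamp(datetime(2021, 7, 24, 16, 3, 48)))

# pd.NaT raises ValueError on the same call, which is what the serializer runs into.
missing = tzname_or_error(pd.NaT)

print(regular)
print(missing)
```

This suggests the serializer would need to check for missing values (for example with pd.isna) before calling tzname(), although I have not looked at how the KNIME code is meant to handle them.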
Example dataframe column:
Name: date_update, dtype: datetime64[ns]
0 2021-07-24 16:03:48
1 2021-07-23 17:04:41
2 2021-07-23 17:04:34
3 NaT
4 2021-07-23 15:45:20
5 2021-07-23 14:48:12
6 2021-07-23 14:42:09
7 2021-07-23 14:35:39
8 2021-07-23 13:03:25
9 2021-07-23 12:34:38
10 2021-07-23 11:05:12
11 NaT
12 2021-07-23 07:19:40
13 2021-07-23 07:12:26
14 2021-07-22 07:44:25
15 2021-07-21 20:16:08
16 2021-07-21 18:17:46
17 NaT
18 2021-07-21 14:51:42
Name: date_start, dtype: datetime64[ns]
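For anyone who wants to run the same check on their own data, this is roughly how the affected columns can be found. It is only a sketch: the function name datetime_columns_with_nat and the small sample frame are invented for illustration, and it only looks at timezone-naive datetime columns.

```python
import pandas as pd


def datetime_columns_with_nat(df):
    """Return {column name: number of NaT entries} for datetime columns containing NaT.

    Hypothetical helper for this post; only timezone-naive datetime columns are checked.
    """
    counts = {}
    for col in df.select_dtypes(include=["datetime"]).columns:
        n_missing = int(df[col].isna().sum())
        if n_missing:
            counts[col] = n_missing
    return counts


# Made-up sample data that mimics the columns shown above.
sample = pd.DataFrame({
    "date_start": pd.to_datetime(
        ["2021-07-24 16:03:48", None, "2021-07-23 17:04:34"]
    ),
    "date_update": pd.to_datetime(
        ["2021-07-24 16:03:48", "2021-07-23 17:04:41", "2021-07-23 17:04:34"]
    ),
})

print(datetime_columns_with_nat(sample))
```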
Below is a minimal example that reproduces the issue. Again, nothing is wrong with the code or the Python version.
In a Python Script node:
# Example code to replicate issue
import pandas as pd
from datetime import datetime
example_data = [
    {'name': 'Alice', 'dob': datetime(year=1995, month=6, day=20)},
    {'name': 'Bob', 'dob': datetime(year=1997, month=3, day=7)},
    {'name': 'Charlie', 'dob': None},
]
df = pd.DataFrame.from_records(example_data)
# produces
# df:
#       name        dob
# 0    Alice 1995-06-20
# 1      Bob 1997-03-07
# 2  Charlie        NaT
# dob, dtype: datetime64[ns]
output_table_1 = df
As a workaround, I tried the fillna() method, but I struggled to find a suitable fill value. You cannot mix types within a column, so I settled for a fallback datetime that is very different from all other datetimes in the affected columns.
My question, then: will the KNIME dev team implement support for converting NaT and other ‘nan’ values from pandas dataframes into KNIME tables?
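For reference, the workaround looks roughly like this. It is a sketch under assumptions: the sentinel date 1900-01-01 and the function name fill_nat_with_sentinel are arbitrary choices for this example, and the frame is built with pd.to_datetime so that the dob column is definitely a datetime column.

```python
import pandas as pd
from datetime import datetime

# Arbitrary fallback value (an assumption for this example); pick something
# that cannot occur in the real data so it can be recognized later.
SENTINEL = pd.Timestamp(1900, 1, 1)


def fill_nat_with_sentinel(df, sentinel=SENTINEL):
    """Return a copy of df with NaT in timezone-naive datetime columns replaced.

    Hypothetical helper for this post, not part of pandas or KNIME.
    """
    out = df.copy()
    for col in out.select_dtypes(include=["datetime"]).columns:
        out[col] = out[col].fillna(sentinel)
    return out


df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "dob": pd.to_datetime([datetime(1995, 6, 20), datetime(1997, 3, 7), None]),
})

filled = fill_nat_with_sentinel(df)
print(filled)
```

In the Python Script node, filled would then be assigned to output_table_1 instead of df. The obvious drawback is that the sentinel has to be turned back into a missing value somewhere downstream.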
And if not, could you recommend a better way to deal with this other than setting all NaT values to an arbitrary datetime?
Regards,
J S