I noticed that it’s possible to send date/time columns to Python, where they end up as ‘object’ columns in the pandas DataFrame. These columns then show up as date/time columns again in KNIME if the output table is the same as the input table.
I am just trying to understand how KNIME can ‘know’ that these columns are date/time columns and not just string columns. Is there any documentation on how date/time columns are retrieved from Python?
If the type of a pandas column cannot be determined or is too generic (“object”), we simply look at the first entry in that column, grab its type, and check whether it is a type that we know how to handle. All following entries are assumed to be of the same type; otherwise, a runtime error is raised.
In the case of dates and times, we rely on such values being stored in the DataFrame as Python’s built-in date/time types (from the datetime module) or as pandas’s Timestamp type. Other types are not detected. In particular, strings that are formatted like dates or times are not auto-converted.
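The first-entry inference described above can be sketched roughly as follows. This is a minimal illustration in plain pandas, not KNIME’s actual code: the function name guess_element_type and the KNOWN_ELEMENT_TYPES tuple are hypothetical, and the real set of supported types is larger.

```python
import datetime

import pandas as pd

# Hypothetical subset of element types the Python-to-KNIME bridge can handle.
KNOWN_ELEMENT_TYPES = (str, datetime.date, datetime.time, datetime.datetime, pd.Timestamp)


def guess_element_type(column: pd.Series) -> type:
    """Guess a column's element type, falling back to its first entry for 'object' columns."""
    if column.dtype != object:
        # Specific dtypes (int64, float64, datetime64, ...) can be used directly.
        return column.dtype.type
    if column.empty:
        raise ValueError("Cannot guess the element type of an empty 'object' column")
    first_type = type(column.iloc[0])
    if first_type not in KNOWN_ELEMENT_TYPES:
        raise TypeError(f"Unsupported element type: {first_type.__name__}")
    # All following entries are expected to have the same type as the first one.
    for value in column.iloc[1:]:
        if type(value) is not first_type:
            raise RuntimeError(
                f"Mixed element types: {first_type.__name__} and {type(value).__name__}"
            )
    return first_type
```

Note that a string column that merely looks like dates is reported as str here, which matches the behavior described in this thread.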
There is no specific documentation on that. I could point you to some relevant locations in the code if you are interested in the technical details, but as outlined above, there is nothing too interesting going on.
Thanks a lot for your answer.
Yes, I would be interested in some code locations.
Still, I have a question:
I played around with some test data from the TestDataGenerator, which also contains date/time columns.
First, I transferred the column ‘Local Date’ as a Local Date column, and it came back to KNIME as a Local Date column. In pandas, the column is of type ‘object’. Fine.
Second, I first converted the ‘Local Date’ column to a String column, using the same formatting that pandas shows. Again, the column in the pandas DataFrame looks the same as when it was a Local Date column. But now it comes back to KNIME as a String column. That confuses me, because from your answer I would have assumed that this column would again be a Local Date column due to its correctly formatted content. Or is there some difference in the pandas output that I am missing?
Compared in the screenshots: the column type of ‘Local Date’ and the Python output (first entry and pandas column type). The Python code is the same for both.
I think you misread. I said that there was no auto-conversion of correctly formatted strings. Strings stay strings. Sorry if that was already clear to you and my answer confused you rather than helped clear things up.
Instead of looking at the column’s dtype, try looking at the type of the first element. That is, compare the dtype with the type of the first element when executed on the original column and on the one that was converted into a string before. Each case should show a different element type.
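A minimal sketch of that comparison, assuming stand-in ‘Local Date’ columns built directly in pandas instead of a KNIME input table (the variable names are illustrative). The string variant is created with an explicit object dtype to match the ‘object’ column seen in the thread; newer pandas versions may otherwise choose a dedicated string dtype.

```python
import datetime

import pandas as pd

# Stand-in for the original 'Local Date' column: Python date objects.
local_dates = pd.Series([datetime.date(2021, 3, 1), datetime.date(2021, 3, 2)])
# Stand-in for the column converted to String beforehand (same formatting pandas shows).
date_strings = pd.Series(["2021-03-01", "2021-03-02"], dtype=object)

for label, column in [("original", local_dates), ("converted to string", date_strings)]:
    # The dtype is the generic 'object' in both cases ...
    print(label, "dtype:", column.dtype)
    # ... but the type of the first element differs (date vs. str).
    print(label, "first element:", type(column.iloc[0]))
```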
Alright, let me see…
Here is where we try to guess the type of a column if we cannot handle its dtype directly. (The entire enclosing simpletype_for_column function is there to find out how KNIME should interpret the column’s data.) If we do not detect any “native” type (integer, float, etc.), we check whether the extracted type matches one of the known extension types (such as dates and times) via calls to …
These extension types and their (de)serializers are registered on the Java side via an Eclipse extension point. Here, for example, the local date extension type is registered.
For each extension type, a pair of serializer and deserializer has to be provided. This is done per direction (Java-to-Python, Python-to-Java). Here and here are the implementations for the local date type in Python-to-Java direction.
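To make the serializer/deserializer idea more concrete, here is a hypothetical pair for local dates in the Python-to-Java direction, with both halves simulated in Python. The encoding (days since 1970-01-01, the same quantity Java’s LocalDate.toEpochDay() returns) and the function names are assumptions for illustration, not necessarily the format KNIME uses.

```python
import datetime

# Reference day for the hypothetical day-count encoding.
EPOCH = datetime.date(1970, 1, 1)


def serialize_local_date(value: datetime.date) -> int:
    """Sending side (Python): encode a date as the number of days since 1970-01-01."""
    return (value - EPOCH).days


def deserialize_local_date(encoded: int) -> datetime.date:
    """Receiving side (Java in KNIME, simulated here): rebuild the date from the day count."""
    return EPOCH + datetime.timedelta(days=encoded)


# Round trip: decoding what was encoded yields the original date.
day = datetime.date(2021, 3, 1)
assert deserialize_local_date(serialize_local_date(day)) == day
```

The point is only that each side must agree on one encoding per direction; the actual wire format is defined in the linked implementations.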
Hope that helps!
Thanks for pointing that out. I’m still not very experienced with Python and pandas, and since strings are stored in dtype ‘object’ columns, I assumed that columns of dtype ‘object’ always contain strings. I also knew about the dtype ‘datetime64’, which made me think that KNIME simply uses a string representation in Python because it does not use this dtype…
A misunderstanding on my side (and the pandas documentation is often not very clear, or hard to find…).
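For reference, a small pandas sketch of the distinction: date-like strings only become datetime64 values after an explicit pd.to_datetime call, and Series.dt.date turns those into plain datetime.date objects, which pandas stores in an ‘object’ column again. The sample values are made up, and which KNIME column type each variant maps to is not shown here.

```python
import datetime

import pandas as pd

# Strings that merely look like dates, stored with the generic object dtype.
strings = pd.Series(["2021-03-01", "2021-03-02"], dtype=object)

# Explicit parsing: pandas' own datetime dtype, with Timestamp elements.
timestamps = pd.to_datetime(strings)
print(timestamps.dtype)  # a datetime64 dtype (resolution depends on the pandas version)
print(type(timestamps.iloc[0]))

# Plain Python dates: pandas keeps these in an 'object' column.
local_dates = timestamps.dt.date
print(local_dates.dtype)
print(type(local_dates.iloc[0]))
```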
Thanks for the sources. I’ll give them a look.