time subtraction in python node not working

Is the python plugin very rudimentary in Knime?
I am trying a simple timestamp subtraction but I am getting the following error;

ERROR Python Script 0:15 Execute failed: No serializer extension having the id or processing python type “pandas._libs.tslibs.timedeltas.Timedelta” could be found.
Unsupported column type in column: “stoptime_till_now”, column type: “<class ‘pandas._libs.tslibs.timedeltas.Timedelta’>”.

I have two columns: A timestamp column from a table, and a current time column that I am initializing as shown in the below code:

df2['FG_local_stop_time']=df2['FG_stop_time'] -timedelta(hours=4)
df3['now']=pd.Timestamp.now()
df3['now']=df3['now'].astype('datetime64[s]')
df3['stoptime_till_now']=df3['now'] - df3['FG_local_stop_time']

The FG_local_stop_time and now columns are matching in format:
image

When i execute inside the node it shows up perfectly fine:

I don’t understand what’s going on.

Hi @thentangler ,
the type of df3['stoptime_till_now'] is class 'pandas._libs.tslibs.timedeltas.Timedelta'. This type, I guess, is not supported by Knime, but the whole script (or just the last line) can be replaced by regular Knime nodes

1 Like

The issue here is that KNIME can only serialize certain types back to java types. if a type is not supported, you get this error. But when running the script in the config dialog the serialization back to java doesn’t happen hence no error.

One one hand timedelta is kind of “niche” pandas specific type on the other hand the python script nodes use pandas by default and mapping that back to an appropriate java type (Duration comes to mind) could indeed be handy.

I agree with duristef however that what seems to happen in this script should be better done in KNIME itself with the different date&time related nodes.

1 Like

Hi @duristef and @kienerj,
I wanted to use python since that I am a new user to KNIME but have more experience in python. It is easy for me to slice and dice the data in python and then use KNIME to serve up the visualizations.
I need to do the following to my input tables:

  1. Query DB to get 2 tables -Done via KNIME nodes
  2. Filter the 2 tables
  3. Rename columns
  4. Convert timestamps from UTC to local time in the two tables
  5. Join the 2 tables
  6. Compare 2 columns in the joined table and create a column with TRUE/FALSE based on if the 2 columns are matching or not
  7. Find the time difference of 2 timestamp columns as another column
  8. Out put the resulting dataframe as a table - Done via KNIME nodes
  9. Visualize - Done via KNIME nodes

Steps 2 through 7 are done in the python node. I thought it would be easier than creating and configuring a ton of nodes for those steps.
Any advice how i can go about this?

@thentangler could you please provide us with 2 test tables, so it’s easier for us to think of a solution?

1 Like

Ultimately it’s up to you or to your companies guidelines how you use KNIME. if everyone in the team is proficient in python and it’s a requirement for your team, then it might be ok to do stuff in python that could be done in KNIME but it raises the question why to even use KNIME? Also I would “fear” having a weird mix of Python script an native knime stuff and it’s never clear what happens in the python script. Something complex? or something trivial the creator simply didn’t know how to do in KNIME? I would reserve python usage for complex stuff that can’t be easily done in KNIME itself.

Python Scripts will also make your workflow much slower than it needs to be due to the serialization penalty. Yes, you could use the new Python labs extension that solves this problem mostly but it’s experimental.

Filter, renaming and joining is trivial in knime. Row Filter, Column rename and Joiner nodes. For date&time see the Time Series category. Date&Time Difference and Modify Time Zone nodes.
For point 6 use Rule Engine.

2 Likes

Im using Knime mostly for the visualization parts such as interactive charts, correlation plots etc. It is cumbersome to code plots in python whereas its user friendly to setup plot configurations using KNIME (I hope)

On the other hand its far more easier to write a line of code to filter, join tables etc, vs creating many nodes and configuring each one just to perform a filter, and then join, and then operations etc.
I was hoping to combine the best of 2 worlds.

For example, you mentioned to use Rule Engine to compare 2 columns and create a 3rd column based on the result. I was expecting to pass the 2 tables to the rule engine, however it only accepts variables and outputs variables. So while I agree it can be done in KNIME with a lot of learning on my part, i was hoping to be able to leverage python scripts to get some results at the moment.

Would the R node have similar serialization issues? If not, i might perhaps be able to get what I want done at the moment using R.

I think you can. KNIME is a platform that happily integrates all kinds of tools and code. You could also look into Java itself to do stuff.

The thing with the nodes for every step is deliberate. Our finance guys love it. For their reports and projections they can safely document and comment every step they make and it can be reviewed by other people without having to be a full blown programmer - although professionals are useful to plan and structure a program which most of the time is still necessary.

About your python question. Could you convert the time or duration variables into strings or doubles (days since …) and use that?

About graphics. KNIME has some great nodes there but sometimes I wish there was even more and I then revert to some python or R code that I copied and adapted from the web.

So you will have to plan ahead and as @kienerj said also think about the skill set in your team. A lot of companies build a broad know how cluster around KNIME with beginners, experts and a whole support community.

1 Like

Hi @mlauber71
I was hoping to use the python function that would convert the UTC time stamp to my local one because it has to account for daylight savings. Since this is dynamic, I was hoping I could use the function to automatically do that.
Currently I am manually doing a mathematical subtraction of 4 hours. But I will have to update my code to 5hours when daylight savings occurs in November or so.

I am the only one in my organization using knime as I’m the only data analyst and in my group. I’m also the only guy who knows decent programming in my group.
I’m know python, R and pyspark and I thought I could use Knime to do data analytics to create dashboards and such that will be very useful to my group.

I liked knime because it resembled labview, which is an engineering programming tool that I use a lot. While labview is used to design engineering control systems Knime seemed to be perfect to serve up the analytics derived from our tools.
I will continue learning Knime and I really appreciate the help you and the members of this forum provide!

@thentangler KNIME would support a lot of variable types moving between the KNIME nodes and Python code.

I have set up an example where I try a few things. Some types are not natively supported in the new framework; although you could send them ‘thru’.

What does work is to do some date and time manipulation and then send the result to a parquet file while allowing for some truncation of Datetime to milliseconds (which should be OK in most cases)

df_time.to_parquet(v_parquet_file, compression='gzip'
  , allow_truncated_timestamps=True
  , coerce_timestamps='ms')

What seems to get lost is some information about an Asia time zone. So you absolutely can mix KNIME and Python code and also date and time variables. Sometimes my impression is when dealing with different systems and data and time variables is to transfer them as a string or (long) integer with explicit values (not UNIX-timestamps or some other system).

It’s interesting because I see it exactly the opposite. The data prep part I feel is much easier in KNIME because of the quick visual feedback of the output. Doing that in notebooks and pandas is much more cumbersome at least for me.

For Visualization I prefer Python due to much more control or then external tools like tableau or similar.

You pass 1 table with the 2 columns and then you compare the 2 columns. Not sure why you say it only accepts variables? You can compare columns to each other to a column to a fixed value or to a variable and based on that create a new column or replace an existing one.

If you are the only coder then I would really minimize python usage to only where it is really needed.

2 Likes

@mlauber71 Interesting. I havent parsed timestamps it this way before.
Thank you this solution!

Me neither :slight_smile: but glad it does work now …

@kienerj
This is indeed a different way to look at it. I havent really spent too much time utilizing the plotting libraries in Python. I will try doing so more.

Apologies for the Rule Engine misunderstanding. I was trying with rule Engine variable under workflow control. Im going to try using the Knime nodes more, and rely on python only for the heaviest of lifting :blush:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.