Spark to_timestamp() function - wrong output

Hi.

We are using Spark 2.2 and KNIME 3.6. Something strange is happening when I use the to_timestamp() function. The info about the function can be found in the Spark 2.2 docs.

Here is the SparkSQL code:
[screenshot: issue_spark_to_timestamp2]
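
The exact statement is in the screenshot; roughly, it is of this shape (the sketch below is only an approximation, with placeholder values and a plain PySpark session standing in for the KNIME Spark SQL node):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder input: one winter date and one summer date, as strings.
spark.createDataFrame(
    [("2015-01-01", "2018-08-24 00:00:00")],
    ["col1", "col2"],
).createOrReplaceTempView("input_table")

spark.sql("""
    SELECT
        col1,
        col2,
        to_timestamp(col1, 'yyyy-MM-dd')          AS col1_ts,
        to_timestamp(col2, 'yyyy-MM-dd HH:mm:ss') AS col2_ts
    FROM input_table
""").show(truncate=False)
```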

Look at the output columns col1_ts and col2_ts. They are off by one and two hours, respectively:

This does not happen when I run the same code in Jupyter Notebooks:

Have you seen this before? Thanks in advance.

Here is the KNIME workflow I am using:
testing_spark_knime_datetime_issue.knwf (6.5 KB)

Emir

Hi Emir,

this can be explained by timezones.

When converting strings to timestamps you always need to assume or define a timezone in which to interpret the string.

Spark’s to_timestamp function assumes the UTC timezone and hence interprets ‘2018-01-01’ (a string) as 2018-01-01 00:00:00 UTC (a point on the time-line represented using the KNIME Date&Time data type). The KNIME UI displays this point on the time-line, using the timezone configured on your machine, which seems to be CET/CEST.

2015-01-01 00:00:00 UTC is the same point on the time-line as 2014-12-31 23:00:00 CET
2018-08-24 00:00:00 UTC is the same point on the time-line as 2018-08-23 22:00:00 CEST
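
If you want to check this on the Spark side, a rough PySpark sketch along these lines makes the shift visible (this assumes Spark 2.2+, where the spark.sql.session.timeZone setting exists, and uses a local session rather than the KNIME node itself):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()

# The session timezone controls how to_timestamp() interprets the string
# (and how show() renders timestamps); by default it is the JVM timezone.
print(spark.conf.get("spark.sql.session.timeZone"))

df = spark.createDataFrame([("2018-01-01",)], ["s"])

# Casting to long (seconds since the epoch) exposes the underlying point on
# the time-line, so the difference becomes visible.
spark.conf.set("spark.sql.session.timeZone", "UTC")
df.select(to_timestamp("s").cast("long").alias("epoch_utc")).show()
# expected: 1514764800  (= 2018-01-01 00:00:00 UTC)

spark.conf.set("spark.sql.session.timeZone", "Europe/Berlin")
df.select(to_timestamp("s").cast("long").alias("epoch_cet")).show()
# expected: 1514761200  (one hour earlier, because midnight CET is 23:00 UTC)
```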

Best,
Björn


Thanks for the reply, Björn.

I understand what you mean, but I’m still a bit confused. CET/CEST is +1 or +2 hours ahead of UTC, right? That means 2015-01-01 00:00:00 UTC should correspond to 2015-01-01 01:00:00 CET, shouldn’t it?
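
A quick check in plain Python seems to agree (this uses the zoneinfo module from Python 3.9+, independent of the Spark/KNIME setup):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The same instant, rendered once in UTC and once in CET (Europe/Berlin, winter time).
instant = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)
print(instant)                                        # 2015-01-01 00:00:00+00:00
print(instant.astimezone(ZoneInfo("Europe/Berlin")))  # 2015-01-01 01:00:00+01:00
```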

Regards,
Emir

Hi, is there any update on this issue?

Best regards,
Emir