How do I convert String to datetime or how do I replace their values with one correct value from same ID


Even though they have the same InvoiceNo, they have different values for InvoiceDate. So how do resolve this through data cleaning? Like I want to convert “40885.7458333333” to an actual date but it seems impossible so the other option I have is to take the actual/correct datetime “2011-12-08T17:54” and replace it into all the InvoiceDate with the same InvoiceNo. How may I do that?

Hi @SoC, and welcome to the KNIME community!

The decimal numbers in your InvoiceDate represent the elapsed time since 01 Jan 1900. There are 86400 seconds in a day and so if you multiply the number by 86400 you will get the number of seconds since 01 Jan 1900 So far so good. So we know we can find a way to convert the numbers.

However, taking a step back, why do you have a mix of readable dates, and excel internal date values in the source data? I would suggest that the better option (if available) would be to format these correctly as dates in Excel prior to loading into KNIME as then this problem shouldn’t exist.

But, always up for a quick challenge… and assuming that for some reason you cannot change the source data, a method of resolving this would be to convert the excel time serial numbers into Unix timestamps (which are also based on the time that has elapsed since a given date). The “origin” date for Unix is 01 Jan 1970. It’s a shame it’s not the same as the Excel date, but a quick search on google tells me that there are 25569 days difference between the Excel “epoch” date and the unix “epoch” date. Unix timestamps are held as seconds (or milliseconds) since the epoch date, whereas Excel are in 86400ths of a day.

The formula therefore to convert from an Excel to unix timestamp (in seconds) is:

($Invoice Date$ - 25569) * 86400

Since some of your dates are already in a readable timestamp format, we need to split out the different rows for calculation. The Row Splitter can be used to split out those rows containing a “T” in the Invoice Date, and those that don’t.The different rows can then be handled separately, so that those containing a “T” are treated to a simple String to Date&Time conversion, whereas those that don’t are given the full treatment of converting to a Unix Timestamp and then have that converted back into a Date&Time.

Finally the two sets of data can be brought back together and assembled in their original order.

Before


After
image

I hope the attached workflow is of help:
Convert Excel Dates.knwf (28.9 KB)

12 Likes

Hi @takbb , thanks for explaining to me in detail. I appreciate it as this is my first time doing something like this. :grinning_face_with_smiling_eyes:

2 Likes

Hi,
as SoC mentioned, there’re two ways to solve that problem: Repair the timestamps, or write the correct value to the entire group. takbb did the former, which is for that particular problem the better choice. The latter, however, will also work when we can’t fix the timestamps. I want to share my approach, it might be helpful in the future for similar problems.

Note: I assume that there’s exactly one correct timestamp value per invoice number. More than one value could be handled by the duplicate row filter, no problem. If there’s not at least one correct value, this workflow won’t solve the problem properly (this is why the accepted solution is better).

How it works:
The correct values are filtered out by a Row Filter Node. Use Duplicate Row Filter to reduce table to one row per group (optional, see assumptions).
Now we Inner-Join that table with the original table, using the column invoiceNo as condition for both tables. We keep the invoiceDate from the filtered table and all other columns from the original table. This writes the correct value to the entire group. Setting the sorting option in the performance tab prevents the Joiner Node to mess up the row order.
The String to Date&Time Node could be placed after the Joiner as well. It has a smaller table to work on in this place though.

If performance is of importance, it might be worth it to test both solutions. I can see mine being faster, because there’s not as much table manipulation going on. Just an untested gut feeling though.

Here’s the WF and a screenshot. Input and Output is exactly the same as takbb’s solution.
overwrite values in groups.knwf (39.4 KB)

5 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.