How to manipulate different date time from string format

Hi all,

I have too many date time format in one single column.
Eg.
Feb.25
Apr. 30
yyyy-mm-dd
dd-m-yyyy
d-m-yyyy
m/dd/yyyy

Any idea to quick solve this?
Thank in advance.

Hi @Kenyx

There isn’t really a good solution to this unless you can think of a way that as a person you’d be able to state what each of your dates represented.

For example,

  1. an actual date requires a year so what date does Apr. 30 represent?

  2. If you have dates both in d/m/y and m/d/y format then what date would something like 3/4/2021 represent?

Basically you will need to have some defined rules about how you are going to interpret different dates. Once you have those, we can work out how to help you implement those rules using KNIME.

If you have, for example a default year then perhaps that can be applied in the translation process.

All that said, a while ago I wrote a flexible date handing workload. On the basis that you know the order in which day month and year appear within the date string (and assuming they always all appear), it is possible to interpret a wide variety of dates in a single column. Perhaps this workflow can giving you some ideas.

4 Likes

Thanks for your solution @takbb

  1. That should be year 2019, as there are three people edit the excel file from 2019 to 2020,
    there are different formation due to lack of standard.
  2. There won’t be d/m/y and m/d/y, the reason of d-m is just omit the "0"s at the beginning of days and months

The bottom stream is finished, I am going to deal with yyyy-mm-dd and dd-mm-yyyy at the top part.
Without this dealing, the auto guess is unable to process.

This previous topics may help

You can actually try to check when the file was created or last saved

This might not be the best solution but it could giive you an indication of there is no info in the file itself.

You could explore more (quirky) things about data and time variables an KNIME in this small collection:

2 Likes

But in your list of example, you do have both formats:

2 Likes

Hi @Kenyx, based on your original post, as @bruno29a has mentioned, you showed both d-m-yyyy and m/d/yyyy formats, so I’m going to make the assumption that if the date uses / separators, then it is m d yyyy, but if it uses - separators then the day comes first.

The attached flow will handle all the date formats you have given so far, and can be extended (or changed) to handle others by modifying the Rule Engine.

e.g.
image
where as per your post, 1-8-2021 is 1 August 2021, whereas 1/10/2020 is January 10 2020 (because it uses / symbols)! :wink:

So the dates are translated as follows:
image

It uses a rule engine to define the date format to be used based on the pattern presented

Where there is no 4-digit sequence found in the date, it assumes there is no year and appends 2019 to the end of the datevalue string

The rule engine identifies the date format mask to be applied for conversion. single and double digit days and months are handled automatically by simply specifying d/M or M/d in the format masks as appropriate.

Because the String to Date&Time format mask is either configured, or dynamically supplied by a flow variable, in order to use a different mask for each row, it must be processed inside a loop that creates a different value in the flow variable for each iteration.

An alternative is to use a Java Snippet to perform the same task. Within the Java Snippet, the format mask can be supplied via a column value. This would be noticeably faster on large data sets.

The attached flow provides both the loop and a Java Snippet variant using the same input data.

KNIME_Date_Handler.knwf (39.6 KB)

6 Likes

@takbb
Thank you so much! Sorry for the late response.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.