Date and Time Manipulation

Hello everyone,

I am having a date and time column where there were different date formats and this column contains around 110000 rows. I cannot change everything manually in excel and in knime, I cannot make use of legacy date time with multiple date formats.

Can anyone suggest with the solution!!

Thanks and Regards,
Neelima

You can try


It will try to guess the date from different formats (you need to check if it is correct date).
Or use all possible formats, concatenate the result from number of tables and filter out missing values.

3 Likes

I cannot get this node at all. Rather, I used Date Field Extractor (legacy) and it was throwing an error message

You need to install it.

Hi @ngumpena,

What date formats do you have?

:blush:

@ngumpena please note that the license of those nodes is not 100% free for the palladian nodes. You need a license to e.g. use it on the KNIME Server

You can use the new String to Date&Time node as you can find here on the hub:

If you deactivate on failure, the node will not fail in case some are not convertible

Above mentioned Date Extractor from Palladian has the advantage, that you don’t need to tediously define parse patterns manually, and that it can handle varying formats automatically.

What problem are you facing when installing the Palladian nodes? Let me know and I’ll be glad to help.

… as you do with running a KNIME Server itself, which is also not “free”.

– Philipp

4 Likes

Hi Iris,

I have 3 different formats

  1. 01-02-2019 which is showing in Knime as 2019-02-01 : works perfect
  2. dd-mm-yyyyThh:mm:ss : this format also works perfect
  3. dd/mm/yyyy: Not working

When I used String to date time, all the cells are converted except the ones in 3rd format which is why I am looking for a solution

Also, I have used “Date Extractor” node: This seperates Date field completely from the table and how can I connect again to the relevant row by using this node?

I think you have 2 options:

First:
If the Date Extractor node recognizes all the date formats in your data set, you can join the output with the original table based on row IDs. Notice that the Date Extractor node skips the rows with unrecognized format, so if there are any of them, and as the node refreshes the row IDs, you cannot join the tables.

As an alternative solution, since you have a few number of date formats, I think you could change all the formats to the standard format first and then use the String to Date&Time node. E.g. this expression in the Column Expressions node will convert the mentioned formats to yyyy-MM-dd format:

if(regexMatcher(column("column1"), "[\\d]{2}-[\\d]{2}-[\\d]{4}(T.*)?")) regexReplace(column("column1"), "([\\d]{2})-([\\d]{2})-([\\d]{4})(T.*)?", "$3-$2-$1")
else if(regexMatcher(column("column1"), "[\\d]{2}/[\\d]{2}/[\\d]{4}")) regexReplace(column("column1"), "([\\d]{2})/([\\d]{2})/([\\d]{4})", "$3-$2-$1")
else column("column1")

So the String to Date&Time node can easily convert the column type.

Here is the workflow which contains the application of both options:
date_manipulation.knwf (26.3 KB)

:blush:

P.S. Dear @ngumpena, if you set the output type to “Local Date” in the Column Expressions node, you do not need the String to Date&Time node anymore.

5 Likes

Hi both,

@armingrudd very nice solution!

I allowed to made another approach, where we are concatenating different formats behind each other.

date_manipulation.knwf (21.2 KB)

I did two things, first I used [ to use an optional part of the parsing format.
Second I combined them after each other to have each covered.

Hope this helps!
Cheers, Iris

2 Likes

Hi Armin,

date extractor node is not identifying the correct format. For 02/09/2019, it is showing as 09.feb.2019. Regex expressions works perfect for this case

could you please explain me the regex expression?

1 Like

Of course, my pleasure!
As you can see we have two functions: regexMatcher() and regexReplace().
The first one recognizes the format and the second one changes the format to the standard format.
[\d] stands for any digits (we use double backslashes -which is the escape character- one for the node and one for the regex itself) and {n} specifies the number of digits. So [\d]{2} means two digits. (T.*) specifies the time part of the string and the ? character makes it optional. In the regexReplace we use parentheses around each part (day, month and year) so we can reference them for the output as $n.

You can learn more about using regex in KNIME here:
https://blog.statinfer.com/regex-in-knime-what-a-functionality-booster/

And this website has everything you need to know about regex:
https://www.regular-expressions.info/

:blush:

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.