I have a dataset wherein one of the date columns has values in three different date formats. But I would like to keep just one format for all the values in the column, say : yyyy-mm-dd
Eg:
Date Column
Tue Dec 13 00:50:04 EST 2022
2022-11-04
11/21/2022 02:30:44 PM
I would appreciate your suggestions to get this work?
Looks like a string column.
I mention @takbb because he has build some incredible “convert various date formats to date type” but i do not have the url now. Sorry @takbb
Yes I have done some work with date conversions, but I don’t yet have a totally magic bullet, but my flexible date format component can potentially take away some of the pain.
Provided you don’t want to retain the time information, this may be of assistance.
I have attached a workflow here. It just uses the three sample dates that you have given. If there are even more varied examples, then these would have to be dealt with separately.
It uses the row splitter to divide the different rows out based on simple regular expression formats. I haven’t written entire expressions to perfectly analyse the formats, but instead just picked up simple features so it says if it starts with 3 letters, then it is one format, if it starts with 2 digits and a slash it is another format, otherwise it is the third format. In this way we then have our data split into something we can process.
Next is to use our knowledge of your data to collect the parts of the date strings containing the day month and year. I do this because my “flexible date format” component interprets based on day month and year, but additional information (such as time, or day names) can confuse it, so I used String Manipulation to simply extract what I needed.
After that, we find that two your date formats have the date order M-D-Y
whilst the other has the date order Y-M-D
This is all the flexible date reader needs to do its thing, so I throw the two different forms into the component, then out the other end we can concatenate the rows back together and tidy up
Now, all of the above could of course be done by splitting the rows in a similar way and then using the standard String to Date&Time node, and supplying the different variations of date mask required for each of your date formats, but basically the component does the “boring” part of working out the date formats, provided it can be passed dates in a known ordering of Day, Month and Year