Writing a flexible Date Format handling workflow

Ever had one of those days when you just can’t get the string to date conversion to work for your workflow?

I’ve seen a few questions on the forum about converting dates. Sometimes the format masks aren’t particularly intuitive, especially when coming to this new, which leads to time wasted by the person who just wants to make their workflow run without all the fuss!

The java-based format masks aren’t always obvious in their meaning, and it occurred to me that it would be nice if there were a way of handling a wide variety of date formats with a single mask, or some generic nodes and/or a tiny bit of generic code.

One of the problems is that if the month is a name rather than a number, the mask is unhelpful in being case-sensitive with respect to the data. So “Feb” will match the MMM mask, but “FEB” and “feb” will not. So some string manipulation would be required to accommodate this. Likewise people sometimes trip up by specifying dd when that “forces” a 2 digit day and refuses to accept a single digit day. Two and four digit years also cause a problem since yy matches ONLY 2 digit years while YYYY matches only 4 digit, and it isn’t always clear how to make it match both!

Another problem is that different people have a wide variety of ways to delimit the date components, and other “noise” can be added too.

I was looking at how we could have a format mask that could handle a wide variety of variations of dates which follow a similar basic format. Wouldn’t it be nice if we could have a piece of our workflow that could handle a wide variety of date formats within the same data set, provided that they conformed to a common order of component parts.

So say, for example we had a rule that the date was in D M Y order but beyond that we weren’t too fussed about exactly how it was presented…

… Wouldn’t it be nice if a single workflow with date conversion could handle all of these in a single flow:

I’ve put together a workflow containing a String Manipulation and a flexible date format mask, which makes use of optional features of the mask, and (in conjunction with the string manipulation) can handle all of those in the list above and more besides. The format mask can be manually adapted according to your locale and the workflow contains a second example that accepts dates in Month Day Year order too:

I did this as “a bit of fun” to see what could be done, but you may find it useful, especially if you have one of those of days where you just cannot get it to handle your date properly!

I might refine this a little and see if I can make it into a useful component!

(as it stands I realise this isn’t very “international”, and for non-English-language data you may need to adapt the code or mask slightly)

8 Likes

It’s been over a year and a half since I wrote the above, and toyed with the idea of having something that just converted dates without having to care too much about how they were formatted.

Well… here is the launch of version 1 of my “Flexible Date Reader” component.

Building on the approach outlined above, in summary the idea is that given a date, and the basic knowledge that the date will be in a specified element order…
e.g.
d-m-y
or y-m-d
or m-d-y
… the idea is that it should not matter what the actual format of the date is, provided that it can discern the presence of the above elements:

d-m-y (English (UK))
image

m-d-y (English (US))
image

and it should work across different locales too, without me having to code in any translations (as the underlying java date classes should just handle all that)

d-m-y (French)
image

I have uploaded a demo workflow
here:

The configuration of the reader is relatively straightforward. It needs to know three things:
(1) the column containing the dates!
(2) the date-element order in the dates to be interpreted
(3) the locale (so it can interpret dates in the required language)
Locale-selection is a bit painful as there are so many! I couldn’t find a simple way of listing all the locales used by the String to Date&Time node. I ended up interrogating the java class to give me back a list of them, which seems to be a different (but overlapping) set to the one used by that node, so I guess I may find some don’t quite work as intended.

(4) a set of “standard” words that are common in dates and should be ignored as “noise”.
This 4th item contains a few noise-phrases that I could think of. I note in my default values I have different cases. I don’t think I need the repetition, so maybe in v2 that will be removed :wink: I was making a guess at some noise words in French and German so maybe native speakers of different languages might suggest how many such additional phrases commonly appear in words in written dates in those languages.

I’d greatly appreciate feedback.

One area I know nothing about is how well the java date classes work with languages where there are diacritics (or “accents”) on some letters. In my experimenting it appeared that their inclusion is mandatory, so for example, in French, it seems that February must be written with the accent on the first “e” in février in place, and attempting to use my fevrier fails. Is that something that causes problems? Do French-speakers ever get “lazy” with the use of accents? Maybe this is the sort of thing that @aworker and @bruno29a can advise me on (No, I’m not suggesting you guys are lazy!! :wink: ). And what about other languages, what sort of things present problems with translating dates from “written” form? I’m very aware that my method for converting is based on my experience of how dates are written in English in various forms, and there may be completely different forms of date-writing that I haven’t considered.

Anyway… drum roll… the sample workflow produces these results, based on the input data and “locale”/element ordering configuration:

image

image

image

It’s a start! And to be honest… it’s a bit of fun but may have some serious applications. enjoy!

2 Likes

Hi @takbb

Great contribution once more !

I believe yes we get lazy sometimes, so having a tool that accepts days and month with or without accents can be actually very useful, I believe. Obviously this is a personal opinion :wink:

Thanks again @takbb !

Best
Ael

2 Likes