Writing a flexible date-format handling workflow

It’s been over a year and a half since I wrote the above and toyed with the idea of having something that would simply convert dates without my having to care too much about how they were formatted.

Well… here is the launch of version 1 of my “Flexible Date Reader” component.

Building on the approach outlined above, the idea in summary is this: given a date, plus the basic knowledge that its elements will appear in a specified order…
e.g.
d-m-y
or y-m-d
or m-d-y
… it should not matter what the actual format of the date is, provided the reader can discern the presence of those elements:

d-m-y (English (UK)): [screenshot]

m-d-y (English (US)): [screenshot]
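The component’s internals aren’t shown in this post, so what follows is only a hypothetical Java sketch of the general idea, not the Flexible Date Reader’s actual logic. It assumes the element order is given as "d-m-y", "m-d-y" or "y-m-d", strips ordinal suffixes and separators, assigns each remaining token to an element, and then lets java.time’s `DateTimeFormatter` do the parsing. The class and method names (`ElementOrderDateReader`, `parse`, `fieldPattern`) are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: not the Flexible Date Reader's actual implementation.
class ElementOrderDateReader {

    /** Parses text whose elements appear in the given order ("d-m-y", "m-d-y" or "y-m-d"). */
    static LocalDate parse(String text, String order, Locale locale) {
        // Normalise: drop ordinal suffixes ("3rd" -> "3") and turn separators into spaces.
        String cleaned = text.trim()
                .replaceAll("(?i)(\\d+)(st|nd|rd|th)\\b", "$1")
                .replaceAll("[/.,\\-]+", " ")
                .trim();
        String[] tokens = cleaned.split("\\s+");
        String[] elements = order.split("-");
        if (tokens.length != elements.length) {
            throw new IllegalArgumentException("Expected " + elements.length + " elements in: " + text);
        }
        // A textual month may be full ("March") or abbreviated ("Mar"), so try both styles.
        for (String monthText : new String[] {"MMMM", "MMM"}) {
            List<String> fields = new ArrayList<>();
            for (int i = 0; i < elements.length; i++) {
                fields.add(fieldPattern(elements[i], tokens[i], monthText));
            }
            DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                    .parseCaseInsensitive()
                    .appendPattern(String.join(" ", fields))
                    .toFormatter(locale);
            try {
                return LocalDate.parse(String.join(" ", tokens), formatter);
            } catch (DateTimeParseException e) {
                // Fall through and try the next month style.
            }
        }
        throw new IllegalArgumentException("Could not parse: " + text);
    }

    /** Chooses java.time pattern letters for one token, based on its element and its shape. */
    private static String fieldPattern(String element, String token, String monthText) {
        boolean numeric = token.chars().allMatch(Character::isDigit);
        switch (element) {
            case "d": return "d";
            case "m": return numeric ? "M" : monthText;
            case "y": return token.length() == 2 ? "uu" : "uuuu"; // "uu" maps 21 -> 2021
            default: throw new IllegalArgumentException("Unknown element: " + element);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("3rd March 2021", "d-m-y", Locale.UK)); // 2021-03-03
        System.out.println(parse("March 3, 2021", "m-d-y", Locale.US));  // 2021-03-03
        System.out.println(parse("03/04/21", "d-m-y", Locale.UK));       // 2021-04-03
    }
}
```

The design choice here is that the configured element order, rather than the separators, decides what each token means; the only thing inferred from a token is its shape (digits or text, two- or four-digit year).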

It should also work across different locales, without my having to code in any translations, as the underlying Java date classes should handle all of that:

d-m-y (French): [screenshot]
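To illustrate what “letting the Java classes handle the translation” can look like, here is a small hypothetical helper (not part of the component) that parses the same “day month-name year” layout in three locales. Only the `Locale` passed to java.time’s `DateTimeFormatter` changes; the month names come from the JDK’s locale data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

// Hypothetical helper: one fixed "day month-name year" layout, only the Locale changes.
class LocaleMonthNames {

    /** Parses e.g. "12 February 2021" using the month names of the given locale. */
    static LocalDate parseDayMonthYear(String text, Locale locale) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()          // "Février" and "février" both match
                .appendPattern("d MMMM uuuu")    // MMMM = full month name in the formatter's locale
                .toFormatter(locale);
        return LocalDate.parse(text, formatter);
    }

    public static void main(String[] args) {
        System.out.println(parseDayMonthYear("12 February 2021", Locale.UK));    // 2021-02-12
        System.out.println(parseDayMonthYear("12 février 2021", Locale.FRENCH)); // 2021-02-12
        System.out.println(parseDayMonthYear("12 Februar 2021", Locale.GERMAN)); // 2021-02-12
    }
}
```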

I have uploaded a demo workflow here:

The configuration of the reader is relatively straightforward. It needs to know four things:
(1) the column containing the dates!
(2) the date-element order in the dates to be interpreted
(3) the locale (so it can interpret dates in the required language)
Locale selection is a bit painful, as there are so many! I couldn’t find a simple way of listing all the locales used by the String to Date&Time node, so I ended up interrogating the Java class to get a list of them. That appears to be a different (but overlapping) set from the one the node uses, so I may find that some don’t quite work as intended.

(4) a set of “standard” words that are common in dates and should be ignored as “noise”.
This 4th item contains a few noise phrases that I could think of. I note that my default values include the same words in different cases; I don’t think I need the repetition, so maybe that will be removed in v2 :wink: I was guessing at some noise words in French and German, so perhaps native speakers of other languages could suggest which additional phrases commonly appear in written dates in their languages.
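For items (3) and (4), a hypothetical sketch of two helpers. `stripNoise` removes noise words as whole words with case-insensitive matching, which is one way the repeated upper/lower-case entries could be dropped. `dateLocaleTags` lists the locales reported by `java.text.DateFormat.getAvailableLocales()`; the post doesn’t say which Java class was actually interrogated, so this is an assumption, and its list may differ from the String to Date&Time node’s. The noise words in `main` are illustrative only, not the component’s defaults.

```java
import java.text.DateFormat;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helpers illustrating configuration items (3) and (4).
class NoiseWordFilter {

    /** Removes each noise word (whole words only, any case) and collapses the leftover spaces. */
    static String stripNoise(String text, List<String> noiseWords) {
        String result = text;
        for (String word : noiseWords) {
            // (?i) ignores case; (?U) makes \b Unicode-aware, so accented letters count as word characters.
            Pattern p = Pattern.compile("(?iU)\\b" + Pattern.quote(word) + "\\b");
            result = p.matcher(result).replaceAll(" ");
        }
        return result.trim().replaceAll("\\s+", " ");
    }

    /** Language tags (e.g. "en-GB", "fr") of the locales java.text.DateFormat reports as available. */
    static List<String> dateLocaleTags() {
        return Arrays.stream(DateFormat.getAvailableLocales())
                .map(Locale::toLanguageTag)
                .sorted()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> noise = Arrays.asList("monday", "the", "of"); // illustrative noise words
        System.out.println(stripNoise("Monday the 3rd of March 2021", noise)); // 3rd March 2021
        System.out.println(dateLocaleTags().size() + " locales available");
    }
}
```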

I’d greatly appreciate feedback.

One area I know nothing about is how well the Java date classes handle languages in which some letters carry diacritics (or “accents”). In my experiments their inclusion appeared to be mandatory: in French, for example, February apparently must be written with the accent on the first “e”, as in février, and my attempt to use fevrier fails. Is that something that causes problems? Do French speakers ever get “lazy” with accents? Maybe this is the sort of thing that @aworker and @bruno29a can advise me on (no, I’m not suggesting you guys are lazy!! :wink: ). And what about other languages: what sort of things cause problems when converting dates from their “written” form? I’m very aware that my conversion method is based on my experience of how dates are written in English in various forms, and there may be completely different forms of date-writing that I haven’t considered.
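If unaccented spellings such as fevrier do turn up in real data, one possible workaround (a hypothetical helper, not something the component currently does) is to look the month up with accents and case folded away, then pass the month’s proper name to the strict parser. The sketch uses `java.text.Normalizer` to decompose accented letters and strips the combining marks. It also drops the trailing period some locales use in short names (French “févr.”). Folding could in principle make two distinct names identical in some language, so it would be worth checking per locale.

```java
import java.text.Normalizer;
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: accent- and case-insensitive month-name lookup.
class AccentInsensitiveMonths {

    /** Lower-cases and removes diacritics, e.g. "Février" -> "fevrier". */
    static String fold(String s, Locale locale) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD); // é -> e + combining acute
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(locale);
    }

    /** Finds the month whose full or short name matches the token, ignoring accents and case. */
    static Optional<Month> findMonth(String token, Locale locale) {
        String wanted = fold(token, locale);
        for (Month m : Month.values()) {
            for (TextStyle style : new TextStyle[] {TextStyle.FULL, TextStyle.SHORT}) {
                String name = fold(m.getDisplayName(style, locale), locale);
                if (name.endsWith(".")) { // short names may end in a period, e.g. "févr."
                    name = name.substring(0, name.length() - 1);
                }
                if (name.equals(wanted)) {
                    return Optional.of(m);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findMonth("fevrier", Locale.FRENCH)); // Optional[FEBRUARY]
        System.out.println(findMonth("aout", Locale.FRENCH));    // Optional[AUGUST]
    }
}
```

A matched `Month` could then be turned back into the locale’s correctly accented name with `getDisplayName(TextStyle.FULL, locale)` before the text is handed to the normal parser.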

Anyway… drum roll… the sample workflow produces these results, based on the input data and “locale”/element ordering configuration:

[Screenshots: sample workflow results (three images)]

It’s a start! And to be honest… it’s a bit of fun, but it may have some serious applications. Enjoy!
