Regex for dates

Hello colleagues,

I have a Date type column:
07.06.2022
05.03.2023
05.07.2025

How can I choose .202* from it with the help of regex?

Thank you in advance!

Best regards,
Ram

Hi @IMR2KA

What do you exactly mean with “choose”? If you want to filter for it, you can use (.*[.][2][0][2].*) in a Row Filter node as Regex pattern.

Test data set:
image

Row filter result:

If you want to extract the year value, you can approach it in several ways like going for a substring based on a RegexMatcher, use the Extract Date&Time Fields node and then filter, etc. I prefer to use the Palladian Regex Extractor and get the year via ([.])([2][0][2][0-9]{1})

See WF:
Regex date finder - extractor.knwf (22.7 KB)

Hope this helps!

7 Likes

You can also check out various examples of how to use Regex in KNIME:

4 Likes

As much as I love regex, you could also extract the year and filter 202*

2 Likes

I had the same thought as @Daniel_Weikert .

@IMR2KA is there any reason why you absolutely need to do this via Regex? What you requesting is relatively easy to come up with the proper Regex, so I’m assuming that you are not comfortable with Regex.

There are different ways to accomplish this without Regex, so that’s why I’m asking why do you need to do this via Regex?

2 Likes

Hello,

thank you all for the answers.

I have many similar(same name and number of columns) excel files in the sharepoint which are updated daily. Each file has a Column “Date”, but the problem is, that sometimes some files can have instead of normal date view (e.g 05.05.2022), Integers view (e.g ‘44719’).
My final task is to concatenate all these files.
So, my steps are:

  1. I concatenate all excel files, but some rows have integers view in the column date.
    Date
    05.05.2022
    44719
    05.07.2022
    07.03.2022
    44721

  2. I want to split my combined excel file into two excel files by using Column ‘Date’. If a row has “.202*” then it is one file, if not than it is another file.
    In such way I want to have in one file normal view in the column “Date” (e.g 05.05.2022) and in the other file I will have integer view in the column “Date” (e.g 44719 )

  3. Second file will be changed in order not to have Integer view in Column Date, but normal view. 44719 → “2022-06-07”

  4. I will combine First file with normal “date” view with the SECOND, but already changed one.

That is my logic how to solve this problem.
Please ask me, if something is not clear.

Hello,

thank you for the answer. I did not try your approach yet.
If you would like to have more details regarding my issue, please read my previous message above.

Thank you one more time!

Best regards,
Ram

Hello @IMR2KA

You can use a ‘Regex Split’ node:
\d+\.\d+(\.\d+)

image

I would suggest a ‘Rule-based Row Splitter’ afterwards:
MISSING $split_0$ => TRUE

BR

3 Likes

@IMR2KA
As @bruno29a suggested; a standard Cell Splitter with a dot as delimiter can do the same job. It isn’t so clean, but it can be used as well.

BR

1 Like

Hi @IMR2KA , how many formats, date or other, do you have in that column?

Just for dates, you are talking about 05.05.2022 and 2022-06-07.

Also, are your dates restricted to the year 202x ? In your sample data, you reached 2025. Is there anything beyond 2029, in which case your logic of .202* will not be applicable.

Can you share all the values of the date column and clarify what should be expected as data? We can throw many regex expressions at you, but they would be useless if we don’t know what else to expect.

If the correct format is always using mm dd and yyyy with a separator in between, regardless of the order or the separator, then you can rely on the length of the value, which should amount to 10, provided that the “invalid” ones, that is the integer ones, are less than 10 digits (like 44719, 44721).

You can use the function length() from String Manipulation:
length($Date$)

Then use Row Splitter and split on length = 10.

At the end of the day, any rule you want to apply is derived based on the patterns of the values of the column.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.