string manipulation

Can you help me extract only the username and the date from the row below using String Manipulation? I need:
email id - one column
date (dd/mm/yyyy) - one column

10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] “GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1” 200 791089

I tried the expression below, but it returns only the username.
regexReplace($Column$, "[0-9a-zA-Z:\\.]+ [^ ]+ ([^ ]+) .*", "$1")

Regards
Narayanan.

Hello @narayanan
You can test this code:

substr($text$, 1, 11)

Alternatively:

regexReplace($text$, "\\[(.*?)[:].*$", "$1")
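As a quick check outside KNIME, the Python sketch below applies the same pattern to the sample row. It assumes regexReplace substitutes only the matched part of the string, the way Python's re.sub does; Python writes the back-reference as \1 rather than $1, and the curly quotes in the sample are replaced with straight ones. Under that assumption the pattern rewrites only the text from the opening bracket onward, so whatever precedes the bracket stays in the result.

```python
import re

# Sample row from the question (straight quotes instead of curly ones).
line = ('10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
        '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089')

# Same pattern as in the post; it matches from "[" to the end of the row
# and keeps only the text between "[" and the first ":".
date_pattern = r"\[(.*?)[:].*$"

result = re.sub(date_pattern, r"\1", line)
print(result)  # 10.x.x.x - User1@outlook.com 01/May/2022
```

The IP address and the email ID are still present because they sit before the bracket and are never part of the match.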

BR

It’s a little unclear what you’re trying to do, but I think the RegEx Extractor node can work:

I used the expression

(?<User>[A-Z0-9._%+-]+)@.*
(?<Date>[0-9]{2}/[A-Za-z]{3}/[0-9]{4})

and this is the result

Thanks BR for the suggestion. I tried the expression, but the output looks like this:

Expression → regexReplace($Column$, "\\[(.*?)[:].*$", "$1")

Output (New column) → 10.x.x.x - User1@outlook.com 01/May/2022

I expect only the email ID (User1@outlook.com) in one column and the date (01/May/2022) in another column.

Regards
Narayanan.

Thanks elsamuel for the reply. Looks good.

That is exactly the output I need, as shown in your screenshot, but I have two questions:

  1. I cannot find the “Regex Extractor” node in the node repository (KNIME AP version 4.6.3).
  2. Can we also get the full email ID in the same output, in an additional column?

Regards
Narayanan.

Hello @narayanan
Sorry, I misinterpreted the post. You can try the following:

regexReplace($text$, "^.*?[\\s-]+?(\\S+\\.com).*\\[(\\S+?)[:].*$", "$1|$2")

You can then split the result with a Cell Splitter, using the pipe as delimiter, or alternatively with a Regex Split:

(.+)[|](.+)
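The two steps (replace, then split) can be sketched in Python as a rough check outside KNIME. This is only an illustration under a few assumptions: Python's re.sub stands in for regexReplace, \1|\2 stands in for $1|$2, the dot before com is written as a literal dot, and the sample row uses straight quotes.

```python
import re

# Sample row from the question (straight quotes instead of curly ones).
line = ('10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
        '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089')

# Step 1: collapse the whole row into "email|date".
combined = re.sub(r"^.*?[\s-]+?(\S+\.com).*\[(\S+?)[:].*$", r"\1|\2", line)

# Step 2a: what the Cell Splitter does with "|" as delimiter.
user, date = combined.split("|")

# Step 2b: the Regex Split alternative, with one capture group per output column.
split_match = re.fullmatch(r"(.+)[|](.+)", combined)

print(combined)  # User1@outlook.com|01/May/2022
print(user, date)
```

Both splitting variants give the same two values here, so the choice between Cell Splitter and Regex Split is mostly a matter of preference.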

BR

Thanks BR for the expression. It works.

However, it does not seem to split the row when the domain of the email ID is in upper case, for example:

User1@OUTLOOK.COM

Can the expression be extended to detect .COM as well, or be made case-insensitive, so that it matches both .com and .COM?

Please suggest.

Regards
Narayanan.

Hello @narayanan
To make the pattern case-insensitive, you just have to add the flag ‘(?i)’ at the beginning of the regular expression.

regexReplace($text$, "(?i)^.*?[\\s-]+?(\\S+\\.com).*\\[(\\S+?)[:].*$", "$1|$2")
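To illustrate the effect of the flag, here is a small Python sketch. The log row is a hypothetical variant of the sample with the domain in upper case, the dot before com is written as a literal dot, and \1|\2 stands in for $1|$2. It assumes Python's re module treats (?i) the same way as the regex engine behind KNIME for this pattern.

```python
import re

# Hypothetical row: the original sample with the email domain in upper case.
upper_line = ('10.x.x.x - User1@OUTLOOK.COM [01/May/2022:00:03:00 +0000] '
              '"GET/knime/rest/v4/repository/Users/User1@OUTLOOK.COM/:data HTTP/1.1" 200 791089')

base_pattern = r"^.*?[\s-]+?(\S+\.com).*\[(\S+?)[:].*$"

# Without the flag nothing matches, so the row comes back unchanged.
case_sensitive = re.sub(base_pattern, r"\1|\2", upper_line)

# With (?i) in front, ".com" also matches ".COM".
case_insensitive = re.sub("(?i)" + base_pattern, r"\1|\2", upper_line)

print(case_sensitive == upper_line)  # True
print(case_insensitive)              # User1@OUTLOOK.COM|01/May/2022
```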

Best Regards
Gordon

Thank you Gordon. It works now and gives the expected output for both .com and .COM.

One last thing: I just noticed that local users (without a domain) are not being split in the String Manipulation output. Can we include those as well, please?

10.x.x.x - vmadmin [09/May/2023:09:48:23 +0000] “PUT /knime/rest/v4/ostore/00000000-0000-0000-0000-000000000000/logs-WlWAd7oDmIwXNTGBR2QM/14a0fcd3-5579-46b3-8f58-bf6397f77e29@Server2/knime.log HTTP/1.1” 201 -

Regards
Narayanan.

@narayanan
Almost the same, just removing the ‘.com’ condition at the end of the first group. You won’t even need the ‘ignore case’ flag, as the pattern now selects by position: the last sequence of non-space characters before the square bracket.

regexReplace($text$, "^.*?[\\s-]+(\\S+).*\\[(\\S+?)[:].*$", "$1|$2")
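As a rough check outside KNIME, the Python sketch below runs this position-based pattern against both sample rows from the thread (the domain user and the local user). The helper name extract_user_and_date is hypothetical, the rows use straight quotes, and Python's re module is assumed to behave like the KNIME regex engine for this pattern.

```python
import re

# Position-based pattern from the post: first token after the " - " separator,
# then the text between "[" and the first ":".
FINAL_PATTERN = re.compile(r"^.*?[\s-]+(\S+).*\[(\S+?)[:].*$")

def extract_user_and_date(row):
    """Return (user, date) for one log row, or None if the row does not match."""
    match = FINAL_PATTERN.match(row)
    return (match.group(1), match.group(2)) if match else None

domain_row = ('10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
              '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089')
local_row = ('10.x.x.x - vmadmin [09/May/2023:09:48:23 +0000] '
             '"PUT /knime/rest/v4/ostore/00000000-0000-0000-0000-000000000000/'
             'logs-WlWAd7oDmIwXNTGBR2QM/14a0fcd3-5579-46b3-8f58-bf6397f77e29@Server2/knime.log HTTP/1.1" 201 -')

print(extract_user_and_date(domain_row))  # ('User1@outlook.com', '01/May/2022')
print(extract_user_and_date(local_row))   # ('vmadmin', '09/May/2023')
```

Because the user is picked by position rather than by its ending, the same pattern handles .com, .COM and users without any domain.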

BR

Hi Gordon,

Thank you so much for your help. Got the expected output now.

Regards
Narayanan.
