Can you help me extract only the username and the date from the row below using string manipulation? I need:
email ID - one column
date (dd/mm/yyyy) - one column
10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] "GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089
I tried the expression below, but it returns only the username.
regexReplace($Column$, "[0-9a-zA-Z:\.]+ [^ ]+ ([^ ]+) .*", "$1")
Regards
Narayanan.
Hello @narayanan
You can test this code:
substr($text$, 1, 11)
Alternatively:
regexReplace($text$, "\\[(.*?)[:].*$", "$1")
BR
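To see what the regexReplace suggestion returns on the full log row, the same pattern can be tried outside KNIME. This is only a rough sketch: it uses Python's `re.sub` as a stand-in, on the assumption that it behaves like KNIME's Java-based regexReplace for this pattern. Because the pattern is not anchored at the start of the string, only the part from the opening bracket onward is replaced, so the text before the bracket stays in the result.

```python
import re

# Sample row from the question (quotes written as plain ASCII quotes).
line = (
    '10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
    '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089'
)

# Same pattern as in the regexReplace call; a Python raw string needs
# single backslashes where the KNIME expression uses doubled ones.
result = re.sub(r"\[(.*?)[:].*$", r"\1", line)

print(result)  # 10.x.x.x - User1@outlook.com 01/May/2022
```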
It’s a little unclear what you’re trying to do, but I think the RegEx Extractor node can work:
I used the expression
(?<User>[A-Z0-9._%+-]+)@.*
(?<Date>[0-9]{2}/[A-Za-z]{3}/[0-9]{4})
and this is the result
Thanks BR for the suggestion. I tried the expression, but the output looks like this:
Expression → regexReplace($Column$, "\\[(.*?)[:].*$", "$1")
Output (new column) → 10.x.x.x - User1@outlook.com 01/May/2022
I expect only the email ID (User1@outlook.com) in one column and the date (01/May/2022) in another column.
Regards
Narayanan.
Thanks elsamuel for the reply. Looks good.
That is exactly the output I need, as shown in your screenshot, but I have two questions:
I cannot find the "Regex Extractor" node in the node repository (KNIME AP version 4.6.3).
Can we also get the email ID in the same output, in an additional column?
Regards
Narayanan.
Hello @narayanan
Sorry, I misinterpreted the post. You can try the following:
regexReplace($text$, "^.*?[\\s-]+?(\\S+.com).*\\[(\\S+?)[:].*$", "$1|$2")
You can then split the result with a Cell Splitter node, using the pipe as delimiter, or alternatively with a Regex Split node:
(.+)[|](.+)
BR
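The two steps above (join both captures with a pipe, then split on the pipe) can be sketched outside KNIME as follows. This is a minimal illustration, not the node's implementation: Python's `re` module stands in for the String Manipulation and Regex Split nodes, and the name `to_pipe_joined` is made up for this example.

```python
import re

# Pattern from the post, written as a Python raw string (single backslashes).
PATTERN = re.compile(r"^.*?[\s-]+?(\S+.com).*\[(\S+?)[:].*$")


def to_pipe_joined(line: str) -> str:
    """Mimic the regexReplace step: keep only 'user|date'."""
    return PATTERN.sub(r"\1|\2", line)


line = (
    '10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
    '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089'
)

joined = to_pipe_joined(line)

# Cell Splitter equivalent: split on the pipe delimiter.
email, date = joined.split("|")

# Regex Split equivalent: two capture groups around the pipe.
groups = re.fullmatch(r"(.+)[|](.+)", joined).groups()

print(joined)  # User1@outlook.com|01/May/2022
```

Both splitting variants give the same two values here; the pipe works as a delimiter only as long as it cannot occur in the username or the date.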
Thanks BR for the expression. It works,
but it does not seem to split the row when the email ID is in upper case, for example:
User1@OUTLOOK.COM
Can the expression be extended to also detect .COM, or can case sensitivity be disabled, so that it matches both .com and .COM?
Please suggest.
Regards
Narayanan.
Hello @narayanan
To make the expression case-insensitive, you just have to add the '(?i)' flag at the beginning of the pattern.
regexReplace($text$, "(?i)^.*?[\\s-]+?(\\S+.com).*\\[(\\S+?)[:].*$", "$1|$2")
Best Regards
Gordon
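The effect of the inline '(?i)' flag can be checked with a small sketch. Python's `re` module is used here as a stand-in, on the assumption that it treats a leading '(?i)' the same way as the Java regex engine behind KNIME. The log row is made up for illustration; only the upper-case address comes from the thread.

```python
import re

# Hypothetical row with an upper-case domain.
line = (
    '10.x.x.x - User1@OUTLOOK.COM [01/May/2022:00:03:00 +0000] '
    '"GET /knime/rest/v4/repository/ HTTP/1.1" 200 791089'
)

case_sensitive = r"^.*?[\s-]+?(\S+.com).*\[(\S+?)[:].*$"
case_insensitive = "(?i)" + case_sensitive

# Without the flag nothing matches, so the row comes back untouched.
unchanged = re.sub(case_sensitive, r"\1|\2", line)

# With the flag, 'com' also matches 'COM' and the row is reduced to 'user|date'.
split_ready = re.sub(case_insensitive, r"\1|\2", line)

print(unchanged == line)  # True
print(split_ready)        # User1@OUTLOOK.COM|01/May/2022
```

The flag only changes how letters are compared during matching; the captured text keeps its original case.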
Thank you Gordon. It works now and returns the expected output for both .com and .COM.
One last thing: I just noticed that local users (without a domain) are not being split in the String Manipulation output. Can we include those as well, please?
10.x.x.x - vmadmin [09/May/2023:09:48:23 +0000] "PUT /knime/rest/v4/ostore/00000000-0000-0000-0000-000000000000/logs-WlWAd7oDmIwXNTGBR2QM/14a0fcd3-5579-46b3-8f58-bf6397f77e29@Server2/knime.log HTTP/1.1" 201 -
Regards
Narayanan.
@narayanan
Almost the same, just removing the '.com' condition at the end of the user group. You won't even need the 'ignore case' flag, as it selects by position: the last character sequence (no spaces) before the square bracket.
regexReplace($text$, "^.*?[\\s-]+(\\S+).*\\[(\\S+?)[:].*$", "$1|$2")
BR
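As a rough cross-check, the position-based pattern can be run against both kinds of rows from this thread. The sketch below uses Python's `re` module in place of KNIME, and `extract_user_and_date` is a hypothetical helper name. Since the original question asked for the date as dd/mm/yyyy, the helper also converts the month abbreviation to a number; that conversion is an addition for illustration and is not part of the expression above.

```python
import re
from typing import Optional, Tuple

# Final pattern from the thread, as a Python raw string.
PATTERN = re.compile(r"^.*?[\s-]+(\S+).*\[(\S+?)[:].*$")

# English month abbreviations as they appear in the log rows.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def extract_user_and_date(line: str) -> Optional[Tuple[str, str]]:
    """Return (user, date as dd/mm/yyyy), or None if the row does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    user, raw_date = match.groups()           # e.g. ('vmadmin', '09/May/2023')
    day, month_name, year = raw_date.split("/")
    month = MONTHS.index(month_name) + 1      # 'May' -> 5
    return user, f"{day}/{month:02d}/{year}"


email_row = (
    '10.x.x.x - User1@outlook.com [01/May/2022:00:03:00 +0000] '
    '"GET/knime/rest/v4/repository/Users/User1@outlook.com/:data HTTP/1.1" 200 791089'
)
local_row = (
    '10.x.x.x - vmadmin [09/May/2023:09:48:23 +0000] '
    '"PUT /knime/rest/v4/ostore/00000000-0000-0000-0000-000000000000/'
    'logs-WlWAd7oDmIwXNTGBR2QM/14a0fcd3-5579-46b3-8f58-bf6397f77e29@Server2/'
    'knime.log HTTP/1.1" 201 -'
)

print(extract_user_and_date(email_row))  # ('User1@outlook.com', '01/05/2022')
print(extract_user_and_date(local_row))  # ('vmadmin', '09/05/2023')
```

Because the user is picked by position rather than by its '.com' ending, the same pattern covers lower-case addresses, upper-case addresses, and local users.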
Hi Gordon,
Thank you so much for your help. Got the expected output now.
Regards
Narayanan.
This topic was automatically closed 7 days after the last reply (June 1, 2023, 5:42pm). New replies are no longer allowed.