Replace or remove a portion of string with a regex

Hi colleagues,

I have a string made up of hours, minutes and seconds, for example: 17:24:55

I would like to remove everything from the first : character onward and keep only the hour part, so the result is: 17

I thought about using the String Replacer node or the String Manipulation node, but I don't have a clear idea of how to build a regex that removes or replaces every character after the :

Does anyone have an idea? Thanks in advance.

 


Hi,

If you are working with times, then I would recommend using the Date&Time nodes instead.  However, regarding regex approaches - there are numerous ways to do this in KNIME.  The attached workflow gives an example of using the Date&Time nodes, as well as using regex in the String Replacer node or the Regex Split node.

Kind regards

James


String Replacer Node should work fine.

If you use the "regular expression" setting and enter

^(\d{2})\:.*

as the pattern, the parentheses around \d{2} (which matches two digits at the beginning of the string) define a capturing group. Everything inside a capturing group can be extracted with a back reference.

If you enter $1 (the back reference to that group) into the Replacement Text field, you should keep only the hour part of the string. Hope that helps.
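
For anyone who wants to try the capturing-group idea outside KNIME, here is a minimal sketch in Python. It is only an illustration, not the String Replacer node itself: Python's `re` module stands in for the regex engine, the back reference is written `\1` instead of `$1`, and the function name `keep_hour` is made up for this example.

```python
import re

# Same pattern as in the post, minus the optional backslash before the colon.
# The parentheses make the two leading digits capturing group 1.
HOUR_PATTERN = re.compile(r"^(\d{2}):.*")


def keep_hour(time_text: str) -> str:
    """Replace the whole match with group 1, keeping only the two leading digits."""
    # Python spells the back reference \1 where the node's replacement field uses $1.
    return HOUR_PATTERN.sub(r"\1", time_text)


print(keep_hour("17:24:55"))  # 17
```

Strings that do not start with two digits and a colon are returned unchanged, because the pattern does not match them.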

Thank you MH@B,

your solution works perfectly!

Hi James, thank you for your reply and for your help,

unfortunately I cannot download the file with that extension... did you try downloading it too, to test whether the link works correctly?

-Giulio

Hi there. Similar problem here but not related to date/time.
I’d like to remove the characters following a square close bracket.

For instance, I have this:
[PP13, 112, W, ?, ?]#641

and would like to end up with:
[PP13, 112, W, ?, ?]

I can’t just drop the last 4 characters, as the number of characters following the “]” varies between rows.

I thought I could do it with an asterisk as wild card following a square bracket but it just takes out the bracket and leaves everything else:
[screenshot]

So that’s a no-go.

Hope this isn’t threadjacking, just thought the right experience was already in this thread.

Hi there @Goldeneye83,

you can use the following expression in the String Manipulation node, using the substr() function:

substr($column1$, 0, indexOf($column1$, "]") + 1)

To use regex this one should work fine:

regexReplace($column1$, "[^\\]]*$" ,"")
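
As a rough way to check both approaches outside KNIME, the sketch below mirrors them in Python. This is an assumption-laden illustration: Python slicing and `re.sub` stand in for `substr()`/`indexOf()` and `regexReplace()`, and both function names are invented for the example. Note that the pattern needs only a single backslash here; the doubled one above is for the string literal in the String Manipulation node.

```python
import re


def keep_through_bracket_substr(text: str) -> str:
    # Mirrors substr(s, 0, indexOf(s, "]") + 1): cut right after the FIRST "]".
    # str.find returns -1 when there is no "]", which yields an empty string.
    return text[: text.find("]") + 1]


def keep_through_bracket_regex(text: str) -> str:
    # [^\]]*$ matches the trailing run of characters that are not "]",
    # so everything after the LAST "]" is replaced with nothing.
    return re.sub(r"[^\]]*$", "", text)


sample = "[PP13, 112, W, ?, ?]#641"
print(keep_through_bracket_substr(sample))  # [PP13, 112, W, ?, ?]
print(keep_through_bracket_regex(sample))   # [PP13, 112, W, ?, ?]
```

The two only differ when a row contains more than one "]": the substring version stops at the first one, the regex version at the last.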

Br,
Ivan


Thanks @ipazin that should work like a charm

Hi everyone!

I have a similar problem, but I haven’t been able to solve it even with the solutions mentioned here.

I’d like to drop the last 3 characters, starting from the character “B”. The number of characters BEFORE the “B” varies from row to row. For example:

Data Set

78957389BP1
789573890BP1
789573BP3

Aim:

78957389
789573890
789573

Could you please help me!?

Many thanks.

Hello @andreluis,

and welcome to KNIME Community!

Here are two options that might work for you:

  • remove last 3 characters from a string

substr($col_name$, 0, length($col_name$) - 3)

  • remove the (last) character B and everything after it in a string:

regexReplace($col_name$, "(.*)B.*" , "$1")
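
To see how the two options behave on the sample rows, here is a small Python sketch. It is not KNIME code: slicing and `re.sub` are stand-ins for `substr()` and `regexReplace()`, `\1` plays the role of `$1`, and the function names are hypothetical.

```python
import re


def drop_last_three(text: str) -> str:
    # Mirrors substr(s, 0, length(s) - 3) for strings of at least 3 characters.
    return text[:-3]


def drop_from_last_b(text: str) -> str:
    # (.*) is greedy, so it runs up to the LAST "B"; the match covers the
    # whole string and is replaced by group 1 (the part before that "B").
    return re.sub(r"(.*)B.*", r"\1", text)


for row in ["78957389BP1", "789573890BP1", "789573BP3"]:
    print(drop_last_three(row), drop_from_last_b(row))
```

On this data both give the same result; they would diverge if the suffix after "B" were not always two characters long.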

Br,
Ivan


Many thanks!!! @ipazin


Hi, I have the following data set

8/3/21 8:15 PM

I need to keep the date only and get rid of the rest, as follows:

8/3/21

I tried a String manipulation

strip(substr($Resolution TimeStamp$,0,7))

But some dates have an additional digit like 10/10/21 and if I increase the amount of digits on the string manipulation I end up bringing an additional digit for the ones that have only 6 digits on the date.

Any ideas?

Hi @andrea_ramirez
Welcome to KNIME Forum!

If you extend your String Manipulation with the indexOf function to find the position of the first " " (whitespace), you are done.
substr($column1$, 0 , indexOf($column1$," " ) )
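
The same idea sketched in Python, for checking it against a few values outside KNIME. This is an illustration only: `str.find` stands in for `indexOf()`, the function name is made up, and returning the input unchanged when there is no space is an assumption of this sketch, not something the node expression above does.

```python
def date_part(timestamp: str) -> str:
    # Mirrors substr(s, 0, indexOf(s, " ")): everything before the first space.
    space_at = timestamp.find(" ")
    if space_at == -1:
        # Assumption for this sketch: no space means there is no time part to cut.
        return timestamp
    return timestamp[:space_at]


print(date_part("8/3/21 8:15 PM"))     # 8/3/21
print(date_part("10/10/21 11:02 AM"))  # 10/10/21
```

Because the cut position comes from the data rather than a fixed length, dates with one-digit and two-digit days or months are both handled.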
gr. Hans


Hi @andrea_ramirez,

Welcome to the community. I know that this thread was about String Manipulation, but given its age, it probably would be best in future to open a new question, whilst maybe referring to the old post if you consider it heavily related.

Re your specific question: whilst you can do this with string manipulation, in general, when converting string data from one date/time format to another, which is what you are doing here, the safest option that doesn’t require any coding is to convert the string to date and time using the String to Date&Time node, specifying the format of your existing data, and then convert back from date/time to string using the Date&Time to String node, specifying the required new format.

So here you would convert to Datetime using a format such as:

d/M/yy h:mm a

(note the case of the letters used here as this is important to their meaning)

and then convert back to String using the format:

d/M/yy

One caveat and slight complication here is that the accepted capitalisation of AM/PM has changed (possibly locale dependent) with the latest versions of Java, and can therefore trip you up. For further info if this causes you problems, see this thread…
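
The parse-then-reformat round trip can be sketched in Python as well. Treat it as an analogy rather than KNIME behaviour: `datetime.strptime` stands in for the String to Date&Time node, the directive string `%d/%m/%y %I:%M %p` is my assumed counterpart of d/M/yy h:mm a, and the function name is hypothetical.

```python
from datetime import datetime


def date_only(timestamp: str) -> str:
    # Parse with the assumed strptime counterpart of "d/M/yy h:mm a".
    parsed = datetime.strptime(timestamp, "%d/%m/%y %I:%M %p")
    # Write day and month without zero padding, as "d/M/yy" would.
    return f"{parsed.day}/{parsed.month}/{parsed:%y}"


print(date_only("8/3/21 8:15 PM"))    # 8/3/21
print(date_only("10/10/21 9:05 AM"))  # 10/10/21
```

The benefit over plain substring cutting is that a value which is not a valid date/time in the expected format raises an error instead of being silently truncated.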


Hello @andrea_ramirez,

there is also the Modify Time node, which can append, change and remove the time from a Date&Time column.

Welcome to Community!

Br,
Ivan


This totally worked! thank you so much … I’ll keep moving on with my flow and get back for help if needed!



Hello there,

sorry about reopening this.

I am trying to remove the following simple expression from hundreds of rows (where found):

[Disabled] car apple train

I was almost able to do it with the following expression:

regexReplace($Name$, "[+^\[+Disabled+^\]]", "")

car apple train

but it left me with the two leading spaces.

Then I got this expression from RegExr, where it matched perfectly (on the site only):

< [Disabled]*\s\s > and added it to the expression:

regexReplace($Name$, "[Disabled ]\s\s", "")

and nothing has changed.

What am I doing wrong? Why doesn’t a regex that matches on the regex site work in KNIME?

@ipazin

Thanks in advance.
J.

Not sure why you have 2 spaces, but this should work fine:
[Disabled]\s+
Be aware that some KNIME nodes need double backslashes when configuring.
br

Hi @Daniel_Weikert , thanks for your reply

this didn’t work. Result:

[Dabled] car apple train

Removing the quotes caused an error and didn’t even let me close the window.

Correcting my last post, where I said that this had partially worked:

regexReplace($Name$, "[+^\[+Disabled+^\]]", "")

It did not. In fact, it made things worse. It went from this:

[screenshot]

to this:

[screenshot]

One last thing. I am super confused: with regex from any website I can match the expression I am trying to remove, but when I pass it over to KNIME it does not behave the same. In fact, KNIME doesn’t seem to like it at all. What is the difference between regex ‘versions’, if that’s even a thing?

You need to escape your regex in the String Manipulation node like this:

[screenshot of the escaped expression]
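
To illustrate why escaping matters here: square brackets are regex metacharacters, so an unescaped [Disabled] is a character class matching any single one of those letters, not the literal text "[Disabled]". The sketch below shows the difference using Python's `re` module as a stand-in for the regex engine (an assumption; it is not KNIME code, and the variable names are made up). In a String Manipulation node the escaped pattern would presumably be written with doubled backslashes inside the string literal, e.g. "\\[Disabled\\]\\s+", in line with the double-backslash note earlier in the thread; that exact expression is my assumption and is not confirmed by the posts above.

```python
import re

name = "[Disabled] car apple train"

# Unescaped: [Disabled] is a character class, so every single
# D, i, s, a, b, l, e or d anywhere in the string is removed.
as_character_class = re.sub(r"[Disabled]", "", name)

# Escaped: \[ and \] match literal brackets, \s+ eats the spaces that follow.
as_literal_text = re.sub(r"\[Disabled\]\s+", "", name)

print(as_character_class)  # [] cr pp trn
print(as_literal_text)     # car apple train
```

Online testers usually take the bare regex, while a node that expects a quoted string adds one more layer of escaping on top, which is a common reason a pattern that matches on a website behaves differently once pasted into an expression.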

br
