Remove every string before certain words (String Manipulation) using RegexReplace

kennywp · June 4, 2024, 3:32am

Hello

I wanna ask how can we string manipulate these column into returning value as detailed below

Value :

C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx
C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx
C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx

Aimed :

Laporan Penjualan Closing Januari 2024.xlsx
Laporan Penjualan Closing Januari 2024.xlsx
Laporan Penjualan Closing Januari 2024.xlsx

I’m using RegexReplace inside the String Manipulation nodes
So I wanted to remove every value before the word “Laporan”

HansS · June 4, 2024, 5:18am

Hi @kennywp

Welcome to KNIME Forum. I would go for the String Manipulation node, and use:
substr($column1$, indexOf($column1$,"Laporan" ))

gr. Hans

kennywp · June 4, 2024, 6:38am

Woahh, been looking for solution all day long.

Thanks a lot! This is fantastic and works like a charm.

kennywp · June 4, 2024, 7:56am

Another request: Is it possible to use regexreplace? Can you provide the syntax if it’s possible?

takbb · June 4, 2024, 12:14pm

Hi @kennywp, it depends on whether the “rules” remain constant with different data.

If I can assume that the file name part of the path always starts with a number followed by a period and potentially one or more whitespace, and then you want to capture everything after that, then this could work:

regexReplace($column1$,".*\\\\[0-9]+\\.\\s*(.*$?)","$1")

The actual regex pattern to match (without additional “escaping backslashes” required by String Manipulation) would look like this:

.*\\[0-9]+\.\s*(.*?)$

which means:

regex pattern	written in String Manipulation as	meaning
`.*`		any number of characters (“greedy” as many as can be consumed while still matching overall pattern)
		followed by
`\\`	`\\\\`	a single backslash
		followed by
`[0-9]+`		one or more digits
		followed by
`\.`	`\\.`	a period
		followed by
`\s*`	`\\s*`	any number (or no) whitespace characters
		followed by
`(`		capture group 1
`.*?`		Any number of characters (`?` = “(non-greedy) as few as can be consumed that still matches overall pattern”)
`)`		end of capture group 1
		followed by
`$`		end of string

This then gets replaced by “$1” which means the text found in the first “capture group”

You will see that in the regex pattern given to String Manipulation, every \ in the actual regex needs to be prefixed by an additional \, in order for it to be interpreted correctly and passed on to regex by the node.

Alternatively, if it is always from “Laporan” onwards, after the final backslash, you could use this:

regexReplace($column1$,".*\\\\[0-9]+\\..*(Laporan.*?)$","$1")

or to capture from the first occurrence of “Laporan” onwards, regardless of backslashes:

regexReplace($column1$,".*?(Laporan.*)$","$1")

or from the last occurrence (change which “.*” is “non-greedy”, by moving the “?” )
regexReplace($column1$,".*(Laporan.*?)$","$1")

Daniel_Weikert · June 4, 2024, 3:24pm

As many ways lead to Rome br

kennywp · June 5, 2024, 7:44am

Thanks for answering. Works like a charm!

kennywp · June 5, 2024, 7:46am

Thank you for responding. It did a wonderful job!

kennywp · June 10, 2024, 4:43am

May I ask another cases,

Value :

MITSUBISHI/TRITON 2.5L SC HDX-L (4X4) M/T
MITSUBISHI/L300 PU FB-R (4X2) M/T
MITSUBISHI/L300 PU FB-R (4X2) M/T
MITSUBISHI/TRITON 2.4L DC EXCEED (4X4) M/T

AIMED :

TRITON 2.5L SC HDX-L (4X4) M/T
L300 PU FB-R (4X2) M/T
L300 PU FB-R (4X2) M/T
TRITON 2.4L DC EXCEED (4X4) M/T

So the target is to take everything after the first “/” character

HansS · June 10, 2024, 6:34am

Hi @kennywp

First a remark . Next time I would advice you to start a new topic, instead of posting a new/different question to a topic that is already marked as solved.

In this case, go for the Cell Splitter with the "" as delimeter , and Set array size to 2.

gr. Hans

takbb · June 11, 2024, 7:14am

Thanks @HansS, I like that solution, as I think I have never used the “arraysize” before, but in this case it is clearly very important to limit the splits

@kennywp , alternatively, regex solutions (spoilt for choice with this one ) could be:

Regex Split
.*?/(.*)

String Manipulation
regexReplace($column1$,".*?/(.*)","$1")

String Replacer
.*?/(.*) → $1

They are all very similar :-). You will notice the use of the question mark in .*?/ which reduces the characters found before the “/” to the minimum possible, thereby ensuring it splits at the first “/”. Without the “?”, it would split at the last “/”.

system · June 18, 2024, 7:15am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.