Remove every string before certain words (String Manipulation) using RegexReplace

Hello

I wanna ask how can we string manipulate these column into returning value as detailed below

Value :

  • C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx
  • C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx
  • C:\Users\ASUS\OneDrive - Bina Nusantara\Desktop\Input Join SPK DO PSPK\2. Data DO Closing\1. Laporan Penjualan Closing Januari 2024.xlsx

Aimed :

  • Laporan Penjualan Closing Januari 2024.xlsx
  • Laporan Penjualan Closing Januari 2024.xlsx
  • Laporan Penjualan Closing Januari 2024.xlsx

I’m using RegexReplace inside the String Manipulation nodes
So I wanted to remove every value before the word “Laporan”

1 Like

Hi @kennywp

Welcome to KNIME Forum. I would go for the String Manipulation node, and use:
substr($column1$, indexOf($column1$,"Laporan" ))

gr. Hans

4 Likes

Woahh, been looking for solution all day long.

Thanks a lot! This is fantastic and works like a charm.

1 Like

Another request: Is it possible to use regexreplace? Can you provide the syntax if it’s possible?

Hi @kennywp, it depends on whether the “rules” remain constant with different data.

If I can assume that the file name part of the path always starts with a number followed by a period and potentially one or more whitespace, and then you want to capture everything after that, then this could work:

regexReplace($column1$,".*\\\\[0-9]+\\.\\s*(.*$?)","$1")

The actual regex pattern to match (without additional “escaping backslashes” required by String Manipulation) would look like this:

.*\\[0-9]+\.\s*(.*?)$

which means:

regex pattern written in String Manipulation as meaning
.* any number of characters (“greedy” as many as can be consumed while still matching overall pattern)
followed by
\\ \\\\ a single backslash
followed by
[0-9]+ one or more digits
followed by
\. \\. a period
followed by
\s* \\s* any number (or no) whitespace characters
followed by
( capture group 1
.*? Any number of characters (? = “(non-greedy) as few as can be consumed that still matches overall pattern”)
) end of capture group 1
followed by
$ end of string

This then gets replaced by “$1” which means the text found in the first “capture group”

You will see that in the regex pattern given to String Manipulation, every \ in the actual regex needs to be prefixed by an additional \, in order for it to be interpreted correctly and passed on to regex by the node.

Alternatively, if it is always from “Laporan” onwards, after the final backslash, you could use this:

regexReplace($column1$,".*\\\\[0-9]+\\..*(Laporan.*?)$","$1")

or to capture from the first occurrence of “Laporan” onwards, regardless of backslashes:

regexReplace($column1$,".*?(Laporan.*)$","$1")

or from the last occurrence (change which “.*” is “non-greedy”, by moving the “?” )
regexReplace($column1$,".*(Laporan.*?)$","$1")

3 Likes

As many ways lead to Rome br

3 Likes

Thanks for answering. Works like a charm! :grinning: :smiley:

Thank you for responding. It did a wonderful job! :grin: :smiling_face:

May I ask another cases,

Value :

  • MITSUBISHI/TRITON 2.5L SC HDX-L (4X4) M/T
  • MITSUBISHI/L300 PU FB-R (4X2) M/T
  • MITSUBISHI/L300 PU FB-R (4X2) M/T
  • MITSUBISHI/TRITON 2.4L DC EXCEED (4X4) M/T

AIMED :

  • TRITON 2.5L SC HDX-L (4X4) M/T
  • L300 PU FB-R (4X2) M/T
  • L300 PU FB-R (4X2) M/T
  • TRITON 2.4L DC EXCEED (4X4) M/T

So the target is to take everything after the first “/” character

Hi @kennywp

First a remark . Next time I would advice you to start a new topic, instead of posting a new/different question to a topic that is already marked as solved.

In this case, go for the Cell Splitter with the "" as delimeter , and Set array size to 2.


gr. Hans

2 Likes

Thanks @HansS, I like that solution, as I think I have never used the “arraysize” before, but in this case it is clearly very important to limit the splits :slight_smile:

@kennywp , alternatively, regex solutions (spoilt for choice with this one :wink: ) could be:

Regex Split
.*?/(.*)
image

String Manipulation
regexReplace($column1$,".*?/(.*)","$1")

String Replacer
.*?/(.*) → $1

They are all very similar :-). You will notice the use of the question mark in .*?/ which reduces the characters found before the “/” to the minimum possible, thereby ensuring it splits at the first “/”. Without the “?”, it would split at the last “/”.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.