Delete String over multiple Columns [Regex Syntax needed]

Hello there,

I'm not deep enough into regex to get this working correctly.
So in my case I'm exporting article data written in HTML. I split the cells based on “<p>”, but now the closing tag of this expression is still visible as “</p>”. I also have “&nbsp;” entries left over for the line breaks.

I tried to get an expression working, but I don't understand how to get the < and > characters into the expression, or how to combine the exact characters and digits.

As I will do this on multiple columns, my version was:

regexReplace($$CURRENTCOLUMN$$,"[p]\w+\d{4}" ,"")

But as expected, this is far from correct.
If anybody could give a detailed description of how regex syntax works, that would be very helpful. The tips in KNIME give you an overall understanding, but abc ABC is most likely not the use case here. To be honest, all my research just made things harder to understand.

Thanks a lot

Hi @Yannick_Jasper

It’s better to switch to Regex101. That has a built-in feature to generate the correct syntax for KNIME.

In Java, \d becomes \\d etc.
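To make the escaping rule concrete, here is a minimal Java sketch. It assumes KNIME's regexReplace treats the pattern string the way a Java string literal is treated, so the same doubling applies there. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of KNIME.
class EscapeDemo {
    // The regex \d{4} (exactly four digits) has to be written with a
    // doubled backslash inside a Java string literal.
    static final String FOUR_DIGITS = "\\d{4}";

    // Returns true only if the whole input consists of exactly four digits.
    static boolean isFourDigits(String s) {
        return Pattern.matches(FOUR_DIGITS, s);
    }

    public static void main(String[] args) {
        System.out.println(FOUR_DIGITS);          // prints \d{4}
        System.out.println(isFourDigits("2024")); // prints true
        System.out.println(isFourDigits("20a4")); // prints false
    }
}
```

The string that reaches the regex engine contains a single backslash. The second one only exists in the source text so that the string literal itself is parsed correctly.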

For your specific use-case, I’d say it’s better to share an example with input and expected output since it’s really difficult to provide a proper solution without seeing what you’re trying to achieve.

Hey ArjenEX,

I will have a look at that page. Thanks for that.

So my goal is to split the cell “Artikeltext” (in every language tree) with “< p >” as the delimiter and remove the “< / p >” and “& n b s p ;” (I needed the spaces to display them) from the created arrays.

Artikel.knar (840.7 KB)

Thanks a lot

You probably need something like this:

regexReplace($$CURRENTCOLUMN$$,"(<\\/?p>|&nbsp;)" ,"" )

But be aware that you first need to handle empty cells (e.g. via the Missing Value node), and that the String Manipulation (Multi Column) node only works on string columns.
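If you want to try the pattern outside KNIME, here is a small Java sketch of the same replacement. It assumes regexReplace behaves like Java's replaceAll on the cell value. The class name, the method name, and the sample strings are made up for illustration, and the null check only stands in for the empty-cell handling mentioned above. Reading the pattern piece by piece: < and > are ordinary characters in a regex and need no escaping, \\/? is an optional slash (so both the opening and the closing tag match), | means "or", and &nbsp; is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of KNIME.
class TagCleaner {
    // Same pattern as in the expression above: <p>, </p> or &nbsp;
    private static final Pattern TAGS = Pattern.compile("(<\\/?p>|&nbsp;)");

    // Removes every match from the cell text. A null input stands in for
    // an empty (missing) cell and is passed through unchanged.
    static String clean(String cell) {
        if (cell == null) {
            return null;
        }
        return TAGS.matcher(cell).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("Erster Absatz</p>&nbsp;")); // prints Erster Absatz
        System.out.println(clean("<p>Text</p>"));             // prints Text
        System.out.println(clean(null));                      // prints null
    }
}
```

Text without any of the three tokens comes back unchanged, so the expression is safe to apply to every string column.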


Hey @Daniel_Weikert ,

the syntax worked out perfectly.
Thanks for the help and the given link to dig deeper on my own :)

