Regex in KNIME does not recognize whitespaces in strings

Hi,

I have a problem with regex in KNIME, and it seems to be related either to KNIME or to my misunderstanding of it. I am new to KNIME but have already been stuck on this problem for a while:

Basically, I want to read PDFs and apply some regex to filter out some data. The data is surrounded by certain combinations of words, so I need to search for word sequences that contain whitespace. The problem is that I cannot get a regular expression to work on any combination of words that contains whitespace. I tried the regex in a web-based regex testing tool, where it works fine, but in KNIME I cannot get it working. My test workflow is:

PDF Parser -> Document Data Extractor -> String Manipulation

Example Text extracted in Document body text: “Das ist ein Test.”
Expression: regexReplace($Document body text$,"(Das ist)","replacementtext")

When the regex does not contain white spaces, it works:
regexReplace($Document body text$,"Das","replacementtext")

I tried to find out the encoding of the PDF by copying the text out of Adobe Reader and the Document Viewer and checking it with external tools. Although I am not 100% sure, it seems to be UTF-8, so I changed the charset in the PDF Parser accordingly. However, it did not work. I get the same behaviour when I use the search field in the Document Viewer.

Can anybody help me with this?

Thanks
Martin

Hi Martin,

Are you sure your whitespace characters are actually plain spaces and not some kind of tab? Try pasting your string on this website.
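
If you would rather check this locally, a minimal Python sketch along the following lines lists the code point of every character. The sample string is only an assumption for illustration: it puts a non-breaking space (U+00A0), which text extracted from PDFs may contain, where you would expect an ordinary space. The helper name describe_chars is made up for this example.

```python
import unicodedata

def describe_chars(text):
    """Return (character, code point, Unicode name) for each character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]

# Hypothetical sample: "Das" and "ist" separated by a non-breaking space.
sample = "Das\u00a0ist"

for ch, code_point, name in describe_chars(sample):
    print(repr(ch), code_point, name)
```

An ordinary space shows up as U+0020 (SPACE). Anything else in that position, such as U+0009 (tab) or U+00A0 (NO-BREAK SPACE), would explain why a pattern with a literal space does not match.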

Or you could try to match your “whitespace” with a dot (.), which in regex matches any single character.
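
To show the idea, here is a small sketch in Python rather than in KNIME itself. It assumes, purely for illustration, that the extracted text contains a non-breaking space (U+00A0) between "Das" and "ist". Your PDF may contain a different character.

```python
import re

# Assumed sample: a non-breaking space sits between "Das" and "ist".
text = "Das\u00a0ist ein Test."

# A literal space in the pattern does not match the non-breaking space.
literal = re.sub(r"Das ist", "replacementtext", text)

# A dot matches any single character (except a newline), so it matches here.
wildcard = re.sub(r"Das.ist", "replacementtext", text)

# A character class listing both kinds of space is stricter than a dot.
explicit = re.sub(r"Das[ \u00a0]ist", "replacementtext", text)

print(literal == text)  # the literal-space pattern left the text unchanged
print(wildcard)
print(explicit)
```

In your String Manipulation node the equivalent attempt would be something like regexReplace($Document body text$,"Das.ist","replacementtext"). One caveat: as far as I know, \s in Java's regex engine only covers ASCII whitespace by default, so it would not match a non-breaking space, whereas Python's \s does. Do not rely on \s behaving the same in both.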
Cheers,
Johannes
