Java Snippet for Regex usage not working

Hello everyone,

I just started using KNIME for a project and I’m having an issue that I hope someone can help me with.
I used the File Reader node to read in a JSON table that contains HTML tags etc. from previously parsed websites.
I want to apply a regex to the table to extract only the information I need.
However, my code in the Java Snippet node won’t run: on execution I keep getting the error “Execute failed: java.lang.NullPointerException”.
The table has just one column (c_Column in the code, an input variable of Java type String) with a few thousand rows, each row representing one data entry.

My code currently looks like this:

import java.util.regex.*;

// Your custom variables:
 
Pattern p = Pattern.compile("\"data\".{4}\"id\".{2}[0-9]*.{2}\"goal\".{2}[0-9]*|\"state\".{3}[a-z]*.{3}\"country\".{2}\"[a-z/A-Z]*\"|\"category\".{3}\"id\".{2}[0-9]*.{2}\"name\".{2}\"[a-z/A-Z/ ]*\"");

Matcher m = p.matcher(c_Column);

if(m.find()) {
    System.out.println("Expression found.");
    System.out.println(m.group());
}

Never mind the long regex for now.
My main question is whether I’m making a general mistake with the input and output of the Java Snippet node, or, more generally, whether there is a better way to solve this problem using different nodes.

Regards

Johanna

Hi Johanna,

You probably want to check whether c_Column is null, as KNIME’s missing values are encoded as null in Java snippets.
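A minimal sketch of that null check, written as a standalone Java class so it can be run outside KNIME. The class name `NullSafeSnippet`, the method `extract` and the shortened pattern are hypothetical; in the Java Snippet node the same `if (c_Column == null)` guard would go at the top of the snippet body, before `p.matcher(c_Column)` is called.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullSafeSnippet {
    // Compile the pattern once and reuse it for every row.
    // Hypothetical, shortened pattern for illustration only.
    private static final Pattern P = Pattern.compile("\"goal\".{2}[0-9]*");

    // Mirrors the snippet logic: c_Column is null when the KNIME cell is missing.
    static String extract(String c_Column) {
        if (c_Column == null) {
            return null; // pass the missing value through instead of throwing an NPE
        }
        Matcher m = P.matcher(c_Column);
        return m.find() ? m.group() : null; // first match, or null if nothing found
    }

    public static void main(String[] args) {
        System.out.println(extract(null));             // null
        System.out.println(extract("{\"goal\": 500}")); // "goal": 500
    }
}
```

Returning null for non-matching rows is a design choice of this sketch; you could equally return an empty string.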


Hi Johanna!

In addition to what @aborg already mentioned about c_Column being null for missing values in the input table, there is also a “Regex Split” node, which may do exactly what you want :smiley:
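To illustrate the idea of splitting by capturing groups (one output per group), here is a plain Java sketch. It is not the node's implementation; the class `GroupSplit`, the method `split` and the example pattern are hypothetical. This sketch uses `Matcher.matches()`, so the pattern must cover the whole string, which is why it starts and ends with `.*`; check the node description for how the node itself treats partial matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSplit {
    // Returns the text of every capturing group (1..n) if the pattern matches
    // the whole input, or an empty list for a null input or a non-matching row.
    static List<String> split(Pattern pattern, String input) {
        List<String> parts = new ArrayList<>();
        if (input == null) {
            return parts; // missing cells arrive as null in a Java Snippet
        }
        Matcher m = pattern.matcher(input);
        if (m.matches()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                parts.add(m.group(g)); // one entry per group, i.e. one output column
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical pattern: leading and trailing .* let it cover the whole string.
        Pattern p = Pattern.compile(
                ".*\"country\":\\s*\"([A-Za-z]+)\".*\"name\":\\s*\"([^\"]*)\".*");
        String row = "{\"country\": \"US\", \"category\": {\"name\": \"Art\"}}";
        System.out.println(split(p, row)); // [US, Art]
    }
}
```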

Cheers, Jonathan.


Hi,

I am facing a very similar issue: if I run a regex replace with the pattern passed in as a flow variable, the regex stops working as soon as it is even slightly sophisticated (for instance, I cannot use capturing groups or the end-of-string token).

The issue is not the regex itself: if I hardcode it directly in a String Manipulation node, like this:
regexReplace($Sentence$, "s(\d) ?-?(\d)", "size$1to$2")
it works fine.

If I instead store "s(\d) ?-?(\d)" in a string flow variable and use it like this:
regexReplace($Sentence$, $${Sregex}$$, $${Sreplacer}$$)
it does not work.
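One possible cause, offered as a guess rather than a diagnosis: Java's regex engine does not care whether a pattern string was typed as a literal or read at runtime, but it does care how many backslashes the string actually contains. If the value stored in the variable (or column) has its backslashes escaped the way you would write them inside a quoted string, it contains `\\d` instead of `\d`, and the pattern then looks for a literal backslash and silently matches nothing. The plain Java sketch below (not the String Manipulation node itself; `RuntimeRegexDemo` and `apply` are hypothetical names) shows both cases.

```java
public class RuntimeRegexDemo {
    // Applies a regex replacement whose pattern text is only known at runtime.
    static String apply(String regex, String replacement, String input) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String sentence = "s1-2 shoes";

        // The Java literal below yields the runtime text s(\d) ?-?(\d):
        // one backslash per \d, which is what the variable should contain.
        String singleEscaped = "s(\\d) ?-?(\\d)";

        // Runtime text s(\\d) ?-?(\\d): each \\ now means "a literal backslash",
        // so the pattern looks for "s\d" and matches nothing in the sentence.
        String doubleEscaped = "s(\\\\d) ?-?(\\\\d)";

        System.out.println(apply(singleEscaped, "size$1to$2", sentence)); // size1to2 shoes
        System.out.println(apply(doubleEscaped, "size$1to$2", sentence)); // s1-2 shoes
    }
}
```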

I run into the same issue when I take the regex from a column in my table instead of a variable.
I also tried the ‘Replacer’ node from the Text Processing extension, with the same result.

Basically, unless I type the regex directly into the regexReplace function, I cannot use any advanced regex features.

What is the best approach to apply a series of regex replacements to a string column (or document)? This is surely a very common need in text processing, yet I could not find an answer on the forum.
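One way to apply a series of replacements in a single Java Snippet is to keep an ordered list of pattern/replacement pairs and apply them one after another. This is a minimal sketch, not a KNIME feature: the class `RegexChain`, the method `applyAll` and both rules are hypothetical examples. Because `Pattern.compile` does not care where its string came from, the rules could equally be read at runtime from a variable or column, as long as they contain single backslashes.

```java
import java.util.regex.Pattern;

public class RegexChain {
    // Hypothetical ordered rules {regex, replacement}; each rule is applied to
    // the output of the previous one, so order matters.
    private static final String[][] RULES = {
        {"s(\\d) ?-?(\\d)", "size$1to$2"}, // s1-2 -> size1to2
        {"\\s+$", ""}                       // drop trailing whitespace (end-of-string token)
    };

    // Compile every rule once up front rather than once per row.
    private static final Pattern[] PATTERNS = new Pattern[RULES.length];
    static {
        for (int i = 0; i < RULES.length; i++) {
            PATTERNS[i] = Pattern.compile(RULES[i][0]);
        }
    }

    static String applyAll(String text) {
        if (text == null) {
            return null; // keep missing values missing
        }
        String result = text;
        for (int i = 0; i < PATTERNS.length; i++) {
            result = PATTERNS[i].matcher(result).replaceAll(RULES[i][1]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + applyAll("s1-2 shoes   ") + "]"); // [size1to2 shoes]
    }
}
```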

Thank you in advance for the help.
Regards,
Pierre