I just started using KNIME for a project and I’m having some issues that I hope to get some help with.
I used the File Reader to read in a JSON table that contains HTML tags etc. from previously parsed websites.
I want to use a RegEx on the table to just get the information needed.
However, using the Java Snippet my code won’t run and I keep getting the error on execution “Execute failed: java.lang.NullPointerException”.
Regarding the table, there is just one column (c_Column in the code, as an input variable of Java type String) with a few thousand rows, each row representing one data entry with its information.
My code currently looks like this:
import java.util.regex.*;
// Your custom variables:
Pattern p = Pattern.compile("\"data\".{4}\"id\".{2}[0-9]*.{2}\"goal\".{2}[0-9]*|\"state\".{3}[a-z]*.{3}\"country\".{2}\"[a-z/A-Z]*\"|\"category\".{3}\"id\".{2}[0-9]*.{2}\"name\".{2}\"[a-z/A-Z/ ]*\"");
Matcher m = p.matcher(c_Column);
if (m.find()) {
    System.out.println("Expression found.");
    System.out.println(m.group());
}
Never mind the long regex for now.
My question is rather whether I’m making a general mistake with the input and output of the Java Snippet node, or, more generally, whether there is a better way to solve this problem using different nodes.
In addition to what @aborg already mentioned about c_Column being null for missing values in the input table: there is a “Regex Split” node, which may do exactly what you want.
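To illustrate the null point: in plain Java, calling matcher() with a null input throws a NullPointerException, so a guard before matching avoids the failure on missing cells. Below is a minimal sketch, not the actual snippet. The class name, the method name extractGoal, and the simplified "goal" pattern are made up for illustration, and inside a Java Snippet node the result would be assigned to an output field instead of being printed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullSafeRegex {
    // Simplified, hypothetical pattern: the "goal" key followed by a number.
    private static final Pattern GOAL = Pattern.compile("\"goal\":\\s*([0-9]+)");

    /** Returns the captured number, or null if the cell is missing or nothing matches. */
    static String extractGoal(String cell) {
        if (cell == null) {
            // A missing value arrives as null; matching on it would throw.
            return null;
        }
        Matcher m = GOAL.matcher(cell);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractGoal("{\"data\":{\"id\":12,\"goal\":5000}}"));
        System.out.println(extractGoal(null));
    }
}
```

Using a capturing group here also means only the value is returned, rather than the whole matched text that m.group() gives.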
I am facing a very similar issue: if I run a regex replace with the regex passed in through a variable, it stops working as soon as the regex gets a bit sophisticated (for instance, I cannot use capturing groups or the end-of-string anchor).
The issue is not the regex itself. If I hardcode it directly in a String Manipulation node, like this: regexReplace($Sentence$, "s(\d) ?-?(\d)", "size$1to$2"), it works fine.
If I store "s(\d) ?-?(\d)" in a string variable and call it like this: regexReplace($Sentence$, $${Sregex}$$, $${Sreplacer}$$), then it does not work.
I face the same issue when taking the regex from a column in my table rather than from a variable.
I also tried the ‘Replacer’ node from the Text Processing extension, with the same results.
Basically, unless I type the regex directly inside the regexReplace function, I cannot use any advanced regex features.
What is the best approach to apply a series of regex replacements to a string column (or document)? This is surely a very classic need in text processing, yet I could not find the answer on the forum.
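For reference, this is roughly what applying a series of regex replacements looks like in plain Java, for example as the body of a Java Snippet. It is only a sketch under assumptions: the Rule class, the applyAll method, and the second rule (trimming trailing whitespace) are invented for illustration, and it does not explain why the variable-based call fails in the String Manipulation node. Note that the doubled backslashes exist only because the patterns are written as Java string literals; a regex read from a variable or column would contain single backslashes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexChain {
    /** One replacement step: a compiled pattern plus its replacement string. */
    static final class Rule {
        final Pattern pattern;
        final String replacement;

        Rule(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }
    }

    /** Applies the rules in order; each rule sees the output of the previous one. */
    static String applyAll(String input, List<Rule> rules) {
        if (input == null) {
            return null; // keep missing values missing
        }
        String result = input;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("s(\\d) ?-?(\\d)", "size$1to$2")); // capturing groups
        rules.add(new Rule("\\s+$", ""));                      // end-of-string anchor
        System.out.println(applyAll("shoes s3-5  ", rules));   // prints "shoes size3to5"
    }
}
```

Keeping the rules in a list makes the order explicit, which matters when a later pattern could match text produced by an earlier replacement.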
Thank you in advance for the help.
Regards,
Pierre