Regex Split support for optional patterns?

Is it possible to use optional groups in patterns, so that rows match both when the optional part is present in the data and when it is not?

For example:
(.+)(?: \[([1-3])\])?$ matches the input "something", but the node fails with the error message: Execute failed: String value can’t be null.

Could the unmatched group produce a missing value instead? (Or a collection of StringCells?)

Stack trace:
java.lang.NullPointerException: String value can’t be null.
at org.knime.core.data.def.StringCell.<init>(StringCell.java:91)
at org.knime.base.node.preproc.regexsplit.RegexSplitNodeModel$1.getCells(RegexSplitNodeModel.java:145)
at org.knime.core.data.container.RearrangeColumnsTable.create(RearrangeColumnsTable.java:325)
at org.knime.core.node.ExecutionContext.createColumnRearrangeTable(ExecutionContext.java:273)
at org.knime.base.node.preproc.regexsplit.RegexSplitNodeModel.execute(RegexSplitNodeModel.java:80)
at org.knime.core.node.NodeModel.execute(NodeModel.java:556)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:410)
at org.knime.core.node.Node.execute(Node.java:653)
at org.knime.core.node.workflow.SingleNodeContainer.executeNode(SingleNodeContainer.java:587)
at org.knime.core.node.workflow.SingleNodeContainer.access$1(SingleNodeContainer.java:561)
at org.knime.core.node.workflow.SingleNodeContainer$1.run(SingleNodeContainer.java:446)
at org.knime.core.node.workflow.JobRunnable.run(JobRunnable.java:43)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:98)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:166)

It also looks like a pattern such as (.)+ is not handled properly, as it produces only a g. (The non-capturing group, (?: ), also seems to create new columns; I am not sure whether that is intentional.)

Thanks for the problem report. There are two problems with the node: (1) it creates columns for non-capturing groups and (2) it throws the reported NPE for optional groups. Both problems have been fixed, and the fixes will be available in v2.1.
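
For anyone who wants to reproduce the underlying behaviour outside KNIME, here is a minimal sketch using plain java.util.regex. It is not the node's actual code or the fix; the class name OptionalGroupDemo and the helper captureGroups are made up for illustration, and it assumes the whole cell must match (Matcher.matches()). It shows that the non-capturing group is not counted by groupCount(), and that an optional capturing group that did not take part in the match is reported as null, which is the kind of value that has to be handled before building a string cell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of KNIME.
class OptionalGroupDemo {
    // The pattern from the question, with the literal brackets escaped.
    static final Pattern PATTERN = Pattern.compile("(.+)(?: \\[([1-3])\\])?$");

    // Returns one entry per capturing group; an entry is null when that
    // group did not participate in the match. Returns an empty list when
    // the input does not match at all.
    static List<String> captureGroups(String input) {
        Matcher m = PATTERN.matcher(input);
        List<String> values = new ArrayList<>();
        if (m.matches()) {
            // groupCount() counts capturing groups only; (?: ...) is skipped.
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Two capturing groups: (.+) and ([1-3]).
        System.out.println(PATTERN.matcher("").groupCount());
        // The second entry is null because the optional part is absent.
        System.out.println(captureGroups("something"));
    }
}
```

A caller would typically check each entry for null and map it to whatever "missing" representation the surrounding code uses.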

I can’t see a problem with (.)+ matching only g though. The parentheses match a single character, and a repeated capturing group keeps only its last match, so g is the character that was matched last. Maybe the documentation on Java regular expressions helps here?
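
A small sketch of that behaviour in plain Java, again outside KNIME and with made-up names (RepeatedGroupDemo, lastCapture, eachCharacter): a repeated group such as (.)+ retains only its final capture, whereas collecting every character requires applying (.) repeatedly with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of KNIME.
class RepeatedGroupDemo {
    // Matches the whole input with (.)+ and returns group 1, which holds
    // only the last character the group captured (null if no match).
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("(.)+").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    // Collects every single-character match by scanning with find().
    static List<String> eachCharacter(String input) {
        Matcher m = Pattern.compile("(.)").matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastCapture("something"));   // prints g
        System.out.println(eachCharacter("something")); // one entry per character
    }
}
```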

Thanks again!

You are really good. :) And absolutely right. I had not checked the documentation properly. (I thought it was possible to get the individual groups from the match, and that those would be the values in the result column. My mistake.)
Thanks, gabor