Unexpected match by Column Rename (Regex)

I have a column named 'foo1 foo2+bar1' (there is a space between foo1 and foo2).

If I use (\w+)\+(\w+) as the regex in Column Rename (Regex), $1 gets 'foo1 foo2'. I would have expected only 'foo2'.

Based on the node docs, there does not appear to be any custom regex implementation that would explain this.

I've attached an example workflow.

I'm using 2.8.1 on Mountain Lion.

Hello, I would have expected only "foo1" to match the first group, and the match to fail for the whole string.

Have you tried (?:\w+)\s+(\w+)\+(\w+) to match only "foo2" for $1?

Cheers, gabor

PS: I hope I did not make any mistakes; I have not tested this.

Gabor,

Thanks, but what I am really interested to know is why that regex returns what it does.

(\S+)\+(\w+) also results in $1 getting 'foo1 foo2'.

(\w+)\+(\w+) applied to 'foo1.foo2+foo3' results in $1 getting 'foo1.foo2'.

\w and \S do not act as they should.

Looks like a problem with how regexes are implemented in the node. 

Maybe one of the KNIME developers could comment.

Cheers,

Andrew

Hi Andrew,

I have checked the code of the column rename regex node. Here are the relevant parts:

    Matcher m = searchPattern.matcher(oldName);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        try {
            m.appendReplacement(sb, replace);
        } catch (IndexOutOfBoundsException ex) {
            throw new InvalidSettingsException(
                "Error in replacement string: " + ex.getMessage(), ex);
        }
    }
    m.appendTail(sb);

So anything that is not matched is copied over unchanged: appendReplacement copies the text before each match, and appendTail copies whatever follows the last match. I would therefore recommend writing patterns that match the whole column name you want to replace (like '^(?:\w+)\s+(\w+)\+(\w+)$'). If you try the '^(\w+)\+(\w+)' pattern, you will see that it does not match. The '^(\w+).*' pattern will give only foo1 for $1.
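To make this easier to try outside KNIME, here is a minimal standalone sketch of the same loop using plain java.util.regex. The class name RenameDemo and the helper rename are made up for illustration and are not part of the node; the exception handling from the snippet above is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RenameDemo {
    // Hypothetical helper that mirrors the find/appendReplacement/appendTail
    // loop quoted above (not the node's actual method).
    static String rename(String oldName, String regex, String replace) {
        Matcher m = Pattern.compile(regex).matcher(oldName);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text before this match, then the replacement.
            m.appendReplacement(sb, replace);
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "foo1 foo2+bar1";
        // Unanchored: only "foo2+bar1" is matched, "foo1 " is copied as-is.
        System.out.println(rename(name, "(\\w+)\\+(\\w+)", "$1"));
        // Anchored, whole name matched: only group 1 remains.
        System.out.println(rename(name, "^(?:\\w+)\\s+(\\w+)\\+(\\w+)$", "$1"));
        // Anchored at the start only: no match, so the name is unchanged.
        System.out.println(rename(name, "^(\\w+)\\+(\\w+)", "$1"));
        // Matches the whole name, group 1 is the first word.
        System.out.println(rename(name, "^(\\w+).*", "$1"));
    }
}
```

With the unanchored pattern the result is 'foo1 foo2' even though $1 itself only holds 'foo2', which is the behaviour Andrew saw.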

I guess the documentation could be improved, but to be honest I do not know how.

Hope this helps, gabor

PS.: To be clear... The 'foo1 ' part was not included because of $1, but because it was not matched. To confirm this, put something around $1, for example: '_$1_'.
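The '_$1_' check can also be reproduced in plain Java. This is only a sketch under the assumption that the node behaves like the loop quoted earlier; the class MarkerDemo and its helper rename are hypothetical names, not KNIME code. It also covers the two other cases Andrew mentioned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkerDemo {
    // Hypothetical helper: same find/appendReplacement/appendTail loop as above.
    static String rename(String oldName, String regex, String replace) {
        Matcher m = Pattern.compile(regex).matcher(oldName);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replace);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The underscores mark exactly what $1 captured; "foo1 " sits outside them.
        System.out.println(rename("foo1 foo2+bar1", "(\\w+)\\+(\\w+)", "_$1_"));
        // Same effect with \S: the space before foo2 stops the group.
        System.out.println(rename("foo1 foo2+bar1", "(\\S+)\\+(\\w+)", "_$1_"));
        // And with a dot, which \w does not match.
        System.out.println(rename("foo1.foo2+foo3", "(\\w+)\\+(\\w+)", "_$1_"));
    }
}
```

In all three cases the text in front of the underscores is the unmatched prefix that was copied over, so \w and \S are behaving as usual.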

Thank you, Gabor.  Interesting to know why this happens.