Feature request: column names from named groups in regex

Hi,

It would be cool if named groups in a regex, e.g. in the Column Rename (Regex) node, automatically defined the column names.

Best,

Lorenz

In the case of the Column Rename (Regex) node, you can already use capture groups. You do have to reference the captured groups in the replacement, though. Automatically using the first capture group as the replacement when no replacement is specified would be confusing and would save you exactly two characters ($1). Not worth it, in my opinion.
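
For reference, in plain Java (which I assume the node's regex handling is based on), such a rename boils down to String.replaceAll with a group reference like $1 in the replacement. This is only an illustration: the class RenameSketch, its rename method, and the column names are made up, and renaming only names that fully match the pattern is an assumption about the node's behavior.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RenameSketch {

    // Renames every column whose whole name matches the pattern; $1 or ${name}
    // in the replacement refers to a capture group. Other names are kept as-is.
    static List<String> rename(List<String> columns, String regex, String replacement) {
        return columns.stream()
                .map(c -> c.matches(regex) ? c.replaceAll(regex, replacement) : c)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = List.of("value_2019", "value_2020", "id");
        // Numbered reference: "$1" keeps only the captured year
        System.out.println(rename(cols, "value_(\\d+)", "$1"));             // [2019, 2020, id]
        // Named reference (Java 7+): "${year}" does the same
        System.out.println(rename(cols, "value_(?<year>\\d+)", "${year}")); // [2019, 2020, id]
    }
}
```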

Which other nodes do you have in mind?

Thanks for the idea. I'll try to incorporate this into the soon-to-be partly resurrected HiTS Unpivot node. (As far as I remember, Java did not yet support named groups in regexes when that node was written. It is time to improve it. :-) )

Something like Captures in .NET would be very useful here, too.

My fault, I was thinking of the Regex Split node. Its output columns are currently always named split_n.

Were you able to work on named groups for the Regex Split node yet?

Any update on that?

The Regex Split node doesn't seem to support named groups (yet?). It seems to accept syntax like

(?<name>pattern)

but keeps outputting column names like "split_0", "split_1", etc. In addition, when mixing named and unnamed groups, some capture groups appear to get lost, i.e. the result contains fewer columns than there are capture groups.
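
To illustrate what the requested behavior could look like, here is a minimal Java sketch; NamedGroupSplit, columnNames and split are hypothetical names, not KNIME code. It derives one column name per capturing group from the pattern text: named groups (?<name>...) keep their name and unnamed groups fall back to split_<i>, so mixing both kinds keeps every group. It assumes the whole input has to match, as with Matcher.matches(). The hand-written pattern scan is a simplification that ignores \Q...\E quoting and unusual character classes; on Java 20 and later, Pattern.namedGroups() returns the name-to-group-number map directly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSplit {

    // Returns one column name per capturing group, in group order.
    // Named groups (?<name>...) keep their name; unnamed groups get split_<i>,
    // where i is the zero-based position of the group.
    static List<String> columnNames(String regex) {
        List<String> names = new ArrayList<>();
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {          // skip the escaped character, e.g. \( or \]
                i++;
                continue;
            }
            if (inClass) {            // parentheses inside [...] are literals
                if (c == ']') inClass = false;
                continue;
            }
            if (c == '[') {
                inClass = true;
                continue;
            }
            if (c != '(') continue;
            boolean special = i + 1 < regex.length() && regex.charAt(i + 1) == '?';
            if (!special) {
                names.add("split_" + names.size());      // plain capturing group
            } else if (i + 3 < regex.length() && regex.charAt(i + 2) == '<'
                    && Character.isLetter(regex.charAt(i + 3))) {
                int end = regex.indexOf('>', i + 3);      // (?<name>...) named group
                names.add(regex.substring(i + 3, end));
            }
            // other (?...) constructs such as (?:, (?=, (?<= and (?i) do not capture
        }
        return names;
    }

    // Splits one input value into an ordered column-name -> value map,
    // or returns null if the whole input does not match the pattern.
    static Map<String, String> split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) return null;
        List<String> names = columnNames(regex);
        Map<String, String> row = new LinkedHashMap<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            // fall back to split_<i> if the simple scan missed a group
            String name = g - 1 < names.size() ? names.get(g - 1) : "split_" + (g - 1);
            row.put(name, m.group(g));
        }
        return row;
    }

    public static void main(String[] args) {
        String regex = "(?<year>\\d{4})-(\\d{2})-(?<day>\\d{2})";
        System.out.println(columnNames(regex));          // [year, split_1, day]
        System.out.println(split(regex, "2019-05-17"));  // {year=2019, split_1=05, day=17}
    }
}
```

Keeping the split_<i> fallback for unnamed groups means existing workflows that rely on those names would not change for patterns without named groups.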

Any chance that this behavior is going to change?

Currently I'm using Regex Split with a subsequent Column Rename to get what I need, but using named groups would certainly simplify this.

Cheers,

Jan

Hi KNIME team,

Any update on this?

Thx, Jan

I want to bump this thread. I use "Regex Split" all the time, and it always needs to be followed by a rename node. This works, but named groups would make that step obsolete and the workflow much cleaner. I've tried to use named groups to name the output columns automatically, but I haven't been able to find a syntax that works. Has anyone had any luck with this since the question was first posed 3 years ago?

Thanks.

Still not working in an up-to-date KNIME installation.

@KNIME team: is this something you consider changing?

The Workflow Coach shows a 19% frequency for a Column Rename node following a Regex Split node, which is one of the highest percentages I have seen among the yellow manipulator nodes.

I think this would warrant implementing named regex groups in the Regex Split node, no?
What do KNIME team members think?
