Split string sequence with Regex Split


I have this kind of protein sequence: YLLLEPYFAVWY in a column.

I would like to have each amino acid (letter) in its own column using the Regex Split node.

Y L L L ...

Unfortunately, I do not know the regex that does that.

With ([A-Z]{1})(.*$) I get:


which is nearly what I need, nearly...

Does anyone know the right regex?

Thank you


Wow, not easy. As far as I understand, a regex has a fixed number of capturing groups and cannot create them automatically, which is what you want here. You would need to write ([A-Z]) repeatedly, once per letter, finishing off with a .*
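To illustrate the point about a fixed number of groups, here is a minimal sketch in Python. It uses Python's re module rather than the Java regex engine behind KNIME, so treat it as an analogy and not as the node's actual behaviour. The variable names and the value of n are made up for the example.

```python
import re

seq = "YLLLEPYFAVWY"

# The pattern from the question has two groups, so it can only yield two pieces.
two_groups = re.fullmatch(r"([A-Z]{1})(.*$)", seq).groups()
print(two_groups)  # ('Y', 'LLLEPYFAVWY')

# One group per letter has to be written out (or generated) explicitly.
n = 4  # hypothetical: how many leading letters to split off
pattern = "([A-Z])" * n + "(.*)"
fixed_groups = re.fullmatch(pattern, seq).groups()
print(fixed_groups)  # ('Y', 'L', 'L', 'L', 'EPYFAVWY')
```

The number of pieces is always the number of groups in the pattern, which is why sequences of varying length do not fit this approach well.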

Hardly ideal. If someone knows a way to do it with Regex Split, please share. A better way is...

Use a String Replacer node, choose the regex pattern option and put in ([A-Z])

In the replacement text, put $1, (that is, $1 followed by a comma) and specify to replace all occurrences. This will put a comma after every letter. Now use a Cell Splitter node, and use , as the delimiter.

Job done.
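The same two steps can be sketched outside KNIME, for example in Python. This only mimics the idea (replace, then split); it does not call any KNIME API, and Python writes the group reference as \1 where the String Replacer uses $1.

```python
import re

seq = "YLLLEPYFAVWY"

# "String Replacer" step: put a comma after every letter.
with_commas = re.sub(r"([A-Z])", r"\1,", seq)

# "Cell Splitter" step: split on the comma and drop the empty piece
# that Python leaves after the trailing comma.
letters = [part for part in with_commas.split(",") if part]

print(with_commas)  # Y,L,L,L,E,P,Y,F,A,V,W,Y,
print(letters)
```

In Python the trailing comma produces an empty last piece, which is filtered out here. How the Cell Splitter treats that trailing delimiter is not covered by this sketch.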

That caused me a lot of head scratching.


Hello Simon,

This is wonderful! Thank you very much!

However, for anyone reading this later: the pattern is ([A-Z]){1}, and the replacement text is indeed $1, (with the trailing comma).
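As a side note on the two spellings of the pattern, a quick check in Python (again only an analogy, since KNIME uses Java regex) suggests that ([A-Z]) and ([A-Z]){1} behave the same in a replace-all, because {1} just means "exactly once".

```python
import re

seq = "YLLLEPYFAVWY"

# Same replacement with and without the {1} quantifier.
plain = re.sub(r"([A-Z])", r"\1,", seq)
with_quantifier = re.sub(r"([A-Z]){1}", r"\1,", seq)

same = plain == with_quantifier
print(same)  # True
```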

It seems this head scratching exhausted you ;)

