Removing a variable number of variable substrings

I have peptide sequences containing modifications that I want to remove so I can compare them against another data source. For example, I might have the following strings:
AAACEADER
AADFES(Phospho)STEYA
YM(Oxidation)S(Phospho)ASDCE

The modifications are always in parentheses, though their descriptions vary in length and type. A single peptide sequence could have zero, one, or many modifications. My goal is to remove all the modifications in parentheses, along with the parentheses themselves. From those three strings I would then get:
AAACEADER
AADFESSTEYA
YMSASDCE

I’ve tried using String Manipulation but I can’t seem to get this quite right. It’s easy to replace a single parenthesis or a set of parentheses with a known string. It’s also easy to replace everything between an open and close parenthesis, but in the case of a peptide with multiple modifications I end up losing the sequence between the modifications. I tried String Replacer as well and had similar results.

Any suggestions on how to handle these strings would be great.

You can use a regex replace function to remove the text wrapped in parentheses. Several nodes can do that; one example is the regexReplace() function in the String Manipulation node.

It sounds like you may have tried this approach. If you add the ? character after the quantifier in your regex (.*? instead of .*), you’ll select the smallest string for removal instead of removing everything between the first ( and the last ). That keeps the sequence between the modifications in your 3rd example.

Give the expression below a try.

Remove Substrings.knwf (6.1 KB)
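
For anyone reading along without KNIME, here is a minimal sketch of the same greedy-versus-lazy idea in Python. It is not the contents of the attached workflow. The pattern and the names strip_modifications, LAZY, and GREEDY are illustrative assumptions, and the input strings are the three from the question.

```python
import re

# Lazy pattern: a literal "(", then as few characters as possible, then a literal ")".
LAZY = re.compile(r"\(.*?\)")
# Greedy pattern, for comparison only: ".*" runs on to the LAST ")" in the string.
GREEDY = re.compile(r"\(.*\)")


def strip_modifications(peptide: str) -> str:
    """Remove every parenthesised modification (hypothetical helper name)."""
    return LAZY.sub("", peptide)


peptides = ["AAACEADER", "AADFES(Phospho)STEYA", "YM(Oxidation)S(Phospho)ASDCE"]
for p in peptides:
    # Print the lazy result next to the greedy one to show the difference.
    print(p, "->", strip_modifications(p), "| greedy:", GREEDY.sub("", p))
```

On the third string the greedy pattern removes everything from the first ( to the last ), so the S between the two modifications is lost, while the lazy pattern removes each modification separately.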

Excellent, thank you! I thought I’d had success with a RegEx for this before, but I couldn’t remember the correct syntax. This was exactly what I needed. Can you tell me why a double backslash is used in the RegEx before the open and close parentheses?

Glad it worked out for you!
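The backslashes escape the ( and ) so they are interpreted as literal characters instead of with their special regex meanings. They are doubled because the expression is written as a Java-style string literal: \\ in the string becomes a single \ by the time it reaches the regex engine.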
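
The two layers of escaping (string literal first, regex second) can be illustrated with a small Python sketch. Python is used here only as an analogy and is not what the String Manipulation node runs; the variable names are made up for the example.

```python
import re

# Ordinary string literal: each "\\" collapses to one backslash,
# so the regex engine receives \(.*?\)
doubled = "\\(.*?\\)"

# Raw string literal: backslashes are passed through exactly as typed.
raw = r"\(.*?\)"

# Both spellings produce the same pattern text.
assert doubled == raw

# Show the pattern as the regex engine sees it, then apply it.
print(doubled)
print(re.sub(doubled, "", "AADFES(Phospho)STEYA"))
```

In languages with regex literals, such as Perl, there is no string-literal layer to get through, which is why a single backslash is enough there.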

Gotcha. I was surprised to see a double backslash instead of a single one. Maybe I just did too much RegEx work in Perl, and it set me too firmly in the school of single backslashes.

All good haha, for such a short expression it sure has a lot going on! Sometimes you just need that second set of eyes.