String Manipulation and Replacer Nodes

esLivingston · July 10, 2016, 5:16pm

The backreference replacement for a regular expression in the String Replacer and String Manipulation nodes is not working as expected.

My table has a column that contains strings of the format...

XX99 9XXXxxxxxx xxxxxx (e.g. "G20 7ETBearsden", "EH14 7DGWest Lothian", etc)

and also, more simply,

XX99 9XX, XX9 99XX, or X9 99XX

The strings need to be transformed so that all of them are in the simpler formats (i.e. strip off any Xxxxx xxxx characters at the end of the string).

The regular expression patterns that I have tried (and shown to be correct in the OS X Utility RegExhibit) are

"([A-Z]+\d+ \d+[A-Z]{2})(.*)$"

and

"[A-Z][a-z ]*$"

Using the first pattern in the expression

regexReplace($Unique concatenate$,"([A-Z]+\d+ \d+[A-Z]{2})(.*)$","$1")

does nothing (i.e. the appended column contains exactly the same content as the input table cell).

Similarly, using the second pattern in the expression

regexReplace($Unique concatenate$,"[A-Z][a-z ]+$","")

does nothing.
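For reference, here is a minimal sketch of what the first pattern is expected to do on clean input. It is written in Python rather than KNIME, so it only illustrates the regular expression itself, not the node's behaviour. Assumptions: Python's re module writes the backreference as \1 where the expression above uses $1, and the helper name strip_suffix is hypothetical.

```python
import re

# First pattern from the post: group 1 is the postcode, group 2 the trailing text.
PATTERN = re.compile(r"([A-Z]+\d+ \d+[A-Z]{2})(.*)$")


def strip_suffix(value: str) -> str:
    """Keep only the postcode part (group 1). Hypothetical helper name."""
    # Python spells the backreference \1; the KNIME expression uses $1.
    return PATTERN.sub(r"\1", value)


for sample in ["G20 7ETBearsden", "EH14 7DGWest Lothian", "G20 7ET"]:
    print(repr(strip_suffix(sample)))
```

On clean strings the trailing place name is dropped and an already short postcode is left unchanged, which is the result the expression was meant to produce.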

Similar results are obtained using the String Replacer Node.

Where am I going wrong?

TIA

Eric

phalassek · July 11, 2016, 1:43pm

Just a wild guess, but I think the $ is too much. It stands for the end of the string. Have you tested the regex elsewhere? I normally use http://regex101.com/, where you get a visual aid and can switch between dialects.

esLivingston · July 11, 2016, 2:23pm

Further examination of the data (a by-product of an earlier node in my workflow) that I am trying to modify would seem to indicate that a null character has been introduced between every character in the strings being processed.

Therefore, it is no wonder that the regular expressions are not being matched and the desired result being obtained!
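A small sketch of why embedded null characters defeat the pattern, again in Python and only as an illustration. The corruption is simulated here by joining the characters with NUL, which is an assumption about what the data looks like, not a reproduction of the actual workflow output.

```python
import re

# First pattern from the original post.
PATTERN = re.compile(r"([A-Z]+\d+ \d+[A-Z]{2})(.*)$")

clean = "G20 7ETBearsden"
# Assumption: simulate the corruption by placing a NUL between every character.
dirty = "\x00".join(clean)

# The pattern needs a letter directly followed by a digit, a digit directly
# followed by a space, and so on. With a NUL between every pair of characters,
# no such adjacent pair exists, so the search fails.
clean_matches = PATTERN.search(clean) is not None
dirty_matches = PATTERN.search(dirty) is not None

# With no match, the substitution returns the input unchanged.
dirty_unchanged = PATTERN.sub(r"\1", dirty) == dirty

print(clean_matches, dirty_matches, dirty_unchanged)
```

Because NUL is usually invisible when the cell is displayed, the two strings can look identical in a table view even though only the clean one matches.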

The question for me now is what has introduced the null characters and why.

However, there still seems to be a problem with the String Manipulation node: when manually entered data is used as input (i.e. the same strings but without the embedded null characters), it still fails to perform the change, whereas the String Replacer node does work in this situation.

esLivingston · July 11, 2016, 2:31pm

Thank you for your response, phalassek.

Yes, as I mentioned in the entry, I tested both the regular expressions in the OS X utility RegExhibit and it showed that the patterns were correctly identified in test data.

I like the regular expression website that you provided. It is really useful. Accordingly, I have tweaked one of my regular expressions to become...

([A-Z]{1,2}\d{1,2} \d{1,2}[A-Z]{2})
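A hedged sketch of how the tweaked expression could be used to extract the postcode directly instead of replacing the tail. It assumes that the digit group after the space is meant to be \d{1,2}. The helper name extract_postcode is hypothetical, and the inputs "AB9 12CD" and "A9 12CD" are made-up strings that merely follow the XX9 99XX and X9 99XX shapes.

```python
import re
from typing import Optional

# Tweaked pattern, assuming the digit group after the space is \d{1,2}.
POSTCODE = re.compile(r"([A-Z]{1,2}\d{1,2} \d{1,2}[A-Z]{2})")


def extract_postcode(value: str) -> Optional[str]:
    """Return the first postcode-shaped substring, or None if there is none."""
    match = POSTCODE.search(value)
    return match.group(1) if match else None


for sample in ["G20 7ETBearsden", "EH14 7DGWest Lothian", "AB9 12CD", "A9 12CD"]:
    print(repr(extract_postcode(sample)))
```

Extracting group 1 avoids having to describe the unwanted suffix at all, so the pattern no longer needs (.*) or the $ anchor.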

esLivingston · July 11, 2016, 8:09pm

I have managed to get past this problem by introducing another String Replacer node that uses a regular expression of \000 (i.e. octal for a null char) to delete all occurrences of null chars in the strings.
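The two-step fix can be sketched outside KNIME as follows. This is a Python illustration under stated assumptions: the NUL is written as \x00 rather than the octal \000 used in the String Replacer node, the corrupted input is simulated by joining characters with NUL, and clean_postcode is a hypothetical helper name.

```python
import re

NUL = re.compile(r"\x00")  # same character as octal \000
PATTERN = re.compile(r"([A-Z]+\d+ \d+[A-Z]{2})(.*)$")


def clean_postcode(value: str) -> str:
    """Delete NUL characters first, then keep only the postcode part."""
    without_nul = NUL.sub("", value)         # step 1: extra String Replacer node
    return PATTERN.sub(r"\1", without_nul)   # step 2: original replacement


# Assumption: simulated corrupted input with a NUL between every character.
dirty = "\x00".join("EH14 7DGWest Lothian")
print(repr(clean_postcode(dirty)))
```

The order matters: the null characters have to be removed before the postcode pattern is applied, otherwise the second step finds nothing to replace.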