Regex match help

I am having difficulty with the Regex match function under string manipulation. I have various postcodes and I want to flag those with the format “Z99 9ZZ” or “ZZ99 9ZZ” as TRUE and the others as FALSE. I have used RegExr to write this expression:

^[A-Z]{2}[0-9]{1,2}\s[A-Z0-9]{3}

Great, it works and matches the format I want and doesn’t match the format I want to exclude. However, when I put that in the RegEx match formula in the string manipulation node, all rows are false. Does Knime use a different kind of RegEx or am I missing something? I’m not too well-versed in RegEx, so it might be me! It’s just strange that it works on the testing sites but not in practice. Thank you!

Hello @JWebb,

your expression works for me on second example (ZZ99 9ZZ) while your first example (Z99 9ZZ) only has one letter and in expression you expect 2 letters at beginning (^[A-Z]{2}...). So maybe following expressions will do the trick for you (not a regex expert):
regexMatcher( $yourColumnName$, "^[A-Z]{1,2}[0-9]{2}\\s[A-Z0-9]{3}")

Br,
Ivan

4 Likes

Thanks Ivan. That has worked for some! Others are still coming out as False which I would expect ot be true. I tried replacing it with “^[A-Z]{2}.+” and that seems to do the trick. I therefore assume that the person who gave this to me left trailing spaces or something in most of them!

(I also realise I provided the wrong example in the first post; I meant to say ZZ99 or ZZ9, hence the {2}!)

1 Like

Hello @JWebb,

you are welcome. So it works now for all records as expected? If not send some more examples and we’ll hopefully figure it out :slight_smile:

Br,
Ivan

Yes, it seems to be. I haven’t figured out why it is not working for the first formula but is for the .+ version, but at least there are some pointers here. Maybe they will also help someone else with similar issues. Thanks again!

1 Like

ok. I get it now. Glad you found a way.
Ivan

Hi @JWebb,

If you are doing regex for UK postcodes, then whilst XX99 9XX and X99 9XX are the major formats, you may also possibly need to handle a few other more unusual ones:

e.g. X9X 9XX (example, the BBC Broadcasting House - where I used to work :wink: - in London - has a postcode of W1A 1AA )

The full set of UK postcode formats is
XX9X 9XX
X9X 9XX
X9 9XX
X99 9XX
XX9 9XX
XX99 9XX

The following String Manipulation regex should :thinking: find all these format, and allow for whitespace at the beginning, end or middle (though it doesn’t allow whitespace within either of the two component sections of the postcode)

regexMatcher($yourColumnName$,"^\\s*[A-Za-z]{1,2}(?:[0-9][A-Za-z]{0,1}|[0-9]{2})\\s+[0-9][A-Za-z]{2}\\s*$")

This requires the format to conform to the following:
^ start of line
\\s* any amount of whitespace (including none)
[A-Za-z]{1,2} one or two letters (don’t care about case)
(?: then from within the following group, either…
option 1 → [0-9][A-Za-z]{0,1} A single digit followed by zero or one letter, ignoring case
| or…
option 2 → [0-9]{2} two digits
) end of group
\\s+ one or more whitespace characters
[0-9][A-Za-z]{2} a single digit followed by two letters, any case
\\s* any amount of whitespace (including none)
$ end of line

The above regex is written to be case-insensitive but you might wish to change the letter ranges if that doesn’t suit.

4 Likes

That is what I was looking for, but I hope the people I am dealing with don’t live at the BBC headquarters! However, thanks for the regex; I will save this and use it in future because who knows if there is going to be a funny postcode in my area? I don’t think so, but it is good to have something which covers everything.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.