I used the following pattern in the Regex Extractor node: (.{1,60}[^\s]*)?\s? to split a string at the 60-character mark. However, it splits at 66 or 68 characters, depending on the length of the word, instead of moving the last word to the next group.
Is there a change required to the Regex in order to impose the split at max 60 char?
Need to split a string (e.g. Bay PT C.P.R. Station Grounds being PT of locations 8 3C PIC SRO PT 4 & 5 55R2614 Except PT 1 55R3261 ) into multiple parts: the first is max 40 characters, the next is max 40 characters, etc., keeping whole words.
Hi @IrynaK I’m not entirely certain that this can be done with pure regex. Your requirement is subtly different to the post you referenced.
What the regex pattern you have used does is match up to the first 60 characters and then also capture everything after that, up to but not including the next whitespace. This ensures it doesn’t break mid-word. That’s why you see the results you are getting. It isn’t restricted to 60 characters. It returns 60 or so…
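To make the overrun concrete, here is a small sketch in Python rather than in the Regex Extractor node itself (an assumption on my part; the pattern syntax used here is common to both). It uses a width of 40 and the example string from the thread, with the trailing space dropped. The helper name split_overrunning is made up for illustration.

```python
import re

# Example string from the thread (trailing space dropped).
TEXT = ("Bay PT C.P.R. Station Grounds being PT of locations 8 3C PIC SRO "
        "PT 4 & 5 55R2614 Except PT 1 55R3261")


def split_overrunning(text, width):
    # Same shape as the pattern in the question: up to `width` characters,
    # then any further non-whitespace characters, then an optional space.
    pattern = r"(.{1,%d}[^\s]*)?\s?" % width
    # The whole pattern can match the empty string, so drop empty results.
    return [chunk for chunk in re.findall(pattern, text) if chunk]


overrun_chunks = split_overrunning(TEXT, 40)
overrun_lengths = [len(chunk) for chunk in overrun_chunks]
print(overrun_lengths)
```

With this input the chunks come out at 41, 46 and 12 characters, so the first two exceed the limit by however much was left of the word that straddles the 40-character mark.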
Does this absolutely have to be a regex solution? If it does then maybe an alternative is to give a guess at what the longest word you might encounter is likely to be and subtract that from 60, then use that number in your regex in place of the 60 value. So, say it gives you, say, “50 or so” instead…
Although in your latest comment you are talking about 40, so I’m now slightly confused about the requirement, but hopefully you can see what I’m saying.
Not saying it definitely can’t be done with regex but I can’t think of a way at the moment.
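If a non-regex route is acceptable, greedy word wrapping does this directly. The sketch below uses Python's standard textwrap module and assumes the string can be processed in a scripting step outside the Regex Extractor; the 40-character limit and the example string are taken from the thread.

```python
import textwrap

# Example string from the thread (trailing space dropped).
TEXT = ("Bay PT C.P.R. Station Grounds being PT of locations 8 3C PIC SRO "
        "PT 4 & 5 55R2614 Except PT 1 55R3261")

# Greedy wrap: each line takes as many whole words as fit in 40 characters.
wrapped = textwrap.wrap(TEXT, width=40)

for line in wrapped:
    print(len(line), line)
```

Note that by default textwrap.wrap will still break a single word that is longer than the width, so a very long token would not stay intact.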
As @takbb already pointed out, you are using a regular expression that splits input on the first whitespace after 40 characters. This leads to string lengths of >= 40.
Just use this expression instead:
(.{1,40})
I also attached an example workflow to my NodePit Space:
This already works out of the box. The example string split by 40 characters looks as follows… Just configure the Regex Extractor to split matches in rows or columns as you prefer.
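One thing to be aware of: a fixed-width pattern like this cuts wherever the 40th character falls, so it can cut in the middle of a word. A quick check in Python (used here only as a stand-in for the node, which is an assumption) on the example string shows where the cuts land.

```python
import re

# Example string from the thread (trailing space dropped).
TEXT = ("Bay PT C.P.R. Station Grounds being PT of locations 8 3C PIC SRO "
        "PT 4 & 5 55R2614 Except PT 1 55R3261")

# Fixed-width split: every chunk is exactly 40 characters, except the last.
fixed_chunks = re.findall(r"(.{1,40})", TEXT)

for chunk in fixed_chunks:
    print(len(chunk), repr(chunk))
```

Here the first chunk ends with the "o" of "of" and the second starts with the remaining "f", so this variant only fits if whole words do not matter.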
So I ended up using (.{1,40})[\s.]*\s
It seems to split the string into pieces of 40 characters or fewer with full words intact; however, I am unable to catch the last part of the string with this regex. Can you help?
What happens if you put brackets around the second part… So you’d capture both. You might need to trim whitespace afterwards depending on your use case
e.g
(.{1,40})([\s.]\s)
Or
(.{1,40})([\s.])\s
Or maybe…
(.{1,40})\s*([\s.]*)
This formula works if I add a space at the end of the string: (.{0,40})[\s]
So I am wondering how I can say that the last group should be terminated by the end of the string rather than by whitespace?
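One possible way to express "whitespace or end of string" is an alternation with the end anchor: (.{1,40})(?:\s+|$). The sketch below checks this in Python; I have not verified it in the Regex Extractor node, so treat the behaviour there as an assumption. The helper name split_whole_words is made up for illustration.

```python
import re

# Example string from the thread (trailing space dropped).
TEXT = ("Bay PT C.P.R. Station Grounds being PT of locations 8 3C PIC SRO "
        "PT 4 & 5 55R2614 Except PT 1 55R3261")


def split_whole_words(text, width):
    # Capture up to `width` characters, but only accept the match if it is
    # followed by whitespace OR by the end of the string. The non-capturing
    # group (?:...) keeps the separator out of the captured chunk.
    pattern = r"(.{1,%d})(?:\s+|$)" % width
    return re.findall(pattern, text)


parts = split_whole_words(TEXT, 40)
for part in parts:
    print(len(part), part)
```

On the example string this gives three parts of 38, 34 and 27 characters, including the last one, without needing a trailing space. A caveat: a single word longer than the limit has no valid break point, so the pattern will skip the start of it instead of keeping it whole.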