Regex Split Node Crash

Hi Everybody,

I have been trying several different regexes to group a document whose rows contain text with no pattern common to all of them (a few rows have similar patterns, though not exactly the same, and the rest have nothing in common at all).

Finally, I decided to try "([^0-9]*)*?([0-9]*)*?"

My aim was to establish groups of numbers and strings. However, after processing a couple of rows, the Regex Split node stops making progress and the progress bar stays stuck at some percentage.

I don't know the exact reason for the crash: is it because my regex creates an infinite loop, or is something wrong with my computer?

Any help will be highly appreciated.

Bora

From looking at your syntax, I assume you are trying to get many splits. However, I believe the ()'s you are using for capturing groups cannot be combined with quantifiers such as *, + and ? in the way you expect.

You cannot say you want 0 or more capturing groups for example.

I could be wrong. I'd be interested to hear thoughts from others.
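The point about quantified capturing groups can be checked outside KNIME. The snippet below is only a sketch in Python's re module, which stands in here for whatever regex engine the Regex Split node uses (an assumption); the sample string "ab12cd34" is made up. It suggests that putting * after a capturing group does not create more groups: the pattern still has one group, and it keeps only the last repetition.

```python
import re

# One capturing group with a * quantifier applied to it.
# The sample input is invented for illustration.
match = re.fullmatch(r"([0-9]+|[^0-9]+)*", "ab12cd34")

# The number of groups is fixed by the pattern, not by the input.
group_count = len(match.groups())

# Each repetition overwrites the previous capture, so only the
# final run ("34") is still available afterwards.
last_capture = match.group(1)

print(group_count, last_capture)
```

So a quantifier repeats the match, but the earlier runs ("ab", "12", "cd") are not returned as separate splits.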

Simon.

Hi Simon,

I changed the expression a little to "([^0-9]*)?([0-9]*)?" and it actually worked. However, it requires many repeats, like ([^0-9]*)?([0-9]*)?([^0-9]*)?([0-9]*)?([^0-9]*)?([0-9]*)? and so on. After approximately 10 repetitions of the pattern the node crashes again, but up to that point it works very well.
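The repeated-pattern approach can be sketched as follows. This is Python rather than KNIME, assuming the node needs the pattern to match the whole cell; the helper name match_with_repeats and the sample string are hypothetical. It shows that the match only succeeds when the pattern is repeated often enough to cover every run in the row.

```python
import re

# The two-group unit from the post: a non-digit run, then a digit run.
UNIT = r"([^0-9]*)?([0-9]*)?"

def match_with_repeats(text, repeats):
    """Match the whole text against UNIT repeated `repeats` times.

    Returns the tuple of captured groups, or None when the repeated
    pattern cannot cover the whole text.
    """
    match = re.fullmatch(UNIT * repeats, text)
    return None if match is None else match.groups()

sample = "ab12cd34ef56"  # three text/number pairs (invented data)
print(match_with_repeats(sample, 3))  # enough repeats: six groups
print(match_with_repeats(sample, 2))  # too few repeats: no match
```

Because every group in the unit is optional and may match nothing, a regex engine has more and more ways to try distributing the text as the pattern grows, which may be related to the slowdown seen at around 10 repeats.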

I will try your suggestion. I don't know whether I am searching for the impossible or not. If I can succeed to some extent at least, I have a couple of ideas to work on.

Thanks again my friend. You are the best.

Bora

If you want to keep splitting the string whenever text switches to numeric and vice versa, then the procedure below should work much better for you:

Use a String Replacer node, choose Regular Expression and Replace All Occurrences. For Pattern enter "([\p{L} ])([0-9])" without the quotes. For replacement text enter "$1SPLIT$2" without the quotes. This will add the word "SPLIT" in between text and numbers.

Now add a second String Replacer node, set up in the same way as described, but this time use "([0-9])([ \p{L}])" for the pattern, without the quotes, and the same replacement text as before. This will add the word "SPLIT" in between numbers and text.

Now add a Cell Splitter node. In the delimiter box add "SPLIT" without quotes. Choose to remove trailing and leading white spaces if you wish.
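For anyone who wants to try the logic of these three steps outside KNIME, here is a minimal sketch in Python. Several things are assumptions: Python's re module does not support \p{L}, so [^\W\d_] is used as a rough stand-in for "any letter"; the function name split_text_and_numbers and the sample input are made up; and the final strip corresponds to the optional white space removal in the Cell Splitter.

```python
import re

# Rough stand-in for [\p{L} ]: a letter or a space.
# Python's re lacks \p{L}, so this is an approximation.
LETTER_OR_SPACE = r"(?:[^\W\d_]| )"

def split_text_and_numbers(text, marker="SPLIT"):
    # Step 1 (first String Replacer): mark text-to-number boundaries.
    step1 = re.sub(rf"({LETTER_OR_SPACE})([0-9])",
                   rf"\g<1>{marker}\g<2>", text)
    # Step 2 (second String Replacer): mark number-to-text boundaries.
    step2 = re.sub(rf"([0-9])({LETTER_OR_SPACE})",
                   rf"\g<1>{marker}\g<2>", step1)
    # Step 3 (Cell Splitter): split on the marker and trim white space.
    return [part.strip() for part in step2.split(marker)]

print(split_text_and_numbers("abc 123def45"))
```

Note that this breaks if the data itself already contains the word "SPLIT"; in that case a marker that cannot occur in the data would be needed.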

Hope this is what you've been trying to achieve.

Simon.

Hi Simon,

My trial with the ([^0-9]*)?([0-9]*)? pattern for grouping numbers and non-numbers separately works if I keep the pattern long enough to cover the whole data in the row.

That's correct, but the difficulty is knowing what length of pattern is long enough. If you use the method I explained, it will work for any number of alternating runs of numbers and non-numbers.
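To illustrate what "any length" means here, the sketch below uses a different technique from the String Replacer steps: it collects every run of digits or non-digits with a single short pattern, so no repeat count has to be guessed. It is plain Python, not a KNIME node, and the function name alternating_runs and the sample strings are hypothetical.

```python
import re

def alternating_runs(text):
    # Each match is either a maximal run of digits or a maximal run
    # of non-digits, so the pattern never needs to be lengthened.
    return re.findall(r"[0-9]+|[^0-9]+", text)

print(alternating_runs("ab12cd34ef56"))
print(alternating_runs("x 7"))
```

Unlike the marker approach, this keeps spaces and punctuation inside the non-digit runs, so trimming would be a separate step.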