Splitting column into numbers and strings

Hello there!

Let's say I have a column with a list of numbers attached to units, and I would like to separate the numbers from the units so that I end up with 2 new columns. Is there any simple way to achieve that?

There are of course different lengths of numbers and different units (which may also contain numbers). So basically what I'm trying is to split the column the first time a letter shows up.

Help is much appreciated!

Example:

column_old   column_number   column_unit
3250mg/kg    3250            mg/kg
>300mg/kg    >300            mg/kg
180ppm/6H    180             ppm/6H
15gm/m3      15              gm/m3
1gm/m3/2H    1               gm/m3/2H

 

Hi!

Try the Regex Split node, set it to case-insensitive, and fill in the following regex:

(.*[0-9])([a-z].*)

HTH,
E

Thanks that helped me a lot already!!!

But sometimes the unit itself contains numbers and then it doesn't work. For example these will give me:

column_old   column_number   column_unit
180ppm/6H    180ppm/6        H
1gm/m3/2H    1gm/m3/2        H

I guess the issue is in this part of the expression: '(.*[0-9])'. Is there a way to split right before the first letter appears?
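One way to split at the first letter is to forbid letters in the first group instead of relying on a greedy '.*'. The following is only a minimal sketch in plain Java, not a KNIME node configuration: the class name UnitSplitter and the method split are made up for illustration, and it assumes that "letter" means an ASCII letter (A-Z, a-z).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitSplitter {
    // Group 1: everything before the first ASCII letter (digits, '>', '.', ...).
    // Group 2: the rest of the value, starting at that first letter.
    private static final Pattern FIRST_LETTER =
            Pattern.compile("^([^A-Za-z]*)([A-Za-z].*)$");

    /** Returns {number, unit}, or null when the value contains no letter. */
    static String[] split(String value) {
        Matcher m = FIRST_LETTER.matcher(value);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = { "3250mg/kg", ">300mg/kg", "180ppm/6H", "15gm/m3", "1gm/m3/2H" };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(s + " -> " + parts[0] + " | " + parts[1]);
        }
    }
}
```

Because the first group cannot contain a letter, digits inside the unit (as in ppm/6H) stay in the second group regardless of the case setting. The same pattern, ([^A-Za-z]*)([A-Za-z].*), should in principle also be usable in the Regex Split node, but that is an assumption that has not been checked in KNIME here.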

Hey,

I'm not too deep into Regex, so the best I'd be able to do off the top of my head is

(.*[0-9])([a-z].+[0-9])([a-z].*)

and then combine the final 2 groups together again...

Though strangely enough, if you untick "ignore case" the original Regex will work! Weird... So in case-sensitive setting, you could also always use

(.*[0-9])([A-Za-z].*)

Seems like some weird type of bug, though! KNIMErs, please? :)

Bug-free alternative: go and ask on StackOverflow with #Java and #Regex tags, this usually solves such challenges in minutes! :-)

Cheers
E

Bug-free alternative: go and ask on StackOverflow with #Java and #Regex tags, this usually solves such challenges in minutes! :-)

Better yet: search Stack Overflow for existing solutions: http://stackoverflow.com/questions/29434666/how-to-parse-and-capture-any-measurement-unit/29434667#29434667 (this one is for JavaScript, but I guess it should work with Java too).

(I think KNIME does not change the regular expressions in any way, but I could be wrong.)

Gabor,

Agreed, searching is usually preferable! :)

I sometimes get confused with Regex dialects, though - Perl regex is mildly different from Java regex, etc. Mostly due to language-specific reserved characters. Hence the recommendation to ask specifically for Java Regex.

-E

Ergonomist,

You're right, it does work when I untick the "ignore case".  For now this solution is sufficient for me since I don't understand too much about regex. Thank you very much!

Edit: I actually toyed a bit with the following website to try to understand regex a bit better:

https://regex101.com/r/M8LXnI/1

There I tried the regex in both case-sensitive and case-insensitive mode, and I get the same issue when I choose case-insensitive. I used >180ppm/6H as an example. If I understand this correctly, the reason might be that (.*[0-9]) allows letters before the last digit and ([A-Za-z].*) also matches a single capital letter, so >180ppm/6H gets split into >180ppm/6 and H. So I don't think it's a KNIME bug. However, thank you very much again!

"Mtest",

You are absolutely right in your interpretation - .* allows any characters before the numeral (including other numerals!), whereas [A-Za-z] is matching a single alphabetic character, either upper or lower case, but not '>'
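The behaviour described above can be reproduced outside KNIME. Below is a small sketch using java.util.regex directly; the class name GreedyCaseDemo and the helper groups are invented for this illustration, and it assumes the whole cell value has to match the pattern (hence matches()).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCaseDemo {
    /** Returns "group1|group2" for a full match, or "no match". */
    static String groups(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.matches() ? m.group(1) + "|" + m.group(2) : "no match";
    }

    public static void main(String[] args) {
        String input = ">180ppm/6H";

        // Case-sensitive: 'H' is not in [a-z], so the greedy .* has to
        // backtrack to the last digit that is followed by a lowercase letter.
        System.out.println(groups("(.*[0-9])([a-z].*)", 0, input));

        // Case-insensitive: [a-z] now also matches 'H', so the greedy .*
        // can stop at the last digit in the string.
        System.out.println(groups("(.*[0-9])([a-z].*)", Pattern.CASE_INSENSITIVE, input));

        // Spelling out [A-Za-z] gives the same split as the case-insensitive run.
        System.out.println(groups("(.*[0-9])([A-Za-z].*)", 0, input));
    }
}
```

The first call yields >180 and ppm/6H, while the other two yield >180ppm/6 and H, which matches what regex101 showed.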

Steve

Oh, indeed that's the case, of course! #-) Thanks for digging this up, good thing you got to the bottom of it!

-E