How to remove zeros in a specific spot

Einayyar · September 28, 2021, 3:30pm

Good morning,

I need to remove the zeros from the data set below, but only the 0’s in between the initials and the numbers. The number of 0’s can vary, as long as there are 10 characters in total. The 0’s are inserted when the ID initials or number are not long enough to fill the 10-character slot.

ABOX050000
ABOX050026
ABOX050027
ABOX050103
ABOX050105
ABOX050877
ABOX051121
ABOX051833
ABOX051835
ABOX051867
ABOX051867
ABOX051880

Here is an example with no 0’s

BNSF474696
BNSF474696
BNSF474700
BNSF474700
BNSF474700
BNSF474700

Daniel_Weikert · September 28, 2021, 4:42pm

Have you already tried the String Manipulation node (replace or regex)?
Do you also want to fill the 0s back up to 10 characters?

A pattern for regexreplace could look like

regexReplace($column1$, "(.*?)(0+?)(.*)", "$1$3")

and if you like to “pad” them up to 10 then try the pad function.
Hope that gets you started
br
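For anyone who wants to try the pattern outside KNIME, here is a rough Python analogue. It is only a sketch: it assumes the pattern is meant to have three groups, (.*?), (0+?) and (.*), and the helper name remove_first_zero is made up for illustration. Python writes the group references as \1 and \3 where the KNIME expression uses $1 and $3.

```python
import re

# Three capture groups: lazy "anything", lazy "one or more zeros", greedy "the rest".
PATTERN = re.compile(r"(.*?)(0+?)(.*)")

def remove_first_zero(value: str) -> str:
    """Keep groups 1 and 3, dropping whatever group 2 matched."""
    return PATTERN.sub(r"\1\3", value)

# How the sample ID is split into groups:
groups = PATTERN.match("ABOX050000").groups()  # ('ABOX', '0', '50000')

remove_first_zero("ABOX050000")  # 'ABOX50000'
remove_first_zero("BNSF474696")  # 'BNSF474696' (no zero, so nothing changes)
remove_first_zero("BNSF474700")  # 'BNSF47470'  (first zero removed, wherever it sits)
```

Note the last case: because the first group accepts any character, the first 0 found anywhere in the value is dropped, not only one that sits directly after the initials.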

Einayyar · September 28, 2021, 4:58pm

Hey Daniel,

First of all, thank you so much for replying. Since I am new to KNIME, can you explain what the regex below is doing? It looks like the formula also removes zeros within the numbers, for example turning BNSF474700 (original) into BNSF4747. In essence, I need the formula to skip values like this one, as there are no zeros between BNSF and the numbers.

I am more than willing to manipulate the formula you have, but honestly have no clue as to where to start.

Thanks
Shaun

Daniel_Weikert · September 28, 2021, 5:11pm

The formula above removes the 0 in between, e.g. ABOX050000 => ABOX50000.
You might want to give it a go on your data and modify it accordingly.
br

Einayyar · September 28, 2021, 5:15pm

That works perfectly! Thank you so much.

Do you mind explaining further what the formula is doing here so I can use it in the future? Just trying to learn.

Daniel_Weikert · September 29, 2021, 4:25pm

Hi @Einayyar
If it works for your problem, then please mark it as the solution for your post so that others can find it.
The “()” are capture groups in regular expressions. The dot “.” stands for any character. The star “*” is a quantifier meaning “zero or more”, and the question mark after a quantifier makes it “non greedy”, so it matches as little as possible.
So basically the first group catches everything up to the first “0”. The second group catches the 0 (being non greedy, just that first one) and the third group catches everything after it.
The $1$3 in the replacement refers to the first and third group.
In short: split the data into 3 groups and return only the first and third group joined together.
best regards
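Since the original question asked to leave IDs like BNSF474700 alone, a stricter pattern may be worth trying. The sketch below is in Python and rests on an assumption not confirmed in this thread: every ID is a run of uppercase letters followed only by digits. The names strip_padding and pad_id are hypothetical helpers, and pad_id shows one way to restore the 10-character form mentioned earlier.

```python
import re

# Assumption: an ID is uppercase initials, optional padding zeros, then digits.
PADDED = re.compile(r"^([A-Z]+)0+(\d+)$")
PLAIN = re.compile(r"^([A-Z]+)(\d+)$")

def strip_padding(value: str) -> str:
    """Remove zeros only when they sit directly after the initials."""
    return PADDED.sub(r"\1\2", value)

def pad_id(value: str, width: int = 10) -> str:
    """Re-insert zeros between initials and number to reach `width` characters."""
    match = PLAIN.match(value)
    if match is None:
        return value  # not in the assumed format, leave untouched
    prefix, number = match.groups()
    return prefix + number.rjust(width - len(prefix), "0")

strip_padding("ABOX050000")  # 'ABOX50000'
strip_padding("BNSF474700")  # 'BNSF474700' (unchanged, no zero after the initials)
pad_id("ABOX50000")          # 'ABOX050000'
```

The same anchored pattern should carry over to regexReplace with $1$2 as the replacement, but test it on your own column first.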

ipazin · September 30, 2021, 10:05am

Hello @Einayyar,

in general you can use following page to build, test and get explanations for your regex:

Br,
Ivan
