RegexReplace in String Manipulation: how to handle multi-line values and keep leading zeros

When using the String Manipulation node with RegexReplace I run into two issues.
I am still looking for solutions but haven’t found one yet.
Here is my current regex expression:

The first issue is that it removes leading zeros. I would like it to keep them, since I need them for the file and site numbers.
The second issue is that when a row has multi-line data in its columns, the node changes it: it creates extra rows and shifts data into the wrong columns.

Thanks for the help,
Scott

Hi sgilmour,

Not sure, but could it be that you are missing the \ before the u0000?
Or do you not intend to select the characters outside the 0000-06FF Unicode range?

Also, your regex includes the newline character, so it will always remove newlines as well.
If you only want to match characters outside that Unicode range at the end of the line (at least that is what you are currently doing), without removing the newline as well, you can change your regex to:
[^\u0000-\u06FF]+$

Otherwise, if you want to remove any of these special characters anywhere in your string, I would change the regex to:
[^\u0000-\u06FF]+
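
To make the difference between the two patterns concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for the regex engine behind the String Manipulation node, on the assumption that this character class behaves the same way there. The sample value and the function names `strip_trailing` and `strip_anywhere` are made up for illustration.

```python
import re

# Character class from the reply above: anything outside U+0000-U+06FF.
OUTSIDE_RANGE = r"[^\u0000-\u06FF]+"


def strip_trailing(text):
    """Remove out-of-range characters only where they end the string."""
    return re.sub(OUTSIDE_RANGE + "$", "", text)


def strip_anywhere(text):
    """Remove out-of-range characters wherever they occur."""
    return re.sub(OUTSIDE_RANGE, "", text)


# Hypothetical cell value: leading zeros, a line break, and two stray symbols
# (U+2603 in the first line, U+2026 at the very end).
sample = "0012\u2603 Site A\nRow two\u2026"

# repr() makes the surviving newline visible as \n in the printed output.
print(repr(strip_trailing(sample)))
print(repr(strip_anywhere(sample)))
```

In both cases the leading zeros and the line break survive, because `0` (U+0030) and the newline (U+000A) lie inside the 0000-06FF range and are therefore never matched by the negated class.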

Maybe you should test your string in a regex tester like:

There it is easier to see what is currently matched and why (and it also documents which regex syntax is available, I guess).
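
If no online tester is at hand, the same kind of check can be roughly sketched in a few lines of Python. This is only an approximation of what a tester shows, it again assumes Python's `re` treats this pattern like the engine in the node, and `explain_matches` and the sample value are hypothetical names and data.

```python
import re


def explain_matches(pattern, text):
    """Return (start, end, matched text) for every match of pattern in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


# Hypothetical cell value with two characters outside U+0000-U+06FF.
sample = "0012\u2603 Site A\nRow two\u2026"

for start, end, matched in explain_matches(r"[^\u0000-\u06FF]+", sample):
    # Show each matched character as a code point so invisible or
    # look-alike characters are easy to identify.
    print(start, end, [f"U+{ord(ch):04X}" for ch in matched])
```

Listing the code points of each match shows exactly which characters the pattern would remove, and confirms that digits and line breaks are not among them.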

Just a quick suggestion that you might want to try the new RegEx Extractor node in the Palladian extension. @qqilihq provides more detail about it here:

@ScottF not directly related to this topic… but do you intend to upgrade the native KNIME regex nodes in the future?
They seem really basic compared to @qqilihq's nodes, so it is often easier to fall back to Java Snippets with regex :stuck_out_tongue:

Hi Scott,
So can I just replace String Manipulation with RegEx Extractor in my workflow?
Thanks
Scott

I’ll have to ask around internally to see what our future plans are for the RegEx nodes - I haven’t heard anything recently. Certainly the recent Palladian nodes are quite nice!

@ScottF that does not sound reassuring, but thanks for the response :slight_smile:

@sgilmour I think it depends on what you intend to do.
Since it seems you want to remove certain special Unicode characters, I would think that you cannot use the extractor node for that.
However, I think you can use the Palladian nodes to check your regex and see what you would actually change.

(However, I still think my regex above should come close, if I understood you correctly.)

OK, thanks.
I will check out the Palladian nodes. I am just downloading them now. It sounds like they will do what I need to verify my regex.
