RegexReplace in String Manipulation: how to handle multi-line data and not remove leading zeros

sgilmour · May 19, 2020, 11:32pm

When using the String Manipulation node with RegexReplace, I run into two issues.
I am still looking for solutions but haven’t found one yet.
Here is my current regex:

The first issue is that it removes leading 0’s, and I would like it not to, since I need them for the file and site numbers.
The second issue is that when a row has multi-line data in its columns, the data gets changed: extra rows are created and values end up in the wrong columns.

Thanks for the help,
Scott

AnotherFraudUser · May 20, 2020, 12:42pm

Hi sgilmour,

I’m not sure, but could it be that you are missing the \ before the u0000?
Or do you not intend to select the characters outside the 0000-06FF Unicode range?
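To illustrate why a missing backslash matters, here is a small sketch in plain Java (the regex flavour used by Java Snippets). The pattern without the backslash is only a guess at what the original expression might have looked like, and the class and method names are made up for this example. Without the backslash, the class contains the literal characters u and 0 plus the range from 0 up to U+06FF, so spaces and newlines fall outside it and get matched.

```java
public class MissingBackslashDemo {
    // Hypothetical pattern: backslash missing before the first "u0000".
    // The class then holds the literals 'u' and '0' plus the range '0'..U+06FF.
    static final String WITHOUT_BACKSLASH = "[^u0000-\\u06FF]";

    // Intended pattern: anything outside U+0000..U+06FF.
    static final String WITH_BACKSLASH = "[^\\u0000-\\u06FF]";

    // Removes every match of the given regex from the input.
    static String remove(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String cell = "a b\nc";
        // Space and newline sort below '0', so the broken class matches them.
        System.out.println(remove(WITHOUT_BACKSLASH, cell).equals("abc"));
        // With the backslash, plain text with newlines is left untouched.
        System.out.println(remove(WITH_BACKSLASH, cell).equals(cell));
    }
}
```

Both lines should print true when run as a standalone program.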

Also, your regex includes the newline character, so it will always remove newlines as well.
If you only want to select characters outside that Unicode range at the end of the line (which is what you are currently doing), without removing the newline as well, you can change your regex to:
[^\u0000-\u06FF]+$

Otherwise, if you want to remove these special characters anywhere in your string, I would change the regex to:
[^\u0000-\u06FF]+
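As a rough check of the two patterns above, this is a minimal sketch in plain Java, outside of KNIME. The class name, method names and sample value are invented for the example. It suggests that neither pattern touches digits or newlines, since both lie inside the 0000-06FF range, so leading zeros being lost would have to come from somewhere else in the workflow.

```java
import java.util.regex.Pattern;

public class UnicodeRangeCleanup {
    // One or more characters outside U+0000..U+06FF, anywhere in the input.
    private static final Pattern ANYWHERE = Pattern.compile("[^\\u0000-\\u06FF]+");

    // Same class, anchored with $ (no MULTILINE flag, so only the end of the input).
    private static final Pattern AT_END = Pattern.compile("[^\\u0000-\\u06FF]+$");

    static String stripAnywhere(String s) {
        return ANYWHERE.matcher(s).replaceAll("");
    }

    static String stripAtEnd(String s) {
        return AT_END.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up two-line cell value with one CJK character at the end of each line.
        String cell = "00123\u4E2D\nsite 007\u4E2D";

        // Both CJK characters are removed; zeros and the newline stay.
        System.out.println(stripAnywhere(cell).equals("00123\nsite 007"));

        // Only the CJK character at the very end is removed.
        System.out.println(stripAtEnd(cell).equals("00123\u4E2D\nsite 007"));
    }
}
```

Both lines should print true. Note that in this plain Java sketch the $ anchor only applies at the end of the whole value, not at the end of each line inside it.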

Maybe you should test your string in a regex tester like:

There it is easier to see what is currently matched and why (and, I guess, which regex syntax is available).

ScottF · May 20, 2020, 8:27pm

Just a quick suggestion that you might want to try the new RegEx Extractor node in the Palladian extension. @qqilihq provides more detail about it here:

AnotherFraudUser · May 20, 2020, 10:01pm

@ScottF not directly related to this topic… but do you intend to upgrade the native KNIME regex nodes in the future?
They seem really basic compared to @qqilihq’s nodes, so it is often easier to fall back to Java Snippets with regex.

sgilmour · May 21, 2020, 12:28am

Hi Scott,
So can I just replace String Manipulation with RegEx Extractor in my workflow?
Thanks
Scott

ScottF · May 21, 2020, 2:40pm

I’ll have to ask around internally to see what our future plans are for the RegEx nodes - I haven’t heard anything recently. Certainly the recent Palladian nodes are quite nice!

AnotherFraudUser · May 21, 2020, 5:04pm

@ScottF that does not sound reassuring, but thanks for the response.

AnotherFraudUser · May 21, 2020, 5:07pm

@sgilmour I think it depends on what you intend to do.
Since it seems you want to remove certain special Unicode characters, I don’t think you can use the extractor node for that.
You can, however, use the Palladian nodes to check your regex and see what you actually want to change.

(However, I still think my regex above should come close, if I understood you correctly.)

sgilmour · May 21, 2020, 5:35pm

Ok thanks
I will check out the Palladian nodes; I am just downloading them now. It sounds like they will do what I need to verify my regex.

system · November 20, 2020, 5:37am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.