Hi guys,
Referring to https://tech.knime.org/forum/knime-textprocessing/punctuation-erasure, I have run into some odd behaviour.
A little backstory: I'm not satisfied with the 'Punctuation Erasure' node. For example, a term like "disk-swapping" becomes "diskswapping" once 'Punctuation Erasure' is applied, which makes the string useless in some cases. Likewise, "connection-180-800-200" becomes "connection180800200", and then the 'Number Filter' no longer works, and so on. So the punctuation should be replaced with a whitespace instead of simply being removed.
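To make the behaviour I am after concrete, here is a minimal sketch in plain Python (outside KNIME, so it says nothing about how the nodes work internally). The function name punctuation_to_space is made up for this illustration, and the character set is simply Python's string.punctuation, which is an assumption about what should count as punctuation.

```python
import re
import string

# One or more consecutive ASCII punctuation characters.
# string.punctuation is used here as an assumed definition of "punctuation".
PUNCT_RUN = re.compile("[" + re.escape(string.punctuation) + "]+")


def punctuation_to_space(text: str) -> str:
    """Replace each run of punctuation with a single space (hypothetical helper)."""
    return PUNCT_RUN.sub(" ", text).strip()


if __name__ == "__main__":
    print(punctuation_to_space("disk-swapping"))
    print(punctuation_to_space("connection-180-800-200"))
```

With this, "disk-swapping" stays two separate words and the numbers in "connection-180-800-200" remain separate tokens that a number filter could still pick up.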
The basic solution (which Kilian came up with in the thread linked above) is to use the 'Replacer' node with either the regular expression [!#$%&'\"*+,.\?:;]+ or, even better (since most characters are covered), "[!#$%&'\"()*+,./\\:;<=>?@^_`{|}~\\[\\]]+"
Interestingly enough, only the first expression works with the 'Replacer' node. If I use the longer expression, no punctuation characters are removed at all. The same happens if I combine, let's say, only < and >. That means that for every punctuation character I want to remove (apart from those in [!#$%&'\"*+,.\?:;]+), I have to create an extra node, which I find weird.
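As a sanity check outside KNIME, here is a small Python sketch of both character classes. Two assumptions: the doubled backslashes and the surrounding quotes in the longer expression are Java string-literal escaping, so I have written it as a plain regex here, and the helper name replace_with_space is made up for this example. Python's regex engine is not the one the node uses, so this only checks the pattern itself, not the node.

```python
import re

# First expression from the thread, written as a plain regex.
SHORT_CLASS = re.compile(r"""[!#$%&'"*+,.?:;]+""")

# Longer expression with the assumed string-literal escaping undone:
# "\\" becomes a single regex-escaped backslash, "\\[" becomes "\[".
LONG_CLASS = re.compile(r"""[!#$%&'"()*+,./\\:;<=>?@^_`{|}~\[\]]+""")


def replace_with_space(pattern: re.Pattern, text: str) -> str:
    """Replace every match of the pattern with one space (hypothetical helper)."""
    return pattern.sub(" ", text)


if __name__ == "__main__":
    sample = "a<b>c"
    print(replace_with_space(SHORT_CLASS, sample))  # < and > are not in this class
    print(replace_with_space(LONG_CLASS, sample))   # < and > are in this class
    # Neither class lists the hyphen, so this term is left untouched by both.
    print(replace_with_space(LONG_CLASS, "disk-swapping"))
```

In this sketch the longer class does match < and >, so the pattern itself looks fine as a plain regex. It also shows that neither expression lists the hyphen, so "disk-swapping" would need "-" added to the class.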
Am I doing something wrong, or is this a bug?
Also, it would be nice if the 'Punctuation Erasure' node had an option to replace punctuation with a whitespace instead of removing it.
Thanks,
Manu