I want to remove diacritics at the document level (before tagging). Unfortunately, diacritic removal is only offered as a string manipulation, so it can't be applied to documents. Perhaps the replacer node would help, but then I would need regular expressions.
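To make clear what I mean by removing diacritics, here is a minimal Python sketch of the transformation, assuming the raw text is available as a plain string before it becomes a document (for example in a scripting step or outside KNIME). The function name `strip_diacritics` is my own and not part of any KNIME node.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining diacritical marks removed.

    Hypothetical helper for illustration only, not a KNIME API.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_diacritics("Café Über naïve"))
```

Letters that have no decomposition, such as "ø" or "ß", pass through unchanged with this approach and would need an explicit mapping.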
Besides, the Punctuation Erasure node removes only the well-known punctuation marks, not the strange characters that show up after I convert the documents, before I load them into KNIME. Is there a solution for this?
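As an illustration of the kind of cleanup I am after, here is a rough Python sketch, assuming the text can be cleaned as a plain string before it is loaded. The function name `clean_text` and the list of allowed punctuation marks are my own assumptions and would need adjusting to the actual data.

```python
import re

# Anything that is not a word character, whitespace, or one of a few
# common punctuation marks is treated as a "strange" character.
# The allowed set is an assumption; extend it as needed.
STRANGE_CHARS = re.compile(r"[^\w\s.,;:!?'\"()\-]")
WHITESPACE_RUN = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Replace unexpected characters with spaces and collapse whitespace.

    Hypothetical helper for illustration only, not a KNIME API.
    """
    # Replace with a space rather than deleting, so words do not merge.
    cleaned = STRANGE_CHARS.sub(" ", text)
    # Collapse runs of whitespace (including non-breaking spaces) to one space.
    return WHITESPACE_RUN.sub(" ", cleaned).strip()


if __name__ == "__main__":
    print(clean_text("Hello\u00a0world \ufffd test\u2022 ok!"))
```

In Python 3, `\w` matches Unicode letters, so accented characters survive this step; only symbols such as bullets or the replacement character are dropped.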
Additional question: after preprocessing the documents, I want to read the results. Unfortunately, when I write the table containing the documents to disk, only an excerpt from the beginning of each document is visible. The same happens when I convert the document to a string. How can I read the full processed documents?