I have a set of documents created with the KNIME Text Processing plugin, and I want to remove the hashtags from them. A single document may contain several hashtags, and I want to remove all of them using the "RegEx Filter" node.
I asked this question on Stack Overflow, but the answer that seemed to work does not remove all the hashtags from the KNIME documents.
Can anyone help me implement this in KNIME using the "RegEx Filter" node?
You can use the Wildcard Tagger node to tag terms that match a specified regular expression. This node is much more flexible than the RegEx Filter node. Then use, e.g., a General Tag Filter node to filter out (or keep) the tagged terms.
Attached you will find an example workflow that shows how to find terms starting with # and filter them afterwards.
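To make the two-step idea (tag the terms that match a regex, then filter the tagged terms) concrete outside KNIME, here is a minimal plain-Java sketch. It does not use the KNIME API; it assumes the document has already been split into terms, and the pattern `#.*` is only a hypothetical example of what you might enter as the regular expression, not necessarily the one used in the attached workflow. It also assumes the whole term is matched against the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HashtagFilterSketch {

    // Hypothetical pattern: any term that starts with '#'.
    // The whole term must match (Matcher.matches), which is an assumption
    // about how the tagging step treats a term.
    private static final Pattern HASHTAG = Pattern.compile("#.*");

    // Step 1 ("tagging"): decide whether a single term is a hashtag.
    static boolean isHashtag(String term) {
        return HASHTAG.matcher(term).matches();
    }

    // Step 2 ("filtering"): keep only the terms that were NOT tagged,
    // so every hashtag in the document is dropped, not just the first one.
    static List<String> removeHashtags(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!isHashtag(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical document, already tokenized into terms.
        List<String> terms = List.of("Loving", "the", "new", "#KNIME",
                "release", "#textmining", "#NLP");
        System.out.println(removeHashtags(terms)); // prints [Loving, the, new, release]
    }
}
```

Because each term is checked individually, a document with several hashtags loses all of them, which is the behaviour the question asks for.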