Is there a node that can extract text matching a regular expression? To start with, I would like to extract email addresses from HTML pages with the regular expression:
\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}\b
What type of document is expected when I get this error message?
---------------------------
Dialog cannot be opened
---------------------------
The dialog cannot be opened for the following reason:
No column in spec compatible to "DocumentValue".
---------------------------
OK
---------------------------
The node for your problem is RegExSplit. There is already a solution in this forum that you can adapt to your use case.
You may need to use some nodes from the textprocessing extension, which would explain the error message you reported, "No column in spec compatible to "DocumentValue"". The node needs at least one column of type Document, not String. The textprocessing extension includes nodes that convert from String to Document and vice versa.
I have attached a very simple example that extracts email addresses. Please check whether it works for your use cases, or feel free to adapt it.
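If you want to check the regular expression itself outside of KNIME first, here is a minimal Python sketch. It is not the attached workflow, only an illustration: the names `extract_emails` and `sample_html` are made up for this example, and it assumes the page content is already available as a plain string. Since the pattern lists only lowercase letters, the sketch compiles it case-insensitively so that addresses like `Jane.Doe@example.com` are found as well.

```python
import re

# The pattern from the question. It only lists lowercase letters, so it is
# compiled with IGNORECASE here (an assumption about the intended behaviour).
EMAIL_PATTERN = re.compile(
    r"\b[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}\b",
    re.IGNORECASE,
)


def extract_emails(html):
    """Return all substrings of `html` that match EMAIL_PATTERN, in order."""
    return EMAIL_PATTERN.findall(html)


# Hypothetical input, standing in for the content of an HTML page.
sample_html = (
    '<p>Contact <a href="mailto:Jane.Doe@example.com">Jane</a> '
    "or bob@test.org.</p>"
)

print(extract_emails(sample_html))
# → ['Jane.Doe@example.com', 'bob@test.org']
```

Note that `{2,4}` restricts the top-level domain to two to four letters, so an address ending in a longer top-level domain would not be matched in full.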