I’m looking for help building a workflow that starts by importing text from a large directory of MS Word and PDF files. The extracted text would be used for topic modelling, tagging, and deep learning. At a minimum, I’d like to get this data into KNIME, but any direction on text processing approaches would be appreciated.
Look at the
and other nodes around.
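If you want to sanity-check the directory before wiring up nodes, here is a minimal Python sketch of the import step using only the standard library. It is an illustration under assumptions, not a KNIME feature: `find_documents` and `docx_text` are hypothetical helper names, it handles only `.docx` (the zipped XML Word format, not legacy binary `.doc`), and it only lists PDFs, since extracting PDF text needs a dedicated parser.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements inside a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_documents(root, extensions=(".docx", ".pdf")):
    """Recursively list files under `root` with a matching suffix (hypothetical helper)."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def docx_text(path):
    """Return the body text of a .docx, one paragraph per line (hypothetical helper).

    A .docx is a zip archive; the main body lives in word/document.xml.
    Headers, footers, and footnotes are stored elsewhere and are ignored here.
    """
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    tree = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in tree.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Something like `[docx_text(p) for p in find_documents("my_docs") if p.suffix.lower() == ".docx"]` would then give one string per Word file (`my_docs` is a placeholder path), which could be written out as plain text or CSV for whatever reader you use next.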