Importing a large directory of MS Word and PDF files into KNIME

I’m looking for help building a workflow that starts by importing the text from a large directory of MS Word and PDF files. The data would then be used for topic modelling, tagging, and deep learning. At a minimum I’d like to get it into KNIME, but any direction on text processing approaches would be appreciated.
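If you end up doing the extraction outside KNIME (or in a Python Script node), here is a minimal sketch of one way to turn a directory of documents into a two-column table of file name and text, written to CSV so a CSV Reader node can load it. It assumes the Word files are .docx (legacy .doc files are skipped), reads .docx using only the standard library, and uses the third-party pypdf package for PDFs if it is installed. The helper names (`docx_text`, `pdf_text`, `collect`, `write_csv`) are hypothetical, not part of any KNIME API.

```python
import csv
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml of a .docx package
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Extract plain text from a .docx file using only the standard library."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # Each <w:p> element is a paragraph; its visible text sits in <w:t> runs
    for p in root.iter(W_NS + "p"):
        runs = [t.text or "" for t in p.iter(W_NS + "t")]
        if runs:
            paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def pdf_text(path):
    """Extract text from a PDF with the third-party pypdf package, if installed."""
    try:
        from pypdf import PdfReader
    except ImportError:
        return None  # pypdf not available; the caller stores empty text
    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def collect(directory):
    """Walk a directory tree and return (relative path, text) pairs."""
    root = Path(directory)
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".docx":
            text = docx_text(path)
        elif suffix == ".pdf":
            text = pdf_text(path)
        else:
            continue  # legacy .doc and all other formats are skipped in this sketch
        rows.append((str(path.relative_to(root)), text or ""))
    return rows


def write_csv(rows, out_path):
    """Write the rows as a CSV table with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "text"])
        writer.writerows(rows)


if __name__ == "__main__" and len(sys.argv) == 3:
    write_csv(collect(sys.argv[1]), sys.argv[2])
```

Saved as, say, `extract_text.py` (a hypothetical name), it would be run as `python extract_text.py /path/to/docs docs.csv`. Scanned PDFs without a text layer will come back empty and would need OCR, which this sketch does not attempt.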

Look at the … node and the other nodes around it.

Have you searched the KNIME Hub for example workflows? There are several; in particular, you might try:

Topic Modeling: https://kni.me/w/H8EUf75lnsyAv6-U
Document Tagging: https://kni.me/w/IBp9LRLyNKA9r0H6
Sentiment Analysis w/ Deep Learning: https://kni.me/w/NHJpmqsAJ3Ib-thH