Search term within a cluster of documents


Using KNIME, is there any “straightforward” way of searching for a specific text term within a group of documents stored in a folder? If so, how?

The number of documents in the folder may vary each time. Also, the module should be able to search documents in different formats: pdf, docx, pptx, etc.

Many thanks in advance,

Hi @javiles -

Welcome to the forum!

You could try using the Tika Parser node to read in data from various types of files in a folder (or even recursively in subfolders). Then, once your data is read in, you can use a Rule Engine node with the LIKE keyword to determine if you have a match or not.
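For anyone curious what the folder-reading step amounts to outside KNIME, here is a rough Python sketch. It is an analogy under stated assumptions, not how the Tika Parser node is implemented. It rescans the folder (optionally recursively) on every call, so a varying number of documents is handled automatically, and keeps only files with the requested extensions. The extension set and the names `collect_documents`, `read_all` and `read_plain_text` are hypothetical. `read_plain_text` only handles plain-text files, because the actual PDF/DOCX/PPTX text extraction is the part the Tika Parser takes care of.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical default: the formats mentioned in the question plus plain text.
DEFAULT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}


def collect_documents(folder: str,
                      extensions: Iterable[str] = DEFAULT_EXTENSIONS,
                      recursive: bool = True) -> List[Path]:
    """Return every file under `folder` whose extension is in `extensions`.

    The folder is rescanned on each call, so the number of files may vary.
    """
    root = Path(folder)
    wanted = {ext.lower() for ext in extensions}
    # rglob("*") also walks subfolders; glob("*") stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates
                  if p.is_file() and p.suffix.lower() in wanted)


def read_plain_text(path: Path) -> str:
    """Stand-in extractor (assumption): only meaningful for plain-text files.

    Real PDF/DOCX/PPTX text extraction is what the Tika Parser node handles.
    """
    return path.read_text(encoding="utf-8", errors="replace")


def read_all(paths: Iterable[Path],
             extract_text: Callable[[Path], str]) -> Dict[str, str]:
    """Map each file path (as a string) to its extracted text."""
    return {str(p): extract_text(p) for p in paths}
```

Something like `read_all(collect_documents("docs"), read_plain_text)` (with `"docs"` as a placeholder folder) would then yield file-to-text pairs ready to be searched, roughly the table the Tika Parser produces.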

For example, if I wanted to search for the word “KNIME” in the content of files output from the Tika Parser, I might do this:
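The `LIKE` keyword works with wildcards, so finding “KNIME” anywhere in a document's content comes down to a pattern such as `"*KNIME*"`. As a loose Python analogy (not the Rule Engine itself), `fnmatch.fnmatchcase` offers similar `*`/`?` wildcard matching. The helper names `like` and `flag_matches` and the "match"/"no match" labels are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict


def like(text: str, pattern: str) -> bool:
    """Wildcard match: '*' = any run of characters (newlines included), '?' = one character."""
    # fnmatchcase is case-sensitive and anchors the pattern to the whole string,
    # which is why the search term is wrapped in '*' on both sides below.
    return fnmatchcase(text, pattern)


def flag_matches(contents: Dict[str, str], term: str) -> Dict[str, str]:
    """Label each document 'match' or 'no match' (labels are hypothetical)."""
    pattern = f"*{term}*"
    return {name: ("match" if like(text, pattern) else "no match")
            for name, text in contents.items()}
```

Note that `fnmatchcase` also treats `[...]` as a character class. Whether the Rule Engine behaves the same way for case and brackets is an assumption worth checking against the node's documentation.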


Many thanks @ScottF!

It works perfectly. This Tika Parser is marvelous!



Good job, @ScottF!
