Search term within a cluster of documents

Hello,

Using KNIME, is there any “straightforward” way to search for a specific text term across a group of documents stored in a folder? If so, how?

The number of documents in the folder may vary each time. Also, the module should be able to search documents in different formats: pdf, docx, pptx, etc.

Many thanks in advance,
Javi

Hi @javiles -

Welcome to the forum!

You could try using the Tika Parser node to read in data from various types of files in a folder (or even recursively in subfolders). Then, once your data is read in, you can use a Rule Engine node with the LIKE keyword to determine if you have a match or not.

For example, if I wanted to search for the word “KNIME” in the content of files output from the Tika Parser, I might do this:
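
A rule along these lines is one possibility. It is only a minimal sketch: it assumes the Tika Parser output table has a string column named Content holding the extracted text (check the actual column name in your own table), and the output labels "match" and "no match" are placeholders.

```
// Assumption: "Content" is the column with the text extracted by the Tika Parser
$Content$ LIKE "*KNIME*" => "match"
TRUE => "no match"
```

In the Rule Engine, LIKE compares against a wildcard pattern in which * stands for any sequence of characters and ? for a single character, so the asterisks on both sides let the term appear anywhere in the text. Rules are evaluated top to bottom, and the final TRUE line acts as a catch-all for files without the term.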


Many thanks @ScottF!

It works perfectly. This Tika Parser is marvelous!

Cheers,
Javi
