Parsing PDFs and searching for specific keywords

Hello, I am trying to build a simple PDF parsing workflow that would read multiple PDFs and score them based on the occurrence of certain keywords (which can be entered manually in a dictionary or table beforehand). The output would be a table with the name of each file and its score.
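For reference, the scoring described above can be sketched outside KNIME in a few lines of Python. This is a minimal sketch, not a KNIME workflow. It assumes the PDF text has already been extracted into a dict mapping file name to text, that keywords are single words, that matching is case-insensitive, and that a file's score is simply the total number of keyword occurrences. `score_documents` is a hypothetical name.

```python
import re
from collections import Counter


def score_documents(texts: dict[str, str], keywords: list[str]) -> list[tuple[str, int]]:
    """Return one (file name, score) row per document.

    Assumption: score = total number of occurrences of any keyword,
    matched case-insensitively on whole words.
    """
    wanted = {k.lower() for k in keywords}
    rows = []
    for name, text in texts.items():
        # Lower-case and split into word tokens so "Invoice," matches "invoice".
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter(t for t in tokens if t in wanted)
        rows.append((name, sum(counts.values())))
    return rows


docs = {"a.pdf": "Invoice total: invoice due", "b.pdf": "Nothing relevant here"}
print(score_documents(docs, ["invoice", "due"]))  # [('a.pdf', 3), ('b.pdf', 0)]
```

Files with no matching keywords still appear in the output with a score of 0, which matches the "one row per file" table asked for.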

It sounds conceptually simple, but as a beginner I’m a bit lost as to which nodes to use.
Thanks in advance!

Look through this example

To read PDFs, use the Tika Parser node.


Thanks, this is useful, especially the Tika Parser. However, how do I actually score my files based on the occurrence of a given word list? The provided example workflow seems to identify the most frequent words, which is not exactly my use case.

Use the Rule-based Row Filter or the Joiner node to keep only the rows containing your keywords, then use the GroupBy node to count them per file.
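What that node chain computes can be sketched in plain Python. This is a minimal sketch, assuming the extracted text has already been split into one row per word occurrence as (file name, term) pairs; the function name `score_from_token_rows` and this row layout are hypothetical, not KNIME's actual table schema. Step 1 plays the role of the Rule-based Row Filter (or an inner join against a keyword table), and step 2 plays the role of GroupBy on the file name with a count aggregation.

```python
from collections import defaultdict


def score_from_token_rows(rows, keywords, file_names):
    """rows: iterable of (file name, term) pairs, one row per word occurrence."""
    wanted = {k.lower() for k in keywords}

    # Step 1 (like a Rule-based Row Filter / inner Joiner): keep only keyword rows.
    matches = [(doc, term.lower()) for doc, term in rows if term.lower() in wanted]

    # Step 2 (like GroupBy on the file name with a count): count matches per file.
    counts = defaultdict(int)
    for doc, _term in matches:
        counts[doc] += 1

    # Files with no matching rows disappear after filtering, so re-add them with 0.
    return [(name, counts.get(name, 0)) for name in file_names]


rows = [("a.pdf", "Invoice"), ("a.pdf", "total"), ("a.pdf", "invoice"), ("b.pdf", "hello")]
print(score_from_token_rows(rows, ["invoice"], ["a.pdf", "b.pdf"]))  # [('a.pdf', 2), ('b.pdf', 0)]
```

The last step reflects a general property of filter-then-group pipelines: a file with zero keyword hits produces no group at all, so if those files should appear in the result table with a score of 0, they have to be joined back against the full list of file names.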


Thanks, I got it working! The Rule-based Row Filter was the way to go.

