Hello, I am trying to build a simple PDF parsing workflow that would read multiple PDFs and score them based on the occurrence of certain keywords (which can be entered manually in a dictionary or table beforehand). The output would be a table with the name of each file and its score.
It sounds conceptually simple, but I’m a bit lost as to which nodes to use as a beginner.
Thanks in advance
Thanks, this is useful, especially the Tika Parser. However, how do I actually score my files based on the occurrence of words from a given word list? The provided example workflow seems to identify the most frequent words, which is not exactly my use case.
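To make the scoring I have in mind concrete, here is a rough Python sketch of the logic, outside of any workflow tool. It assumes the text has already been extracted from each PDF (for example by the Tika Parser) and is available as a plain string per file. The keyword table, its weights, and the function names `score_text` and `score_documents` are all made up for illustration. It also only handles single-word keywords.

```python
import re
from collections import Counter

# Hypothetical keyword table: keyword -> weight (values invented for illustration).
KEYWORDS = {"invoice": 2.0, "contract": 1.5, "penalty": 3.0}


def score_text(text, keywords=KEYWORDS):
    """Return the weighted number of keyword occurrences in `text`.

    Matching is case-insensitive and on whole words only.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)  # missing words count as 0
    return sum(weight * counts[word.lower()] for word, weight in keywords.items())


def score_documents(docs, keywords=KEYWORDS):
    """Score each document.

    `docs` maps a file name to its already-extracted text.
    Returns a list of (file name, score) rows, sorted by file name.
    """
    return [(name, score_text(text, keywords)) for name, text in sorted(docs.items())]


if __name__ == "__main__":
    # Stand-in texts; in practice these would come from the PDF parsing step.
    docs = {
        "a.pdf": "Invoice attached. The invoice covers the contract.",
        "b.pdf": "Nothing relevant here.",
    }
    for name, score in score_documents(docs):
        print(name, score)
```

So what I am looking for is the node-based equivalent of this: count how often each word from my list appears in each document, optionally multiply by a weight, and sum the result per file.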
Thanks