How to perform PDF word count (a) per document and (b) for all documents?

Hi,

I’d appreciate some help.

To start analyzing the content of PDF documents, I created a simple workflow with these nodes:
PDF Parser > Bag Of Words Creator > Punctuation Erasure

Then I got stuck!

What I would like to perform is a “word count” (a) per document and (b) for all documents.

For instance:
Document 1: word1 word1 word2
Document 2: word1 word2

The expected outputs would be:

(a) per document
Document 1: word1 (count 2)
Document 1: word2 (count 1)
Document 2: word1 (count 1)
Document 2: word2 (count 1)

(b) for all documents (I’m not interested in seeing the counts grouped by document).

word1 (count 3)
word2 (count 2)

How to perform PDF word count (a) per document and (b) for all documents?

Thanks for your assistance,
Cadu

Hi,

See the attached workflow, hope it helps!

Martin K.

PDF parser.knwf (5.8 KB)
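
For anyone reading without KNIME at hand, the two counts asked for above can be sketched in plain Python. This is only an illustration of the aggregation logic, not the contents of the attached workflow. It assumes the PDF text has already been extracted and punctuation removed; the `documents` dictionary and the `count_words` helper are hypothetical names made up for this example.

```python
from collections import Counter

# Hypothetical input: document name -> text already extracted from the PDF
# (parsing and punctuation removal are assumed to have happened earlier).
documents = {
    "Document 1": "word1 word1 word2",
    "Document 2": "word1 word2",
}


def count_words(text):
    """Count how often each whitespace-separated word occurs in the text."""
    return Counter(text.split())


# (a) one counter per document
per_document = {name: count_words(text) for name, text in documents.items()}

# (b) one counter for all documents, built by adding up the per-document ones
overall = sum(per_document.values(), Counter())

for name, counts in per_document.items():
    for word, n in counts.items():
        print(f"{name}: {word} (count {n})")

for word, n in overall.most_common():
    print(f"{word} (count {n})")
```

The point of the sketch is that (b) is just (a) summed over the document column, so once the per-document counts exist, the overall counts need only one extra grouping step.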