Hi, first of all, this is a great community! I am looking for help reading a .pdf document (such as an annual report). I tried the PDF Parser and the Tika Parser, and then used a ‘Bag of Words Creator’ to get the total list of words (which I need for some further analysis). But the two parsers gave different total word counts, and neither matches the actual figure. I also tried converting the PDF to Word and then using the Word Parser. That gave me yet another number. I am not very well versed in all this. Am I missing something here? Please help. [image] [image] [image] Thank you!

Reading PDF files

mlauber71 December 10, 2022, 6:49pm 3

@LakshmiK maybe you could take a look at these examples reading words and data from pdf with the help of R packages:
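
Separately from the R examples, here is a minimal Python sketch (not one of the linked workflows, and not how the KNIME parser nodes work internally) of one likely reason the totals disagree: different tools split text into words by different rules. The sample string and the three counting functions are made up for illustration. The sample imitates text extracted from a PDF, including a word hyphenated across a line break.

```python
import re

# Hypothetical sample standing in for text extracted from one PDF page.
# "reve-\nnue" imitates a word split across a line break by the PDF layout.
sample = "Annual Report 2022\nThe company's reve-\nnue grew 12.5% year-on-year."


def count_whitespace(text):
    """Count tokens separated by whitespace (punctuation stays attached)."""
    return len(text.split())


def count_word_chars(text):
    """Count runs of letters, digits and underscores (splits on ' . - %)."""
    return len(re.findall(r"\w+", text))


def count_letters_dehyphenated(text):
    """Rejoin words hyphenated at line ends, then count letter-only words.

    Numbers are ignored, and an apostrophe inside a word is kept.
    """
    joined = re.sub(r"-\n", "", text)
    return len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", joined))


if __name__ == "__main__":
    print("whitespace split:     ", count_whitespace(sample))
    print("word-character runs:  ", count_word_chars(sample))
    print("letters, dehyphenated:", count_letters_dehyphenated(sample))
```

On this one sample the three rules give 10, 14 and 9 words. Differences in how each parser extracts the text in the first place (headers, footers, tables, line breaks) would add further variation, so it may help to fix one extraction and one counting rule before comparing against a reference figure.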