Simple PDF Text Extraction

Hub · April 7, 2022, 10:07pm

This is a companion discussion topic for the original entry at https://kni.me/w/kjy6Q-3szxcH6716

PKRISH · October 11, 2022, 5:01am

Hi,

Which nodes, and in what sequence, should be used to extract data, both text and image, from PDF files? I have tried both the Tika Parser and the PDF reader. Neither seems to detect multiple values that sit on different lines within the same column cell.

Any help would be appreciated.

Thanks and regards

KRISH