Image PDF to text

PBJ · September 30, 2021, 12:35pm

I’m sorry, I had to remove the original message to protect the privacy of the pictures.

The Tika parser doesn’t parse PDF files with embedded TIFF pictures.
Instead, I currently use the External Tool (Labs) node to run a free, open-source converter (from the poppler distribution) that converts each page of the PDF files to a PNG picture. Afterwards, all the pictures are converted to text with the Tess4J node (OCR).
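For anyone who wants to try the same two steps outside of KNIME, here is a rough sketch in Python. It rests on assumptions: that the poppler converter is `pdftoppm` (called with `-png`), and that the `tesseract` command-line program stands in for the Tess4J node. Both must be installed and on the PATH. The function names, the `page` file prefix, and the 300 DPI default are my own choices for illustration, not part of the workflow.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_prefix, dpi=300):
    """Build the poppler command that renders each PDF page to a PNG file.

    Assumption: the converter is poppler's pdftoppm. It writes one file per
    page, named <out_prefix>-<page number>.png.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def tesseract_command(image_path, out_base, lang="eng"):
    """Build the Tesseract CLI command; the text goes to <out_base>.txt.

    Assumption: the tesseract CLI is used here in place of the Tess4J node.
    """
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def pdf_to_text(pdf_path, work_dir, dpi=300, lang="eng"):
    """Render every page of the PDF to PNG, OCR each page, return the text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / "page"  # hypothetical prefix for the page images

    # Step 1: PDF pages -> PNG pictures
    subprocess.run(pdftoppm_command(pdf_path, prefix, dpi), check=True)

    # Step 2: PNG pictures -> text, one page at a time
    texts = []
    for png in sorted(work_dir.glob("page-*.png")):
        out_base = png.with_suffix("")  # e.g. page-1.png -> page-1
        subprocess.run(tesseract_command(png, out_base, lang), check=True)
        texts.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

A call would look like `pdf_to_text("scan.pdf", "ocr_tmp")`, where both names are placeholders. A higher DPI generally helps OCR on scanned pages but makes the conversion slower.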

The Regex MetaNode only extracts some data from the text files generated by Tess4J; it is not related to the original question.

Best regards.