PDF and Tika Parser

Hi,
Scientific PDF files are usually laid out in two or three columns per page. When I use the PDF or Tika parser and then check the text content output, these columns are merged together (see attached picture).
Do you know whether it would be possible to avoid this and read the columns separately?
Thank you in advance!
Cheers,
Nazareno.

Hey @Nazareno,

Unfortunately, I don’t see any (easy) way to work around this issue right now.
I will create a ticket to get this fixed.

Thank you for reporting.
Cheers,
Julian


Hi Julian,
Thank you for the answer. I hope you can solve it soon.
Best,
Nazareno.