Scientific PDF files usually have 2 or 3 columns per page. When I use the PDF or Tika parser and then check the text content output, the columns are merged together (see attached picture).
Do you know whether it would be possible to avoid this and read the columns separately?
Thank you in advance!
Unfortunately, I don't see any (easy) way to work around this issue right now.
I will create a ticket to get this fixed.
Thank you for reporting.
Thank you for the answer. I hope you can solve it soon.