Is there a way to configure the PDF Parser node to maintain the original column structure and extract the text as it appears in the source PDF?
Well, from the node description:
"The full text of the PDF is extracted, the structure of the PDF is not taken into account."
The Tika Parser is no different, so there isn't a good way to do this with the built-in nodes. I've seen the R tabulizer package mentioned as an alternative, but I have no experience with it.
Other people have had the same question, and there's an existing ticket for this (AP-14318), but as of March there had been no movement on it.
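For what it's worth, here is a rough idea of what "taking the structure into account" would involve. Layout-aware extractors generally work from the position of each word on the page rather than from the plain text stream. The Python sketch below is only an illustration of that idea, not what any KNIME node or tabulizer actually does: the word boxes in `WORDS`, the `rows_from_words` helper, and the column edges are all made up for the example, and real coordinates would have to come from a PDF library that exposes them.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical word boxes: (text, x, y), where x is the left edge of the word
# and y is its baseline, in PDF-style coordinates (larger y = higher on page).
# These values are invented for illustration only.
WORDS = [
    ("Name", 50, 700), ("Qty", 300, 700),
    ("Apples", 50, 680), ("3", 300, 680),
    ("Pears", 50, 660), ("12", 300, 660),
]


def rows_from_words(words, column_edges, y_tolerance=2):
    """Group words into rows by y position, then into columns by x position.

    column_edges is a sorted list of the left x boundary of each column.
    Words left of the first edge are placed in the first column.
    """
    rows = defaultdict(lambda: [[] for _ in column_edges])
    for text, x, y in words:
        # Words whose baselines fall in the same bucket share a row.
        row_key = round(y / y_tolerance)
        # Index of the last column edge that is <= x, clamped to 0.
        col = max(0, bisect_right(column_edges, x) - 1)
        rows[row_key][col].append((x, text))

    table = []
    # Larger y is higher on the page, so iterate row keys in descending order.
    for key in sorted(rows, reverse=True):
        # Within a cell, order the words left to right before joining them.
        table.append([" ".join(t for _, t in sorted(cell)) for cell in rows[key]])
    return table


if __name__ == "__main__":
    for row in rows_from_words(WORDS, column_edges=[0, 200]):
        print("\t".join(row))
```

The hard part in practice is finding the column edges automatically; here they are passed in by hand, which is the main simplification.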