PDF and Tika Parser

@cscheeser

Now I understand: when the text in a box wraps onto two lines, as with “Incremental”, the second line gets assigned to a new row, and that new row’s column is incorrect. This looks like a table-parsing issue, and parsing tables is notoriously difficult. See the many discussions I’ve had with people about reading from tables:

TL;DR: Not even state-of-the-art models can read tables with 100% accuracy without specific training, so this is an area where manual effort, clever strategies, or more advanced models are needed.
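To give an idea of what a "clever strategy" could look like for the wrapped-cell case, here is a minimal sketch in plain Python. It is not part of Tika or any PDF library. It assumes you already have the extracted table as a list of rows (each a list of cell strings) and that you know which column tends to wrap. The function name `merge_wrapped_rows` and the `wrap_col` parameter are made up for illustration, and the "exactly one non-empty cell" rule is a heuristic that will misfire on tables that legitimately have sparse rows.

```python
def merge_wrapped_rows(rows, wrap_col):
    """Fold stray single-fragment rows back into the previous row.

    rows: list of rows, each a list of cell strings (None or "" when empty).
    wrap_col: index of the column whose text is assumed to wrap (hypothetical).
    """
    merged = []
    for row in rows:
        # Collect the non-empty cells of this row, whatever column they landed in.
        fragments = [cell.strip() for cell in row if cell and cell.strip()]
        # Heuristic (an assumption): a row with exactly one non-empty cell that
        # follows another row is the second line of a wrapped cell.
        if merged and len(fragments) == 1:
            prev = merged[-1]
            prev[wrap_col] = ((prev[wrap_col] or "") + " " + fragments[0]).strip()
        else:
            # Copy the row so the caller's input is not modified.
            merged.append(list(row))
    return merged
```

For example, with rows `["Incremental", "10"]` followed by `["", "cost"]` and `wrap_col=0`, the stray "cost" fragment is appended to the first column of the previous row no matter which column the parser put it in. You would still want to check the result by eye on a few pages.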
