Extract Data from Invoices to XML or CSV?

@jannikw99 you could take a look at this discussion and try some of the examples mentioned there. One option is to invoke an LLM with specific instructions to extract certain information. You may have to feed a single PDF into a vector store and give strong instructions to output the information only in a specific JSON structure, then see which model follows them best (maybe Mistral or Llama 3 in their instruct versions).
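To make the "strong instructions plus fixed JSON structure" idea a bit more concrete, here is a rough Python sketch. It only covers the parts around the model: building the prompt, checking that the reply really has the expected structure, and turning the result into CSV. The field names in `INVOICE_FIELDS` and the helper names are my own assumptions for illustration, and the actual model call (and any vector store) is left out on purpose, since that depends on your setup.

```python
import csv
import io
import json

# Hypothetical target structure; replace with the fields your invoices need.
INVOICE_FIELDS = ["invoice_number", "invoice_date", "vendor_name",
                  "total_amount", "currency"]


def build_prompt(invoice_text):
    """Build a strict extraction prompt for an instruct model."""
    schema = {field: "string or null" for field in INVOICE_FIELDS}
    return (
        "Extract the following fields from the invoice text below.\n"
        "Answer with a single JSON object using exactly these keys "
        "and nothing else.\n"
        "Use null for any value you cannot find.\n\n"
        f"JSON structure:\n{json.dumps(schema, indent=2)}\n\n"
        f"Invoice text:\n{invoice_text}\n"
    )


def parse_reply(reply):
    """Pull the JSON object out of a model reply and check its keys."""
    # Models often wrap the JSON in extra prose, so take the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    missing = [f for f in INVOICE_FIELDS if f not in data]
    extra = [k for k in data if k not in INVOICE_FIELDS]
    if missing or extra:
        raise ValueError(f"unexpected structure: missing={missing}, extra={extra}")
    return data


def to_csv(rows):
    """Serialise validated rows to CSV text (null values become empty cells)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=INVOICE_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

You would send `build_prompt(text)` to each candidate model, pass the reply through `parse_reply`, and count how often it raises an error. That gives you a simple way to compare which model sticks to the structure best before writing the rows out with `to_csv`.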

And indeed, a set of (anonymised) sample PDFs that shows the range of your challenge might help.
