Extract Data from Invoices to XML or CSV?

@jannikw99 running the prompt below with the “mxbai-embed-large” embedding model, a Chroma vector store and the “llama3:instruct” LLM would give you something like this:

The next step would be to put this into a loop that iterates over all PDFs, bring the results back into KNIME, and extract the meta-information and the JSON data from them. I will see when I can continue to explore this.
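A minimal sketch of what such a loop could look like in Python, not the actual workflow. The function `ask_llm` is a hypothetical stand-in for the embedding + Chroma + LLM chain: it is assumed to take a PDF path and return the model's raw text answer. The folder layout and the shape of the result rows are also assumptions.

```python
import json
from pathlib import Path
from typing import Callable


def extract_from_folder(folder: str, ask_llm: Callable[[Path], str]) -> list[dict]:
    """Run ask_llm on every PDF in a folder and collect the parsed answers.

    ask_llm is a placeholder (assumption) for the real retrieval + LLM call.
    """
    rows = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        raw = ask_llm(pdf)
        try:
            data = json.loads(raw)
            # The prompt asks for a JSON object; anything else counts as an error.
            error = "" if isinstance(data, dict) else "answer is not a JSON object"
            if error:
                data = {}
        except json.JSONDecodeError as exc:
            # Keep the file in the result so failures stay visible downstream.
            data, error = {}, str(exc)
        rows.append({"file": pdf.name, "data": data, "error": error})
    return rows
```

Keeping the file name and an error column per PDF means the rows can go back to KNIME as one table, and documents where the model did not return valid JSON can be filtered and retried.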

conversation_history.json.zip (1.4 KB)

Extract exactly these information from this document into a JSON file.

Do not add any information that is not there. Do not change the structure of the JSON file. Do not alter the names of the fields or the order!

If information is missing just leave it empty! The JSON fields:

- Firma (Company)
- Depot (Depot)
- Bank (Bank)
- Transaktionsart (Transaction Type - Purchase / Sale)
- ISIN / Kenn-Nr. der Aktie (ISIN / Stock ID Number)
- WKN (WKN)
- Wertstellung (Valuta) (Value Date)
- Belegdatum (Document Date)
- WSL (Währung) (Currency)
- FW-Kurs (Devisenkurs) (Foreign Exchange Rate)
- Menge (Quantity)
- Kurs (Price)
- WSL Kurs (Währung) (Currency of Price)
- Kurswert (Price Value)
- WSL Kurswert (Währung) (Currency of Price Value)
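Since the original question was about CSV output, here is a rough sketch of how one JSON answer could be mapped onto the fixed field list above and written as CSV. The key names are an assumption: I use the German labels without the parenthetical parts (and a shortened `ISIN`), so they would need to match whatever keys the model actually returns.

```python
import csv
import io
import json

# Assumed JSON keys, in the order given in the prompt.
FIELDS = [
    "Firma", "Depot", "Bank", "Transaktionsart", "ISIN", "WKN",
    "Wertstellung", "Belegdatum", "WSL", "FW-Kurs", "Menge",
    "Kurs", "WSL Kurs", "Kurswert", "WSL Kurswert",
]


def to_row(raw_answer: str) -> dict:
    """Parse one LLM answer; missing fields stay empty, extra keys are dropped."""
    data = json.loads(raw_answer)
    return {field: data.get(field, "") for field in FIELDS}


def rows_to_csv(rows: list[dict]) -> str:
    """Write the rows as CSV text with a header in the fixed field order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Forcing every answer onto the same field list guards against the model renaming or reordering fields despite the instructions in the prompt.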
