KNIME Automation: Generate Product Descriptions from Excel + PDFs

I want to build a KNIME automation that processes an Excel file row by row and, for each product, searches multiple PDFs for relevant information using the product name or article number. Based on the extracted PDF content, it should automatically generate a product description and write it back into the Excel output. Right now, the matching is unreliable (content doesn’t match the product) and the Excel row order/columns sometimes get lost.

Please help.

@MarcoLebschi welcome to the KNIME forum. I think you need to define the output in a structured form, for example as JSON, and then extract the content from that JSON in a structured way.
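To illustrate the JSON idea, here is a minimal Python sketch (for example for a Python Script node). It assumes each Excel row carries a unique `row_id` column and that the generation step returns a JSON list of objects with `row_id` and `description`. Those field names and the `merge_descriptions` helper are made up for this example, not something KNIME provides.

```python
import json

def merge_descriptions(rows, llm_json):
    """Attach generated descriptions to the original rows.

    rows:     list of dicts in the original Excel order, each with a 'row_id'
              (hypothetical column name).
    llm_json: JSON string holding a list of
              {"row_id": ..., "description": ...} objects (assumed format).
    """
    # Index the generated descriptions by row_id so their order does not matter.
    by_id = {
        item["row_id"]: item.get("description", "")
        for item in json.loads(llm_json)
    }
    merged = []
    for row in rows:  # iterate the ORIGINAL rows to keep their order
        out = dict(row)  # copy so every original column is preserved
        out["description"] = by_id.get(row["row_id"], "")
        merged.append(out)
    return merged

rows = [
    {"row_id": 1, "name": "Widget Pro"},
    {"row_id": 2, "name": "Gadget Mini"},
]
# The generated output may come back in a different order.
llm_json = (
    '[{"row_id": 2, "description": "Compact gadget."},'
    ' {"row_id": 1, "description": "Professional widget."}]'
)
result = merge_descriptions(rows, llm_json)
```

Because the loop runs over the original rows and only looks up the description by `row_id`, the row order and the existing columns stay intact even if the generated JSON is shuffled or incomplete.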

Does the Excel file contain the exact paths and names of the PDF files? Also, how is the PDF content being presented? It seems you are not creating a RAG store from the content.

I think you will have to come up with a strategy to find the right PDF files and possibly the right content within them.
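One possible strategy, sketched in Python: score each PDF by whether it contains the article number as a standalone token, fall back to the share of product-name words it contains, and reject anything below a threshold. This assumes the PDF text has already been extracted into a dict of file name to text. The function names, the weights and `min_score` are arbitrary choices for illustration and would need tuning on real data.

```python
import re

def normalize(text):
    # Lowercase and collapse whitespace so line breaks in the PDF do not matter.
    return re.sub(r"\s+", " ", text.lower()).strip()

def contains_token(token, text):
    # True only if 'token' is not embedded in a longer alphanumeric string,
    # so "ab-1234" does not match inside "ab-12345".
    pattern = r"(?<![a-z0-9])" + re.escape(token) + r"(?![a-z0-9])"
    return re.search(pattern, text) is not None

def score_pdf(article_number, product_name, pdf_text):
    text = normalize(pdf_text)
    score = 0.0
    art = normalize(article_number)
    if art and contains_token(art, text):
        score += 10  # an exact article number is the strongest signal
    words = [w for w in re.findall(r"[a-z0-9]+", normalize(product_name))
             if len(w) > 2]
    if words:
        hits = sum(1 for w in words if contains_token(w, text))
        score += 5 * hits / len(words)  # partial credit for name words
    return score

def best_pdf(article_number, product_name, pdf_texts, min_score=5):
    """Return the best-matching file name, or None if nothing is good enough."""
    scored = [(score_pdf(article_number, product_name, text), name)
              for name, text in pdf_texts.items()]
    if not scored:
        return None
    score, name = max(scored)
    return name if score >= min_score else None

pdf_texts = {
    "a.pdf": "Article AB-1234 Widget Pro technical data",
    "b.pdf": "Article AB-12345 Gadget Mini technical data",
}
match = best_pdf("AB-1234", "Widget Pro", pdf_texts)
no_match = best_pdf("ZZ-999", "Unknown Item", pdf_texts)
```

Returning `None` for weak matches means a product without a suitable PDF gets no description, which is usually better than a description built from the wrong document.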

Maybe you can provide us with an example of what you want to do that best represents your challenge.
