PYTHON PDF Module

Hi @mlauber71,
I'm back on your workflow:
PDF - Python package Tabula, PdfPlumber and Camelot to extract Text and Tables - KNIME Forum (82131).knwf
I always get the same error message about the module.
I don't understand where the modules are or where to apply them.
Regards
Br

@Brain, I'm not sure I follow. What you will have to do is:

  • install Python on your machine (best to use Miniforge)
  • create a conda environment with the necessary packages using the py3_knime_pdf.yml file, which contains the required configuration
  • install the Python script extensions for KNIME
  • tell KNIME where to find conda

If you have conda installed and have told KNIME where it is, you can create the environment with the packages using the metanode conda_python_pdf from the KNIME Community Hub.

I recommend reading this article; I hope it will help you along the way:

Then you are ready to explore the use of the nodes in the workflow. You will have to do some trial and error to get to the results.

Another approach is to use LLMs to extract information from such a statement. This is an example with other data. In this case you will have to work on the statements and especially tell the LLM to output clean CSV or JSON files. Even Apple's engineers seem to struggle with that, so there is no shame in having to try a few times.
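Because LLMs often wrap their answer in Markdown code fences or add chatty text around it, it can help to validate the reply before using it further. A minimal sketch, assuming the model was asked to return a JSON array of records; the function name `parse_llm_json` and the sample reply are hypothetical:

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker


def parse_llm_json(reply: str) -> list:
    """Extract and parse a JSON array from an LLM reply.

    Strips an optional surrounding Markdown code fence (with or without a
    "json" tag) and raises ValueError if the rest is not a valid JSON list.
    """
    text = reply.strip()
    # Remove a surrounding fence such as <fence>json ... <fence> if present
    match = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", text, re.DOTALL)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(f"LLM reply is not valid JSON: {err}") from err
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of records")
    return data


# Hypothetical reply, wrapped in a fence the way many models answer
reply = FENCE + 'json\n[{"date": "2024-01-31", "amount": 12.5}]\n' + FENCE
print(parse_llm_json(reply))
```

If parsing fails, you can send the error message back to the LLM and ask it to correct its output, which is one way to handle the "try a few times" part.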


Hi,
Thanks for your quick answer.
This is what I actually get:



Where does the problem come from?
Thanks
Br

@Brain, you can either use the standard Python environment you have configured in the preferences; if that is the one with the PDF libraries, you will have to tell the Python Script node to use it. Or you can use the Conda Environment Propagation node (1) and tell the Python Script node to use that environment specifically (2, 3), as explained in the article.
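To see which environment a Python Script node is actually running in, you could paste something like the following into the node and look at the console output. This is a minimal sketch using only the standard library; the function name `describe_interpreter` is hypothetical. If the printed path does not point into the conda environment that holds the PDF libraries, the node is using a different environment than intended.

```python
import sys


def describe_interpreter() -> dict:
    """Return basic facts about the Python interpreter running this code."""
    return {
        # Full path of the python executable, e.g. inside .../envs/<env-name>/
        "executable": sys.executable,
        # Root folder of the active environment
        "prefix": sys.prefix,
        # Python version, e.g. "3.11.6"
        "version": sys.version.split()[0],
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```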

I made this chart to sum up the alternatives:

The metanode I built provides a ready-made environment for the PDF task and tailors it to either macOS or Windows, depending on the operating system detected. This is needed because the detailed Python packages sometimes differ slightly between the two systems, and the propagation node remembers the specific packages, not just the overall names.
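The idea of picking an OS-specific environment definition can be illustrated in plain Python. This is not how the metanode is implemented (it works inside KNIME); it is a minimal sketch assuming two hypothetical files, `py3_knime_pdf_windows.yml` and `py3_knime_pdf_macos.yml`, and it only builds the `conda env create -f <file>` command instead of running it:

```python
import platform
from typing import List, Optional

# Hypothetical file names: one environment definition per operating system
ENV_FILES = {
    "Windows": "py3_knime_pdf_windows.yml",
    "Darwin": "py3_knime_pdf_macos.yml",  # platform.system() reports macOS as "Darwin"
}


def conda_create_command(system: Optional[str] = None) -> List[str]:
    """Build the conda command that creates the environment for the given OS.

    Defaults to the operating system this code is running on.
    """
    system = system or platform.system()
    try:
        yml_file = ENV_FILES[system]
    except KeyError:
        raise ValueError(f"No environment file defined for {system!r}") from None
    return ["conda", "env", "create", "-f", yml_file]


print(" ".join(conda_create_command("Darwin")))
```

To actually create the environment, you could pass the returned list to `subprocess.run(cmd, check=True)`, provided conda is on the PATH.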

I have exactly the same settings:




And I always get the same error message: no module found …
Is there a way to check whether these modules are installed?
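One way to check is to run a small script inside the Python Script node itself, so that it uses the same environment as the rest of the workflow. A minimal sketch, assuming the import names `tabula`, `pdfplumber` and `camelot` for the three packages used in the workflow (compare them with your py3_knime_pdf.yml); `importlib.util.find_spec` only looks a module up and does not import it:

```python
import importlib.util
import sys

# Import names of the PDF packages used in the workflow (assumed; adjust as needed)
REQUIRED_MODULES = ["tabula", "pdfplumber", "camelot"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python used:", sys.executable)
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All PDF modules are available")
```

If modules are reported missing, the printed path usually shows that the node is not running the environment created from the yml file.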
Thanks
Br