How to read IndyCar PDF file

bowlinglm · June 27, 2023, 1:26pm

Road America Sonsio Grand Prix indycar-sectionresults-p1.pdf (708.5 KB)
Road America Sonsio Grand Prix indycar-results-p1.pdf (821.3 KB)

I'm trying to learn KNIME, so I thought I'd start with something I enjoy: IndyCar. I'm trying to read the two attached PDFs. I'm still learning the KNIME nodes; I have used the Excel Reader node successfully, but I'm struggling with the Tika Parser on these PDF files. I want to get the fast times from the second file and then the lap times from the first file. Thanks for your help!

mlauber71 · June 27, 2023, 8:23pm

@bowlinglm I extracted the tables with the help of the camelot-py package and stored them as .parquet files in a sub-folder.
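A minimal sketch of what such a Camelot extraction step might look like. This is not the notebook from the workflow: the function names `extract_tables` and `parquet_name`, the `data` output folder, and the `stream` flavor are assumptions for illustration (Camelot also offers `lattice`, which suits tables with ruled lines). It needs `camelot-py`, `pandas`, and a Parquet engine such as `pyarrow` installed.

```python
from pathlib import Path


def parquet_name(pdf_path, page, order):
    """Build an output name such as 'indycar-results-p1_page3_table1.parquet'."""
    return f"{Path(pdf_path).stem}_page{page}_table{order}.parquet"


def extract_tables(pdf_path, out_dir="data", flavor="stream"):
    """Extract every table Camelot finds and store each one as a Parquet file.

    out_dir and flavor are assumed defaults, not values from the workflow.
    """
    import camelot  # third-party: pip install camelot-py

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # pages="all" scans the whole document; each result wraps a DataFrame
    tables = camelot.read_pdf(str(pdf_path), pages="all", flavor=flavor)

    written = []
    for table in tables:
        df = table.df
        # Camelot labels columns 0, 1, 2, ...; use string names for Parquet
        df.columns = [str(c) for c in df.columns]
        target = out / parquet_name(pdf_path, table.page, table.order)
        df.to_parquet(target, index=False)
        written.append(target)
    return written


# Example call (requires the PDF to be present locally):
# extract_tables("Road America Sonsio Grand Prix indycar-results-p1.pdf")
```

Writing one file per table keeps tables with different layouts apart, so a later step can decide per file how to handle the header rows.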

Then I used KNIME to extract the headers and add some information. Some tables at the end of the PDF have different structures, so you might have to deal with them separately. At the moment they are just skipped.

The workflow needs some polishing, and currently the Python part is done in a Jupyter notebook in the /data/ subfolder. Edit: I have now included that in the workflow.

For other attempts with extracting tables from PDF there was this challenge:
