I am trying to load both Excel and PDF files, combine them into one database, and then export the result back to Excel. The issue is that the files contain multiple tabs, but I only need to read one tab, and I need to do this for 200+ individual files that are unstructured. The data I need is on just one tab within each file, and that tab does not have the same name in every file. I also need to extract specific data from that tab to build one database, because the Excel and PDF files are set up like forms rather than normal Excel tables and do not have clearly defined columns and rows. I am new to KNIME, so I am still trying to understand the platform.
Welcome to the forum. This sounds like it might be a complicated data ingestion problem, but it’s hard to tell without some sample data. Do you perhaps have any Excel or PDF files you could share, along with an idea of what you would want your output to look like?
My first thought is that you could build a table listing the file name and sheet name for each Excel file you need to read. You could then use KNIME to loop over that table, read in each file, and combine the results into a single table. For the actual reading, you may want to use the Excel Reader node for the Excel files and the Tika Parser node for the PDFs.
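To make the control-table idea concrete, here is a minimal sketch of the same logic in plain Python, outside KNIME. It assumes (hypothetically) that the wanted tab's name contains a common keyword such as "summary" in every file; if it does not, the control table would have to be written by hand instead. The `sheets_by_file` dictionary and the `read_sheet` callback are illustrative stand-ins, not KNIME or library APIs: one represents the sheet names found in each file, the other whatever actually reads a sheet into rows.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def build_control_table(
    sheets_by_file: Dict[str, List[str]], keyword: str
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Pick, for each file, the first sheet whose name contains `keyword`.

    Matching is case-insensitive. Files with no matching sheet are returned
    separately so they can be checked by hand instead of silently skipped.
    """
    control: List[Dict[str, str]] = []
    unmatched: List[str] = []
    for path, sheets in sheets_by_file.items():
        match = next((s for s in sheets if keyword.lower() in s.lower()), None)
        if match is None:
            unmatched.append(path)
        else:
            control.append({"file": path, "sheet": match})
    return control, unmatched


def combine(
    control: List[Dict[str, str]], read_sheet: Callable[[str, str], List[Row]]
) -> List[Row]:
    """Loop over the control table, read each (file, sheet) pair, and stack the rows.

    A `source_file` column is added so every row can be traced back to its file.
    """
    rows: List[Row] = []
    for entry in control:
        for row in read_sheet(entry["file"], entry["sheet"]):
            rows.append({"source_file": entry["file"], **row})
    return rows


if __name__ == "__main__":
    # Hypothetical sheet names for three files.
    sheets = {
        "a.xlsx": ["Cover", "Summary Data", "Notes"],
        "b.xlsx": ["Intro", "summary"],
        "c.xlsx": ["Sheet1"],
    }
    control, unmatched = build_control_table(sheets, "summary")
    print(control)
    print("needs manual check:", unmatched)
```

In KNIME the equivalent would be the control table feeding a loop, with the reader node inside the loop doing the job of `read_sheet`; keeping the unmatched files in a separate list mirrors the advice to review those cases manually rather than drop them.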