Multiple Unstructured Files

Hi,

I am trying to load both Excel and PDF files, combine them into one database, and then export the result back to Excel. The issue is that the files contain multiple tabs, but I only need to read one tab from each of 200+ individual files, and they are unstructured, meaning the data I need is on just one tab within each file and that tab does not have the same name in every file. I also need to extract certain data from that tab to create one database, because the Excel and PDF files are set up like forms rather than normal Excel tables and do not have clearly defined columns and rows. I am new to KNIME, so I am still trying to understand the platform.

Your help is greatly appreciated.

Hi @KevinD1

Welcome to the forum. This sounds like it might be a complicated data ingestion problem, but it’s hard to tell without some sample data. Do you perhaps have any Excel or PDF files you could share, along with an idea of what you would want your output to look like?

My first thought would be that you might have a table that specifies file names and sheet names for each of the Excel files you need to read. You could then use KNIME to loop over that table, read in the files, and combine them into a single table. You may want to use the Excel Reader or Tika Parser nodes for the actual reading of the files.
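
To make the loop idea a bit more concrete, here is a rough sketch of the same pattern in plain Python rather than KNIME nodes. It is only an illustration under assumptions: `combine_sheets`, `fake_reader`, and the file and sheet names are all made up for the example, and the reader is a stand-in that returns in-memory rows instead of opening real Excel or PDF files. In a real workflow the reading step would be done by something like the Excel Reader node inside a loop.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
# A reader takes (file_name, sheet_name) and returns that sheet's rows.
Reader = Callable[[str, str], List[Row]]


def combine_sheets(manifest: Iterable[Tuple[str, str]], read_sheet: Reader) -> List[Row]:
    """Loop over (file_name, sheet_name) pairs and stack all rows into one table."""
    combined: List[Row] = []
    for file_name, sheet_name in manifest:
        for row in read_sheet(file_name, sheet_name):
            # Record where each row came from so the combined table stays traceable.
            combined.append({"source_file": file_name, "source_sheet": sheet_name, **row})
    return combined


# Stand-in data instead of real files (hypothetical names and values).
FAKE_WORKBOOKS = {
    ("site_a.xlsx", "Form 1"): [{"field": "Name", "value": "A"}],
    ("site_b.xlsx", "Summary"): [{"field": "Name", "value": "B"}],
}


def fake_reader(file_name: str, sheet_name: str) -> List[Row]:
    """Pretend to read one sheet; a real reader would open the file here."""
    return FAKE_WORKBOOKS[(file_name, sheet_name)]


# The manifest plays the role of the table of file names and sheet names.
manifest = list(FAKE_WORKBOOKS.keys())
table = combine_sheets(manifest, fake_reader)
```

The point of the manifest is that the differing sheet names are handled in one lookup table, so the loop itself stays the same for every file.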

ScottF,

Thank you for the response. This task was too complicated for a start with KNIME, but I appreciate the feedback. The files contained too many variables.

Thanks.
