I want to compare two excel file columns,
If the column names in both files are same then I need to import the data from first file and next proceed further transformations on the data.
Are there just 2 files to compare, then if they don’t match the workflow stops? Or could there be a large number of excel files in the folder which need to be tested and matched to each other?
I think that if I were testing large files or a large number of files, the basic approach I would probably take is to read in the files with the excel reader but set it to only read in the headers on all columns (perhaps a few clean up nodes if you need to make sure that the headers are on the same row as usual), then use table difference finder to test for differences, then a case switch to trigger the rest of the workflow if there is a match. If there is a match, then the workflow would read in the full excel files and handle ETL. This could be performed in a loop if you have a large number of files to test.
If you are just testing 2 small excel files, then it is easier to just read the full data of both files, use the column name extractor to get just the headers + table difference finder + case switch, then continue the downstream processing when the match condition is met.
I am thinking that the best way is probably to read in the header columns of all of the excel files in your target folder using the excel reader, along with the excel reader option of appending the file name / location as a new column. That should concatenate them into a table for you, which will allow you to match them as rows on a single table in 1 easily reviewable step (perhaps the duplicate row finder ignoring file info columns), and provide the matched files / locations to pass to the rest of the workflow. I am on a time crunch work project for the next few days, but might be able to mock up a workflow later in the week if no one can get to it first.
You should be able to read in the headers of all files with just the excel reader if you set the column names by column position as numbers or names instead of targeting a row that contains the column names to be used as KNIME headers.
Also make sure not to skip empty or hidden columns if you are trying to exactly match the structures. Also uncheck “Fail if schemas differ between multiple files” to help avoid issues and see the output as you dialog the settings.