How to create a workflow that appends a new column to the csv files based on the values from excel file

Hello KNIME community,

I need your support. I’m trying to create a workflow where I can change existing CSV files based on the certain Excel file.
More clearly, I have an excel file which contains the column with the error massages and with the column corresponding file names and the rows where error occurred. According to this Excel file I need to change those CSV files by adding additional column with corresponding error massages. Respectively, the rows which have not displayed error masseges need to be removed.

The question is which tools, in which order should be applied to fulfill above given conditions.

Any tips that you can provide will be appriciated.
Thanks beforehand.

Hi @abekchanov and welcome to the forum.

I have tried to come up with a solution to your problem which you can find here:
49062-excel-and-csv – KNIME Hub

What this workflow does is basically these steps:

  1. Read the csv file
  2. Rename the column header of the table from the csv file. Note: This might be an optional step for you, but I have quickly created a csv file in Excel and exported it and Excel’s csv writing is pretty crappy. If you don’t need it, delete that node and connect the CSV Reader Node directly to the Joiner Node (see step 4 below)
  3. Import the Excel file
  4. Bring both tables together based on the column RowID that you can find in both files. Keep only the error messages from the Excel file (can be adjusted in the tab Column Selection of the Joiner Node)
  5. Filter out the rows that show “Missing” (or the red questions mark) in the column “Error Message” with a Row Filter Node → these should be the lines from the csv that had no error message
  6. Write it back to a csv → I chose to write to a new csv file, to keep the example intact but of course you can also overwrite your existing file. Just change the settings in the CSV Writer Node.

Let me know, if this helped you.

3 Likes

Hi @kowisoft, thank you so much for your help. Indeed, I’ve got the right direction. A little more explanation regarding my task: I regularly receive an excel file that declares errors in the process of reading the csv files in the system. The excel file contains several columns in one of the columns given the error messages and there is a corresponding column that shows from which row of the csv file the error occurred. Furthermore, most of the columns contain similar data. Till now, I have been doing those invalid line reports manually. Now, I want to automatize this process by creating a permanent workflow in KNIME. My other question is how to apply these steps for all csv files at once to get ready reports?
Thanks a lot.

1 Like

Thank you for the kind words, glad that it worked so far.

Reading your response this sounds to me like want to do a loop through a list of files and - eventually - concatenate them together. Looping is a little bit of an advanced concept but not difficult to learn.

In your case I would assume you read a folder that holds all these csv files and then loop through the process one file at a time. So the Chunk Loop Node and the List Files/Folders Node may be your friend here.

If you could share some dummy files we could play around with, I guess we could probably produce something that is close to what you want.

3 Likes

Microsoft Excel-Arbeitsblatt (neu).xlsx (9.1 KB)
file_1.xlsx (8.8 KB)
file_2.xlsx (8.8 KB)
file_3.xlsx (8.8 KB)

Hi @kowisoft , here, I’ve created sample files.

The outcome files
error_report_file_1.xlsx (9.9 KB)
error_report_file_2.xlsx (9.9 KB)
error_report_file_3.xlsx (9.9 KB)
should look like these:

Thank you for sharing these.

One thing I do not understand is why you need to read the source files at all (file1, file2 etc)? Any information you have in the source file is already in the master error report (which I assume is the file called “Microsoft Excel-Arbeitsblatt (neu).xlsx” (by the way, fellow German here :wink: )

Do you just want to split this very master error report by file?

1 Like

Yes Sir, exactly, the results should look like in error_report_files. I’ve generated them just for learning purposes. Thanks (Es freut mich Ihnen kennen zu lernen und vielen Dank für Ihre Unterstützung :blush:)

Hi @kowisoft, thank you for your tips. Here, I’ve tried to create a Workflow for myself. Overall, it worked till writing the processed file into the csv. Could you please look at my nodes? I’ve got an issue with rewriting and naming the concatenated data to csv file type.
Sample_WF 1.knwf (42.5 KB)

1 Like

Hi @abekchanov

Sorry, I totally forgot about our conversation.

For some reason the output file was overwritten by a flow variable. I just deleted it and wrote a results file to the workflow data folder in a subfolder called “result - output

This is working right now. Have a look, if that is what you wanted, otherwise you could also dynamically create filepath flow variable which e. g. could hold today’s date (typical use case).

See this slightly adjusted workflow

Let me know, if that helped.

2 Likes

Hi @kowisoft, thank you so much and sorry for bothering you. Yeah, it worked, but the results are writen in one csv file. Now, I have other questions:

  1. How to seperate the results into three csv files and change their name corresponding to their original name (e.g. error_report_file_1, error_report_file_2, error_report_file_3)

  2. If I want to iterate those files but from differrant file folders, is the same way will be used? I mean the List Files/Folders node, or in this case any other node should be applied?

Thank you in advance. I appreciate your help. I learned a lot from you :blush:

1 Like

Probably you need to do some string manipulation and turning strings into a filepath to achieve this.

Plus some looping, assuming you have more than one result file.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.