I am looking for a workflow that does the following:
- Read files (txt) from a folder, number of files and their names vary
- Transform the data (this workflow exists. But it is branched and sequenced)
- Write data from each file into a new sheet of a new excel file.
Which loop components should I use in order to iterate over step 2 for as many times as there are files in the source folder?
Many thanks for your help!
you should use the
Excel Writer node inside the loop. A
Variable Loop End node can be used as a loop termination.
Note: I am using the newer
Excel Writer node which also allows a sheet append.
Note2: In the lower branch a new Excel file is generated for each original file and in the upper branch a new data sheet is to be added to this Excel file. So your workflow generates an Excel file with two datasheets for each original file. Is this correct?
many thanks for your kind help and your suggestion. I am new to Knime and did not know which loop element to take. Now the loop workflow runs and does not abort.
You were mostly right about the functions of the two segments. I want the the lower branch to create sheets with certain names, based on some fields of the input files.
So I thought to hav e this function inside the loop and to execute the lower branch before the upper fills the sheet with data. So far for my plan…
However, the workflow only creates an excel file with a single correctly named sheet containing the data of the first source file. The second file in my folder is not picked.
Maybe the configuration of one of the nodes is not correct?
I wonder if, for example, the file reader is not creating a mess: I gave it my first file because it asks for a path with a file.
I have attached the workflow:
KNIME_project 1.knwf (57.2 KB)
Maybe this helps?
change your lower branch to create only the SheetName. At the end of this branch transfer the SheetName into a flow variable.
Your Table Row to Variable node could look a little different, since I’m using version 4.3. At this point you have your SheetName for this loop.
I dont’t know if you do so, but you have to set your path with your URL flow variable in the File Reader node. Execute the Loop Start node to pickup the first file path and setup your File Reader node. Every node should have the same configuration.
Your upper branch should build the data for the file. Setup your Excel Sheet Appender node with your SheetName from your lower branch.
You need only one Excel Sheet Appender node in your workflow.
I hope this helps.
many thanks for your help!
Unfortunately, this workflow still just picks one file from the folder, I am struggling with defining the URL variable, I guess:
I assume you suggest to move the “Table Row to Variable” node from the start of the workflow to the position shown in the screenshot.
But was this not the start node for the loop? The end node is now throwing an error, asking for a related loop start node.
Sry I’m short on time right now so I cannot point you to the proper reference post etc. Use the csv reader instead of a file reader. This should fix the issue.
Enjoy your weekend
Hi Mark, thanks for your help. I tried with the csv reader, which works in so far that all files from my source folder are written in individual lines of the output table of that node. However, as I want to write the data from the source files 1:1 into excel sheets of an excel file the exchange of the reader node did not help.
I think my above attached workflow has a problem with the loop design. It loops over the source files and collects all the data but it should rather write the first spread sheet, empty any buffer, fetch the next file in order to write the next spread sheet, etc.
So far, I haven’t found a solution for this in any reference post.
Anyhow, many thanks
Maybe I’m not understanding your problem. You want to read several files and each file goes to the same excel file but a different sheet. If that iscorrect move the excel writer into the loop as pointed out by @Andrew_Steel. Control the sheet name via a string variable and select the append option.
If it has to go to the same sheet but a different excel file create a path flow variable, e.g., create file folder variable or via string to path (flow variable).
In both cases the writer has to be inside the loop.
Hope that solves the problem
I would consider the excel writer and the excel sheet appender nodes are already inside the loop which I thought should start with a “Row to Variable Loop Start” node and end with a “Variable Loop End” node. I selected the “Variable Loop End” node as the excel writer nodes have no outgoing data connector, thus not allowing to use the “Loop End” node.
Many thanks and kind regards
Quick question what version are you using. I guess my answers only apply to >4.0 as the sheet appended does not exist anymore
I’m working with 4.3.2. I know: the sheet appender is deprecated… Could this be the root problem?
because there are no suitable csv-files in your workflow I can’t recap your workflow. From a 5000 feet viewpoint of your workflow
there are two important nodes: the Table Row to Variable to get your sheetName into a flow variable and the Excel Sheet Appender to write all data into different sheets.
In my opinion it is not important which Excel Writer Node you use, both, the Excel Writer node and the Excel Sheet Appender creates a new File, and append new sheets.
Hi Andrew, thanks for your message. I followed your advise (and have the workflow attached again LoopFiles2ExcelSheets.knwf (106.1 KB) ).
However, it is still not working and the problem might still be the same. Node 60 in the upper path breaks after the first file. I think because it works by row numbers. Apparently, the loop adds the rows from the next file which then cause an error. That’s why I believe the loop is not completely working file by file.
The workflow inside the loop construction works fine.
I probably have to live with this and process the files one by one. It’s only annoying that I not getting this loop thing sorted…
there are two different row formats in your csv file, so the Column Expressions node crashed.
I have changed your workflow a little bit so it worked with this different row formats
and now it ended with one xlsx-file and different sheets. It should work also if you filter out the different second line.
The csv files I’ve used are in the data directory of the workflow and the solution xlsx file too.
LoopFiles2ExcelSheets.knwf (125.0 KB)
Hi Andrew, many thanks: now this is working smoothly.
I did not know the “ungroup” node. This streamlines the workflow considerably and replaces the node that originally caused the problems.
Thanks again and best regards
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.