Excel Sheet Appender - skip if sheet already exists

Krau5i · September 18, 2019, 6:48pm

Hello,

I was wondering if it is possible to use the Excel Sheet Appender and skip if a sheet already exists, not abort the entire workflow. I’m iterating through a number of files and would like to keep up performance so only the new files are added as a new sheet to m output file.

Any ideas/ recommendation are appreciated.

Thanks,
Tobias

s.roughley · September 18, 2019, 9:39pm

Use the list sheet names node at start of each loop iteration, followed by a row filter to only keep the sheet name you want to test for, and then use an empty table switch / End if combination to process the file of the sheet name did not exist.

Steve

Krau5i · September 20, 2019, 9:52pm

Hi Steve,
thanks for your response. I can follow your train of thought until the empty table switch and don’t fully understand how to implement this into my existing workflow.

This is my porcess so far:

Thanks,
Tobias

s.roughley · September 21, 2019, 2:52pm

OK, so what I think you need to do is as follows…

Connect the output of your loop start node to the input of the Read Excel Sheet Names (Node 49 in your picture)
This is where it gets a bit tricky… because you have your three Excel Reader nodes you need do activate/deactivate the flow variable input to those, which you cant do directly from the Empty table switch - you have two choices…
a. Follow them empty table switch with an ‘Extract Variables (Data)’ node on each output, and then take the top output as the input to all your Excel Reader nodes. Collect up the active branches with a Flow Variable End If node (from the Vernalis Community Contribution). I’ve shown a simplified form here - substitute your three Excel Readers and the rest of the loop body for my Excel Reader/Excel Sheet Appender combo:

b. Alternatively, replace the empty table switch with an Extract Table Dimensions/Flow Variable If Switch (Flow Variable Value) pair (the latter is again from the Vernalis community contribution):

and configure the if switch to switch based on the number of rows:

Finally, in both cases, I have replaced your Loop End node with a Variable Loop End - the reason for this is that as far as I can see, you dont need the output from the loop end, and also you need to connect the hidden output flow variable port from the excel sheet appender to a node in the loop body (or to the loop end) to ensure it runs in each loop iteration - using the variable loop end does both.

Steve

Krau5i · September 23, 2019, 4:31pm

Hi Steven,

thanks again for your detailed instructions. I built the workflow a) above and while it does not return any errors, I feel like it appends whether it exists or not. Maybe that is due to a fact that I had not mentioned yet.

In a Table Row to Variable note (highlighted below) I set the tab name based on a variable from the input file. This tab name should be matched to the Rule-based Row Filter (highlighted below) which returns a list of the already existing tabs that should not be executed. This match is still missing in my opinion. Is there a note that can compare a list with a variable value?

Edit: And then another node would have to tell my table readers to execute only this file. It seems like this is getting quite complex. Unfortunately, I cannot justify the wait time of my current process. It just takes way to long to append all tables with every execution…

Thanks again,
Tobias

s.roughley · September 24, 2019, 8:35pm

Sorry, without looking at your data I really cant think how to solve the rest of the problem - maybe someone else on the forum can supply the missing final piece.

You mention the time it takes to append the tables, but how does this compare with what you have in place at the moment?

Steve

Krau5i · September 25, 2019, 5:01pm

I totally understand. I’ve tried to reoganize my project to and made the question more precise. Maybe this could help with the understanding of the issue?

I’m looking for a node switch to check if “2” exists in the top output table. If yes, trigger next iteration. If no, continue with workflow and pass variable from “Table Row to Variable Loop Start” on to the next step.

Thanks,
Tobias

armingrudd · September 26, 2019, 5:42am

Hi @Krau5i and welcome to the KNIME community forum,

If I have understand your issue correctly, you have a list of sheets to add and a list of sheets which are already added. Now you want to add the sheets which are not already added. Right?

Why don’t you use Reference Row Filter to exclude the existing sheets from the list?

Krau5i · September 30, 2019, 10:18pm

Thank you for your response. I think the Reference Row Filter helps! I’m missing however the next piece in the puzzle, meaning a node following the reference row filter that checks a condition:

if table is empty, pass previously created variables on to excel reader
if table is not empty, loop through next iteration

Any ideas how to bring these variables back into this flow and where do I need to end the variable loop?

mlauber71 · October 1, 2019, 4:13am

You could try the

https://hub.knime.com/knime/space/Examples/06_Control_Structures/05_Switches/04_Switch_on_Empty_Tables

Krau5i · October 15, 2019, 8:06pm

I think this node could work, but how I can hook it up to my workflow?
The condition is: If filtered table from Node 54 is

not empty, loop through next iteration (variable loop!)
empty, pass variable on to Excel Reader and proceed with workflow, then proceed with next iteration

s.roughley · October 16, 2019, 6:26am

If you connect if second output to an Extract Variables node

https://kni.me/n/_kRD7C88GOx0cUBZ

Then use that too connect to your reader. Finish the two branches with an End If node
Steve

Krau5i · October 16, 2019, 3:54pm

Thank you! I think you guys got me really close to a solution of my workflow. One final issue that remains is a Loop End node error (Node 66):

ERROR Loop End 2:66 Execute failed: Input table’s structure differs from reference (first iteration) table: different column counts 29 vs. 2

The Empty Table Switch (Node 60) if not empty returns 2 columns from Node 54, but if empty runs a whole workflow that creates a table with 29 columns. Is there a neat KNIME solution to this?

ScottF · October 16, 2019, 5:47pm

Hi @Krau5i -

Have you tried checking the Allow changing table specifications box in the Loop End node?

Krau5i · October 16, 2019, 6:02pm

Perfect! This was the missing piece.

Thank you @s.roughley @armingrudd @mlauber71 and @ScottF for your support throughout the workflow.

Tobias

s.roughley · October 16, 2019, 9:37pm

No problem - glad you got there!

I think we all gave you hints rather than a full answer - from which you probably learnt more. All of us on here have been in the same place when we started out with KNIME - it gets easier as you figure ways to do things. (Actually, you will probably realise that different people will give you different answers sometimes to achieve the same thing!) And for me at least, one of the great things about KNIME is that you can ‘just fiddle’ about with your data until you get it how you want it.

Best,

Steve

system · October 23, 2019, 9:45pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.