[Bug or no bug?] Excel Reader preconfigured worksheet selection options

This topic hints at potentially buggy behaviour in the new Excel Reader node (new as in KNIME version 4.4 and later) when it comes to the sheet selection option.

I have come across a similar issue.

What I observe is that, within a loop such as the one shown below, the Excel Reader node sometimes unpredictably forgets its preconfigured sheet selection option (select first sheet with data, select sheet with name, or select sheet at index) or resets it to the default:

In the above workflow, the file path is controlled by a variable, while all the other options have been preconfigured based on one of the files. All the files share exactly the same sheet names and the same data structure per sheet type.

In versions prior to KNIME 4.4, this was never a problem, because every preconfigured parameter stayed as configured.
With KNIME 4.4 and above, this looks different. Let me elaborate on my suspicion.

When I open the Excel Reader configuration dialog while the status light is red, I notice a brief “scanning” message underneath the file name. During this short “scanning” period, the worksheet selection option is momentarily empty or set to something else. Given the red status light, the file path variable is obviously not populated, so the scanning is based on the file that was used to preconfigure the node initially. Could this scanning behaviour be what causes the node to sometimes reset the worksheet selection option?

Maybe I simply have to adapt my expectations and my workflow: is it still possible to preconfigure an Excel Reader node for certain options based on a sort of template Excel file and expect those options not to be adapted automatically by the node? Or is it safer to also provide certain options as flow variables during parametrisation?

Given the experience from the aforementioned topic, in which a well-documented answer was flagged as the solution, I am less optimistic about the parameter approach because, after all, the original poster admitted they could no longer trace the cause of the buggy behaviour.

Before you ask me for a reproducible workflow to share here in the forum, please understand that I unfortunately cannot provide one. The workflow is part of a much bigger workflow, which is itself called by another workflow using the Call Local Workflow (Row Based) node. This workflow of workflows ran flawlessly in every version before KNIME 4.4.

Meanwhile, I will parametrise the sheet selection options and report back here if this improves the situation. If it does not, I will move back to the old Excel Reader node.


Hey @Geo,

thanks for reporting! If I understand correctly you are looping over a list of paths (excel files with same structure) and you preconfigured your Excel reader for instance with “Select sheet index” = 1 and sometimes a sheet with index != 1 appears to be in the output?

It should be possible to preconfigure the node with a template file. If a file does not contain the configured sheet name or index, the node should fail during execution; the settings should not be adapted automatically. Nor should it be any safer to configure them via a flow variable when the parameter stays the same.

Attached you’ll find a workflow that tries to reproduce your issue (if I misunderstood, please correct me). Unfortunately, I was not able to reproduce it.

Best regards
Lars

excelSheetProblem.knwf (62.6 KB)

@laaaarsi, I have a similar issue. When I migrated my workspace from 4.3.4 to 4.5.0, the sheet selection in the Excel Reader was not preserved. The node was reset to read a new file.


Hey @izaychik63,

I just created a 4.3.4 workspace with a workflow containing an Excel Writer and migrated it to 4.5.0, and the node settings were preserved correctly. Is there any chance you can reproduce this issue (create a 4.3.4 workflow, import it into 4.5.0) and provide this workflow?

Best regards
Lars

Looks like you’re right. The reset was the result of choosing a new file.


Hi @laaaarsi

If I understand correctly you are looping over a list of paths (excel files with same structure) and you preconfigured your Excel reader for instance with “Select sheet index” = 1 and sometimes a sheet with index != 1 appears to be in the output?

I am using the Select sheet with name option. Occasionally, it reverts to the Select first sheet with data option, after which my workflow fails because the first worksheet usually does not have the same data structure as the other worksheets. I can see this in the data preview in the configuration dialog.

Thank you for sharing your workflow. It would be interesting if your test contained three sheets, each with a different data structure. By the way, interesting way of generating files :slight_smile:

For now, I have added a String Manipulation (Variable) node just before the Excel Reader, in which I specify the name of the worksheet. This variable controls the sheet_name parameter. I will test a bit to check whether the same issue arises.

Please note that I am currently on KNIME version 4.5.0.


@Geo, what you could do is force sheet_selection = “NAME” through a flow variable in case of doubt. Maybe not the most elegant way, but it should provide an additional layer of security.

And indeed, under certain circumstances it seems to revert to the default value. I have not yet been able to reproduce it consistently, but it might be worth looking into, @laaaarsi.


I tried to reproduce the issue and I think I managed it, at least partly. If I close the workflow in an executed state, reopen it, and open the dialog of one of the Excel Readers, “Select first sheet with data” is selected even though the node was saved with “Select sheet with name”. Closing the dialog (without saving) and reopening it shows the correct selection. This is definitely a problem, for which I will create a ticket, but I did not manage to trigger it during execution.

Edit: Filed a ticket with internal number AP-18133


Hi @laaaarsi and @mlauber71, thank you for catching this! Calling a workflow from another workflow appears to leave the called workflow in an executed state, at least for a little while (which is also convenient for checking whether everything went right or seeing what went wrong). I use an adapted version of the model factory.


@laaaarsi, could you also add a problem for the Parquet Writer? The File/Folder setting also ‘snaps’ back after resetting and saving the workflow.


You can force the use of Folder by setting a flow variable, and the write seems to work anyway, so it could just be a display problem.

As a follow-up: even the workaround with the flow variables does not always prevent this behaviour. In other words, I’ll be looking forward to the bug fix when it becomes available :slight_smile:
