Excel Reader - How to retain empty cells?

LB_Knime · February 13, 2023, 10:08pm

I am using the Excel Reader node to bring in an Excel file.
Some rows have empty cells eg:
(pipe symbols are just for illustration here)

             Col_0 | Col_1 | Col_2 | Col_3 | Col_4
Row 0      A0  |    B0   |     C0  |     D0  |     E0
Row 1      A1  |    B1   |            |            |     E1
Row 2      A2  |    B2   |     C2  |     D2  |     E2

So, with the above Excel data, in Knime, I get the following:

             Col_0 | Col_1 | Col_2 | Col_3 | Col_4
Row 0      A0  |    B0   |     C0  |     D0  |     E0
Row 1      A1  |    B1   |     E1  |       ?    |      ?
Row 2      A2  |    B2   |     C2  |     D2  |     E2

The value E1 is being shift from Col_4 to Col_2.

I want to retain the original structure with the blank cells in Col_2 & Col_3.

According to the node documentation, there are two setting that might be involved:

Skip empty columns says:
“If checked, empty columns of the sheet will be skipped and not displayed in the output. Whether a column is considered empty depends on the Table specification settings. This means that if the cells of a column for all scanned rows were empty, the column is considered empty,”
I added emphasis on “all scanned rows”. My data definitely does not have missing data in the whole column or even all the scanned rows.

Replace empty strings with missing values
“If checked, empty strings (i.e. strings with only whitespaces) are replaced with missing values.”

I have checked the "Replace empty strings… " option.

How do I get Knime to stop shifting the right-most cell value to the left when there is an empty cell?

Thanks.

ArjenEX · February 13, 2023, 10:38pm

Hi @LB_Knime

Can you show a little bit more of your configuration of the Excel Reader or even better an example workflow? If I create an Excel with your example data set, drag and drop it into a workflow it all shows fine. Meaning this is executed with default settings only.

LB_Knime · February 13, 2023, 10:41pm

You can see in the “File Content” area how my data is treated.
“Declined” should be in Col G

LB_Knime · February 13, 2023, 10:43pm

Not sure if this is related - in the same config, under “Preview”, I get this error message, though it doesn’t prevent the workflow from executing:

ArjenEX · February 13, 2023, 10:57pm

That error shouldn’t be there. Have you tried disabling the Limit data rows scanned (Advanced Settings)?

LB_Knime · February 13, 2023, 11:07pm

Disabling the Limit data rows doesn’t change either issue.

ArjenEX · February 14, 2023, 2:02pm

Then I personally don’t see any other option then to share an anonymized version of your Excel file, or have KNIME staff look into this. The error message is quite rate.

LB_Knime · February 14, 2023, 2:36pm

Thanks, Arjen. Struggling to find instructions for uploading a file here. Can you help with that, too ?

ArjenEX · February 14, 2023, 2:45pm

Drag and drop it from your file explorer into the post or use the upload button.

I believe there is a 5MB size limitation.

LB_Knime · February 14, 2023, 3:10pm

Thanks. Was looking for a paper clip.
Sample source file.xlsx (3.0 MB)

ArjenEX · February 14, 2023, 3:21pm

Unfortunately I have to report the same. In version 4.6.4 it loads fine with just default settings. Someone else has to jump in.

iCFO · February 14, 2023, 3:21pm

excel_read_columns.knwf (1.2 MB)

I don’t see any issues using all default settings on my end… See if it reads correctly if you open my workflow, then point it to your local source file.

LB_Knime · February 14, 2023, 3:31pm

Thanks for the feedback.
It seems it works fine with the test file I sent, but not with the original file.
The original file is an xlsx created by our Enquire software reporting tool.
To get the sample file, I opened the original in Excel, changed the data in the first column and then used ‘Save As’. I suppose something changed there.
Maybe there is a delimiter issue? The Enquire software only allows export as xlsx. It isn’t even a csv here.

iCFO · February 14, 2023, 3:46pm

It sounds like a conversion problem on the Enquire side. Can you remove proprietary fields and then do an export from your Enquire software? That way we can see if there is a way to work around the issue in KNIME.

mlauber71 · February 14, 2023, 4:39pm

@LB_Knime what you could so is try and use the bundled Python version and the integrated OpenPyxl and see if they can import the file

Is there any chance to get an “original” output file from this system without spelling any secrets?

LB_Knime · February 14, 2023, 5:44pm

Phew - the Openpyxl workflow/node looks too complicated for me!
I’m checking to see if I can share the original file.

iCFO · February 14, 2023, 8:45pm

@LB_Knime

excel_read_columns.knwf (6.6 KB)

Go to advanced settings and select “Reevaluate Formulas”

LB_Knime · February 14, 2023, 8:59pm

Hey, looks like that fixed it! Do I have to leave that checked forever?
Any idea that is necessary?

iCFO · February 14, 2023, 9:17pm

This is the first time that I have encountered an issue like this, so I don’t have a detailed answer as to the exact cause. However, when I scrolled to the bottom row it showed that filters were applied even though they were not visible on the top row (as they were in the other export). I am guessing that these filters (or an export error in the attempt of creating a filter) may have caused the issue that required formula reevaluation. Yes, I would probably check this setting in the future for exports from this system to be safe.

LB_Knime · February 14, 2023, 9:21pm

Thanks again to everyone for helping with this.
I did notice the garbage text at the bottom, but didn’t think to check both reports for differences.
I think it’s terrible practice for applications to append text like that to an export. The software can’t even handle exporting the whole report at once, which is why it’s broken into two pieces. I’ll double check that they are built identically.