Feature Request: Excel reader handling of merged cells

When reading somewhat tabular excel files from sources out of my control it often happens that columns with the same value are merged. For example row 3-5 have the same value in column X. then these cells are merged. when reading only row 3 gets this value and 4+5 have a missing value.

Since this can happen over multiple columns and in more complex files, fixing this after the fact is not really a solution and the Excel reader should be able to deal (potentially via config options) for merged cells.

Dear kienerj,

Thank you for posting about the issue you’re facing. May I ask which nodes you are using to read the tabular Excel files? It would also be helpful if you could provide a simple workflow that replicates the problem you’re encountering. I am unable to recreate the problem on my end.

Thank you and have a great day!

Best regards,
Jinwei

A trivial example can be used to explain. i use the Excel Reader node.

image

The expectation would be that the missing values in Column A are “1” instead of missing.

@kienerj you could use the missing value node to fill missing rows with the values from rows above

1 Like

That only works if missing value always come from merged cells. with is not given. It could really be missing/empty and hence why I think Excel reader must be able to deal with it. especially since merged cells is an aspect of the format which the reader just cover entirely.

Hi @kienerj,

Thank you for reporting this issue!
I created a ticket for your suggested feature. (AP-20519)

Hi @kienerj,

Until such a feature is added to KNIME, you may wish to try out a component I’ve just added to the hub:

It has two config parameters:

  1. the location of a spreadsheet xlsx file that contains merged cells.
  2. the output location for an “unmerged” copy to be created.

It uses java to find all merged cells, and unmerges them (placing the “shared” value into each unmerged cell). This can then be read by Excel Readers or other processes.

A demonstration workflow is available here:

NOTE Please make sure that you fully test your specific results, as I have only performed very limited testing using the sample xlsx in the demo workflow.

I have set the colour of the merged cells in the demo spreadsheet as follows:

Sheet 1
image

Sheet 2
image

Sheet 3
image

In addition to merged cells, you will see that Sheet 2 contains two missing values and Sheet 3 contains one missing value. Sheet 1 contains an “overlap” between the cells shared by Mary and Fred. These missing values should remain missing, and not be given values from any other cells.

With the Un-Merge component, the Excel Readers read these in as:

image

image

image

Without the component, the Excel Readers read this as:

image

image

image

2 Likes