When reading somewhat tabular excel files from sources out of my control it often happens that columns with the same value are merged. For example row 3-5 have the same value in column X. then these cells are merged. when reading only row 3 gets this value and 4+5 have a missing value.
Since this can happen over multiple columns and in more complex files, fixing this after the fact is not really a solution and the Excel reader should be able to deal (potentially via config options) for merged cells.
Thank you for posting about the issue you’re facing. May I ask which nodes you are using to read the tabular Excel files? It would also be helpful if you could provide a simple workflow that replicates the problem you’re encountering. I am unable to recreate the problem on my end.
That only works if missing value always come from merged cells. with is not given. It could really be missing/empty and hence why I think Excel reader must be able to deal with it. especially since merged cells is an aspect of the format which the reader just cover entirely.
NOTE Please make sure that you fully test your specific results, as I have only performed very limited testing using the sample xlsx in the demo workflow.
I have set the colour of the merged cells in the demo spreadsheet as follows:
In addition to merged cells, you will see that Sheet 2 contains two missing values and Sheet 3 contains one missing value. Sheet 1 contains an “overlap” between the cells shared by Mary and Fred. These missing values should remain missing, and not be given values from any other cells.
With the Un-Merge component, the Excel Readers read these in as:
Without the component, the Excel Readers read this as: