Collection handling with missing values

This feature is missing from the Split Collection Column node.

We can choose to include missing values in the GroupBy node when creating a list column.
We can choose to combine missing values in a collection with other such columns using the Column Aggregator node.
But we cannot directly split the missing values contained in a collection with the Split Collection Column node. There is no such option in the node configuration.

I think it would be a really nice update if this feature could be added. :yum:

Thank you for your great efforts as always! I really enjoy using KNIME.

Hi @haozi04,

Interesting idea, and good timing, since we are going to revisit this node to add a modern dialog sooner or later. However, I am not sure I understand your suggestion correctly. Can you add a small example of what the node does today and what the alternative output would look like?

Thank you,

I’m really sorry. It seems I misunderstood the Split Collection Column node, which already splits missing values into separate columns by default.

However, I found that the Column Aggregator node may not behave the way I expected.

If I have two or more collection columns and want to combine them into one collection column that contains all of their elements, I use the Column Aggregator node.

If the collections contain missing values, the aggregator drops them, regardless of whether I have ticked the option to include missing cells. Only with List as the aggregation method are the missing values kept, and then only inside nested collections, which is not what I want.
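To make the difference concrete, here is a minimal Python sketch of the two behaviors. It only models the data; it is not KNIME code and uses no KNIME API. The assumptions are that `None` stands in for a missing cell, that a collection cell is a Python list, and that `concat_keep_missing` and `concat_drop_missing` are hypothetical names for the desired and the observed aggregation.

```python
from typing import List, Optional

# Assumption for illustration: None plays the role of a KNIME missing cell,
# and a collection cell is modeled as a Python list (or None if the whole cell is missing).
Cell = Optional[str]


def concat_keep_missing(*collections: Optional[List[Cell]]) -> List[Cell]:
    """Desired behavior: flatten several collection cells into one list, keeping missing elements."""
    result: List[Cell] = []
    for coll in collections:
        if coll is None:
            # Design choice for this sketch: a missing collection cell contributes no elements.
            continue
        result.extend(coll)
    return result


def concat_drop_missing(*collections: Optional[List[Cell]]) -> List[Cell]:
    """Observed behavior as described above: missing elements disappear."""
    return [v for v in concat_keep_missing(*collections) if v is not None]


a = ["x", None]
b = [None, "y"]
print(concat_keep_missing(a, b))  # ['x', None, None, 'y']
print(concat_drop_missing(a, b))  # ['x', 'y']
```

The desired output is a single flat list that still has one slot per original element, so positions line up with the source collections.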

Please see the test workflow here: Test_collection_aggregation – KNIME Community Hub

Thanks for clarifying, and for the detailed description. I was able to reproduce the behavior you described and extended an existing ticket (internal reference AP-12262) to also cover this case.
As a workaround, you can use the Split Collection Column node once for each of the two input columns. Then use a Column Aggregator, selecting all columns that match the wildcard pattern “Split Value*”, with the List aggregation method.
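The workaround can be sketched in Python to show why it preserves missing values: once each collection is split into ordinary columns, a wildcard selection plus a List-style aggregation collects one value per column, missing ones included. This is only a model of the data flow, not KNIME code. The column names below, in particular how the second split's duplicate names are disambiguated, are hypothetical, and the sketch assumes missing cells are included in the aggregation; `aggregate_as_list` is an illustrative name.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Hypothetical row after running Split Collection Column on each of the two
# collection columns. The exact generated column names are an assumption.
row: Dict[str, Optional[str]] = {
    "ID": "r1",
    "Split Value 1": "x",
    "Split Value 2": None,
    "Split Value 1 (#1)": None,
    "Split Value 2 (#1)": "y",
}


def aggregate_as_list(row: Dict[str, Optional[str]], pattern: str) -> List[Optional[str]]:
    """Collect the values of all columns matching a wildcard pattern, in column
    order, keeping None (missing) values, similar to a List aggregation."""
    return [value for name, value in row.items() if fnmatchcase(name, pattern)]


print(aggregate_as_list(row, "Split Value*"))  # ['x', None, None, 'y']
```

`fnmatchcase` is used so the match is case-sensitive on every platform; the non-matching `ID` column is left out.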

Hope this helps,


Thank you for the advice and for reporting the issue. :slightly_smiling_face:

I converted the missing values into temporary placeholder strings and changed them back after the split, which also works for me.
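The placeholder round trip can be sketched in Python. This is a model of the idea, not KNIME code: `None` stands in for a missing cell, and `PLACEHOLDER`, `encode` and `decode` are hypothetical names. The placeholder must be a string that never occurs in the real data, otherwise genuine values would be turned into missing ones on the way back.

```python
from typing import List, Optional

# Hypothetical placeholder; it must be a value that never occurs in the real data.
PLACEHOLDER = "__MISSING__"


def encode(coll: List[Optional[str]]) -> List[str]:
    """Replace missing values (None) with the placeholder string."""
    return [PLACEHOLDER if v is None else v for v in coll]


def decode(coll: List[str]) -> List[Optional[str]]:
    """Turn placeholder strings back into missing values."""
    return [None if v == PLACEHOLDER else v for v in coll]


a = ["x", None]
b = [None, "y"]
# Placeholders are ordinary strings, so a step that drops missing values keeps them.
combined = encode(a) + encode(b)
print(combined)          # ['x', '__MISSING__', '__MISSING__', 'y']
print(decode(combined))  # ['x', None, None, 'y']
```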