Sorter node Niggle

Hi,

There is a rather annoying aspect to the Sorter node which I keep meaning to report.

If you sort a column in ascending order, any rows with missing values are brought to the top of the list. This is quite frustrating because, I would say, in all cases it is not what the user wants.

Surely any rows with missing values should be ignored during the sorting procedure, so that those rows are put at the end of the sorted list regardless of whether ascending or descending order is used.

If there is a reason why users would want missing values at the top, then please offer a checkbox in the Sorter node configuration so that one can choose not to sort on missing values.

 

Simon.

Are there any plans for the Sorter node to handle missing values better, please?

Simon.

Can anything be done to address the handling of missing value cells when they are sorted with the Sorter node?

 

Please!!

Simon.

Hi Simon,

Let me discuss this with the group. I see your point, and I guess we will need to add an option to the sorter (if only for backward-compatibility reasons). I assume you would want to have the same option in the table view (which allows sorting by clicking the column header)?

I'm not sure if we can add this for v2.6 already, as it's a low-level API and the sorter is used in quite a lot of different places.

Cheers,
  Bernd
 

Thanks for bringing this up with the group. I hope something can be done.

In order to retain backward compatibility, I guess a checkbox can be used which says "Move Missing Cells to end of sort list".

The same option in the table view would be most welcome also.

I cannot think of a reason why a user would want empty cells listed at the start of a sorted list.

Thanks,

Simon.

Hi Simon,

This actually triggered quite a series of discussions internally. Bernd already has an implementation ready to go, but we are rather uncertain whether it's really a good idea to put this type of functionality into a sorter node (or a sorter button in a view). Always moving missing values to the end of a table (no matter which order one uses for sorting the non-missing rest) breaks the semantics of the sorting. In a way, one is no longer sorting the missing values; one filters them out and puts them at the end. Sometimes they are smaller than everything, at other times they are larger, but this is only implicitly defined.
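To make the "implicitly defined" point concrete, here is a minimal Python sketch. It is not KNIME code: `sort_missing_always_last` is a hypothetical helper and `None` merely stands in for a missing cell. To keep missing values last in both directions, the sort key has to treat them as the largest value when sorting ascending and as the smallest value when sorting descending.

```python
def sort_missing_always_last(values, descending=False):
    """Sort values, keeping None (standing in for a missing cell) at the end.

    Hypothetical helper for illustration only; not part of KNIME.
    """
    def key(v):
        is_missing = v is None
        # Ascending: the flag is True only for None, so None sorts as the
        # largest element. Descending: the flag is flipped, so None sorts as
        # the smallest element and reverse=True still places it last.
        return (is_missing != descending, 0 if is_missing else v)

    return sorted(values, key=key, reverse=descending)


values = [3.2, None, 1.5, None, 2.0]
ascending = sort_missing_always_last(values)
descending = sort_missing_always_last(values, descending=True)
print(ascending)
print(descending)
```

The flag in the key changes meaning with the sort direction, so the missing values have no single fixed position in the ordering.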

From my perspective, this is something one would rather see in a workflow, modeled explicitly as "filter out missing values (and append to end of table)" using a node or maybe two. I am even less sure how to communicate this to a user inside a view. I do understand your use case, but I have also seen cases (and actually had them myself) where I was quite interested in the missing values: knowing that they are considered larger or smaller than all other possible values (which is not an altogether unrealistic guess), I know they will show up at the top of the table if I sort either ascendingly or descendingly. So one or two clicks will get them. With your proposed setup I always need to scroll down to the end of the table. And I can already see posts on this forum where users complain about their missing values "disappearing" from the table view...
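As a rough illustration of the explicit "filter out missing values (and append to end of table)" idea, the sketch below does it in three visible steps in plain Python. Everything here is assumed for the example: the table is a list of dicts, `None` marks a missing cell, and the function name and the columns `id` and `logP` are made up. It does not use any KNIME API.

```python
def split_sort_append(rows, column, descending=False):
    """Split off rows with a missing value, sort the rest, append the split-off rows.

    Hypothetical helper for illustration only; rows are dicts and None
    stands in for a missing cell.
    """
    # Step 1: split the table on whether the sort column is missing.
    present = [row for row in rows if row.get(column) is not None]
    missing = [row for row in rows if row.get(column) is None]
    # Step 2: sort only the rows that actually have a value.
    present.sort(key=lambda row: row[column], reverse=descending)
    # Step 3: append the rows with missing values, in their original order.
    return present + missing


table = [
    {"id": "a", "logP": 2.1},
    {"id": "b", "logP": None},
    {"id": "c", "logP": 0.4},
]
asc_ids = [row["id"] for row in split_sort_append(table, "logP")]
desc_ids = [row["id"] for row in split_sort_append(table, "logP", descending=True)]
print(asc_ids, desc_ids)
```

Because the column is a parameter, re-sorting on a different property only means changing one argument, and the missing rows are visibly appended rather than silently reordered.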

I could see us potentially adding this to the sorter node as additional options for missing value treatment. But I have no clue how to (a) add that to the view as an option and (b) make it visually clear to the user what's happening to the missing values.

Now you know what's going on inside KNIME on Easter Sunday :-)

Cheers, Michael

Thanks Michael, I am glad I have managed to give everyone an exciting Easter Sunday!

I'm glad you can see the use case for this request. Often the user wants to see sorted data at the end of the workflow to view the results. Invariably there are so many properties the user is looking at that they then want to change the sorter node and sort on a different property. Having to use the Missing Values node to filter out missing values would be very painful, as it would mean changing this node as well as the sorter node for every new re-sorting of the data (because the missing data are in different rows for each property column). A number of colleagues complain to me about having to keep scrolling past missing values, so I hope something is possible.

I can also understand your point that seeing the missing values at the top of the table can be useful for checking what they are.

I am more than happy to have either a separate node or for the sorter node to have an option made available (my preferred option). The toggle option could read "Always Move Missing Values to Sorted List End".