Issues representing missing values with Parallel Coordinates Plot nodes

Dear KNIMErs and KNIME lovers,

I need to visualise data using a parallel coordinates plot. I really like this type of plot because it allows us to compare variable trends at a glance. I also appreciate how KNIME has implemented this plot; however, unfortunately, it seems to have some issues when handling missing values. I will refer to the following data table to describe the problem, and you can also check the attached example workflow at the bottom for reference.

Currently, KNIME offers two nodes for this kind of visualisation:

  • Parallel Coordinates Plot

  • Parallel Coordinates Plot (JavaScript)

The Parallel Coordinates Plot node is a very nice implementation, but it only represents missing values correctly if they appear in the last columns, for example, var_04 and var_05 for item_02; var_05 for item_04. In these cases, the line is correctly left out (see figure below).
If missing values appear between columns containing non-missing values, they are sometimes represented correctly (var_02 and var_03 for item_01; var_03 for item_04) and sometimes not. For var_01 and var_02 of item_03, for instance, the plot suggests that both values are defined, but that's not true! (see figure below).

Using the Parallel Coordinates Plot (JavaScript) node with the “show missing values” option works better for reporting missing values correctly; in fact, all missing values are explicitly represented as missing (see figure below).

However, unlike the standard Parallel Coordinates Plot node, the JavaScript version does not display any axis values when only one value is present for a variable, which can be problematic (see figure below).

Does anyone have suggestions on how we can properly represent missing values in a parallel coordinates plot within KNIME, especially considering cases where only one value may exist for a variable?

I appreciate any suggestions or workarounds you might have on this matter.

Thank you in advance!

Gio

parallel_coordinates_plot_missing_problem.knwf (21.4 KB)

Hi @gcincilla

I have unticked the Item column because this attribute is already carried through the Color Manager, like this →

I think this solves the issue of showing var_01 and var_02 for item_03.
However, for item_04, it still shows var_03 = 0.7, even though that value doesn’t actually exist.

Here’s my take on the problem — it’s another way of explaining the data without running into missing value issues.

Best,

Alpay Zeybek

Hi Alpay,

Thank you for your help with this! Yes, the way you configured the node doesn’t include “item” on the plot, which resolves the issue of showing var_01 and var_02 for item_03—since now the first variable displayed is var_01 instead of item. However, the problem still persists overall.

What I don’t understand is why, in the case of item_01, the missing values for var_02 and var_03 are correctly displayed, but this doesn’t happen in the other cases. Maybe someone else can provide some insight on this.

Hi @gcincilla ,

it looks like you ran into two bugs at the same time.

The first one is the new Parallel Coordinates Plot failing to show missing values for numeric dimensions as a separate value outside of/next to the axis of valid values. The JavaScript view does this correctly. I opened a ticket for this one (internal reference UIEXT-3043).

The second problem is a bit trickier to explain and is caused by the curve interpolation used in both plots. It will be easiest to see the difference if you switch to straight lines, but I'll still try to explain it here. The type of spline used for interpolating the curved lines regularly overshoots beyond the actual values on both neighboring axes. For example, for your item_02 the line between var_01 and var_02 goes further up than the ends of both axes. I suspect that this kind of interpolation was chosen to get smooth, curvy turns at the axes. However, it easily leads to false impressions, especially when combined with the lack of explicit treatment of missing values. Other interpolation options exist that lead to sharper turns at the axes but should improve the overall picture. We need to figure out whether we can change the interpolation behavior in the same ticket.

In the meantime, I hope the alternative representation suggested by @alpayzeybek works for your use case.

Thank you for reporting,

nan

Hi nan,

Thanks for your reply and for confirming that this is a bug in the Parallel Coordinates Plot node. I appreciate you opening a ticket for it. Could you please update this thread once the bug is resolved?
I didn’t fully understand your description of the second problem, but when I switch to the straight-line option in the nodes, I don’t see any improvement.

Regarding the workaround suggested by @alpayzeybek, unfortunately, that’s not an option for me in this case. While in the example all variables share the same scale (0–1), I need to use this in cases where variables have different scales, so a bar plot grouped by item is not suitable. Normalising the variables is also not an option since the actual values and their units are important in my context. In a way, all the variables are quite different, and the values for the items can only be compared across the same variable.

At this point, I believe the best workaround is to use the Parallel Coordinates Plot (JavaScript) node and add an artificial dummy item so that each variable has at least two values, which allows the axis values to be displayed correctly, even though I don’t like this solution very much. I’m really looking forward to the bug fix! :wink:
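
In case it helps anyone else, the dummy-item idea could be sketched roughly like this in plain Python (for example as a starting point for a Python Script node). Everything here is an assumption for illustration: the function name `add_dummy_item`, the column names, the sample values, and especially the choice of giving the dummy the existing value plus 1, which you would replace with something meaningful for each variable's unit and scale.

```python
def add_dummy_item(rows, variables, label="dummy"):
    """Append one dummy row so that every variable with exactly one
    distinct non-missing value gets a second value.

    rows: list of dicts, missing values are represented as None.
    variables: names of the columns shown as axes in the plot.
    """
    dummy = {"item": label}
    needed = False
    for var in variables:
        present = {row[var] for row in rows if row.get(var) is not None}
        if len(present) == 1:
            (only,) = present
            # Assumption: offset the single value by 1; pick a value that
            # makes sense for the variable's unit in a real workflow.
            dummy[var] = only + 1
            needed = True
        else:
            # Enough values already (or none at all): leave the dummy missing.
            dummy[var] = None
    return rows + [dummy] if needed else list(rows)


# Hypothetical table: var_02 has only one non-missing value.
table = [
    {"item": "item_01", "var_01": 0.3, "var_02": None},
    {"item": "item_02", "var_01": 0.8, "var_02": 0.5},
]

patched = add_dummy_item(table, ["var_01", "var_02"])
for row in patched:
    print(row)
```

The dummy row stays missing on every axis that already has two or more values, so it only adds a line where it is needed to make the axis appear. It should of course be clearly labelled or coloured so that nobody reads it as real data.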

Thank you both for your help!
Gio

Yes, there will be a notification in this thread once the ticket is resolved.

I’m glad you found at least some way to proceed for now,

nan
