Hello KNIME Community,
I have noticed that the built-in Box Plot node renders the upper whisker at the theoretical fence (Q3 + 1.5 × IQR) rather than at the most extreme data point within that fence. According to Tukey’s original definition and the implementation in most statistical packages, the whisker should extend to the last observed value that is not flagged as an outlier—i.e., the highest data point ≤ Q3 + 1.5 × IQR—while any values beyond that threshold should be plotted individually as outliers.
Rendering the whisker directly at Q3 + 1.5 × IQR can be misleading when no actual data point coincides with that theoretical limit. It visually suggests the existence of data at that exact value, rather than accurately reflecting the distribution of observed values.
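To make the difference concrete, here is a minimal Python sketch of Tukey-style whiskers as described above. It is an illustration only, not KNIME's code. The function name `tukey_box_stats` is hypothetical. The quartiles come from `statistics.quantiles(..., method="inclusive")`, which uses linear interpolation; that choice is an assumption, because KNIME may define quartiles differently and so produce slightly different fences. In the sample data, the upper fence falls at 14.5, where no observation exists. The Tukey whisker instead stops at 9, the largest value inside the fence, and 30 is reported as an outlier.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Hypothetical helper: box-plot statistics with Tukey-style whiskers.

    Assumption: quartiles use linear interpolation (method="inclusive");
    KNIME's quartile definition may differ. Needs at least 2 values.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Tukey: whiskers end at the most extreme *observed* values inside the fences,
    # not at the fences themselves.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    lower_whisker, upper_whisker = min(inside), max(inside)

    # Values beyond the fences are drawn individually as outliers.
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "lower_whisker": lower_whisker, "upper_whisker": upper_whisker,
        "outliers": outliers,
    }


stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats["upper_fence"], stats["upper_whisker"], stats["outliers"])
# 14.5 9 [30]
```

A fence-based renderer would draw the upper whisker at `upper_fence` (14.5). Tukey's convention draws it at `upper_whisker` (9), the last point that is actually in the data.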
Would it be possible to update the Box Plot node so that:
- The upper and lower whiskers extend to the most extreme data points within [Q1 – 1.5 × IQR, Q3 + 1.5 × IQR].
- Any values outside these fences continue to be shown as separate outlier points.
This change would bring KNIME’s Box Plot node in line with Tukey’s definition and the behavior of other major analytics platforms, improving interpretability.
Thank you for considering this feedback. I appreciate the continual improvements to KNIME and look forward to any updates or workarounds you might suggest.
Hi @pinatofilho,
Welcome to the forum.
I think what you’re seeing matches the node docs. From the Box Plot node description:
> The whiskers never exceed 1.5 × IQR. This means if there are some data points which exceed either Q1 − (1.5 × IQR) or Q3 + (1.5 × IQR) then the whiskers are drawn at exactly these ranges and the data points are drawn separately as outliers.
So in this node the whiskers stop at the 1.5×IQR fences, rather than extending to the most extreme in-fence value. I know different tools use different box-plot conventions, so it’s easy for this to be confusing. If Tukey-style whiskers (to the last in-fence point) are needed, one alternative is to construct the chart in the Generic ECharts View.
Best,
Keerthan
Hi @k10shetty1,
I don't fully agree. The description you cite is not very clear: "never exceed" is not the same as "is always at" 1.5 × IQR, which is what @pinatofilho observed. Also, when I checked the linked description, the sentence just before it reads:
> They are drawn at the minimum and the maximum value as horizontal bars and are connected with the box with a line.
I would interpret this as exactly the behaviour @pinatofilho is looking for. I haven't checked the node output myself to see what the current behaviour is, but I would like to see the suggested (and documented) behaviour rather than whiskers that are always drawn at 1.5 × IQR.
@daniela_digles Thanks for reporting this.
@pinatofilho You are right that the current Box Plot node and the Generic ECharts View node’s boxplot view do not exactly follow Tukey’s definition. At the moment, the whiskers are drawn at the calculated cutoff values (Q1 − 1.5×IQR, Q3 + 1.5×IQR) rather than being extended to the most extreme data points within those cutoffs.
I have created a bug ticket to track this issue: UIEXT-2926.
Best,
Keerthan