Hello KNIME Community,
I have noticed that the built-in Box Plot node renders the upper whisker at the theoretical fence (Q3 + 1.5 × IQR) rather than at the most extreme data point within that fence. According to Tukey’s original definition and the implementation in most statistical packages, the whisker should extend to the last observed value that is not flagged as an outlier—i.e., the highest data point ≤ Q3 + 1.5 × IQR—while any values beyond that threshold should be plotted individually as outliers.
Rendering the whisker directly at Q3 + 1.5 × IQR can be misleading when no actual data point coincides with that theoretical limit. It visually suggests the existence of data at that exact value, rather than accurately reflecting the distribution of observed values.
Would it be possible to update the Box Plot node so that:
- The upper and lower whiskers extend to the most extreme data points within [Q1 – 1.5 × IQR, Q3 + 1.5 × IQR].
- Any values outside these fences continue to be shown as separate outlier points.
This change would bring KNIME’s Box Plot node in line with Tukey’s definition and the behavior of other major analytics platforms, improving interpretability.
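As a rough illustration of the rule I am asking for, here is a minimal Python sketch. The function name `tukey_whiskers`, the toy `sample` list, and the choice of the "inclusive" quartile method are my own assumptions for the example; this is not how the node is implemented, and other quartile conventions can give slightly different fences.

```python
from statistics import quantiles

def tukey_whiskers(values, k=1.5):
    """Return (lower_whisker, upper_whisker, outliers), Tukey style.

    Hypothetical helper for illustration only.
    """
    data = sorted(values)
    # Quartiles by linear interpolation ("inclusive" method; an assumption)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observed values inside the fences
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Everything outside the fences is plotted individually
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return inside[0], inside[-1], outliers

# Toy data, made up for this example
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high, outliers = tukey_whiskers(sample)
print(low, high, outliers)  # 1 9 [30]
```

For this toy data the upper fence is 14.5, but the upper whisker ends at 9, the largest value that is actually observed inside the fence.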
Thank you for considering this feedback. I appreciate the continual improvements to KNIME and look forward to any updates or workarounds you might suggest.
Hi @pinatofilho,
Welcome to the forum.
I think what you’re seeing matches the node docs. From the Box Plot node description:
The whiskers never exceed 1.5 × IQR. This means if there are some data points which exceed either Q1 − (1.5 × IQR) or Q3 + (1.5 × IQR) then the whiskers are drawn at exactly these ranges and the data points are drawn separately as outliers.
So in this node the whiskers stop at the 1.5×IQR fences, rather than extending to the most extreme in-fence value. I know different tools use different box-plot conventions, so it’s easy for this to be confusing. If Tukey-style whiskers (to the last in-fence point) are needed, one alternative is to construct the chart in the Generic ECharts View.
Best,
Keerthan
Hi @k10shetty1,
I don’t fully agree. The description you cite is not very clear: “never exceed” is not the same as “is always” 1.5 × IQR, which is what @pinatofilho observed. Also, while checking the link to the description, I noticed that the sentence just before it reads:
They are drawn at the minimum and the maximum value as horizontal bars and are connected with the box with a line.
I would interpret this as exactly the behaviour @pinatofilho is looking for. I didn’t check the node output myself to see what the current behaviour is, but I would like to see the suggested and documented behaviour rather than whiskers that are always drawn at 1.5 × IQR.
@daniela_digles Thanks for reporting this.
@pinatofilho You are right that the current Box Plot node and the Generic ECharts View node’s boxplot view do not exactly follow Tukey’s definition. At the moment, the whiskers are drawn at the calculated cutoff values (Q1 − 1.5×IQR, Q3 + 1.5×IQR) rather than being extended to the most extreme data points within those cutoffs.
I have created a bug ticket to track this issue: UIEXT-2926.
Best,
Keerthan
Hello everyone,
I was just about to start a new thread on this topic when I found this existing discussion. I’m glad to see it’s already on the team’s radar with an open ticket. I’d like to add my support to the previous posts and share a bit more detail on the conceptual issue.
From a statistical standpoint, the whiskers of a modified box plot should extend to the last actual data points within the calculated boundaries (e.g., Q1 - 1.5*IQR), not to the boundaries themselves, which are simply reference values.
The minimum and maximum represented by the whiskers, just like the outliers, should always correspond to real values from the dataset. The current implementation, which extends the whiskers to these calculated limits, effectively displays an artificial min/max. Since the k multiplier is an arbitrary parameter for outlier detection, this can lead to a misleading representation of the data’s true range (excluding outliers).
To illustrate this point, I’ve put together a workflow that compares three different box plot implementations. The difference in whisker rendering is immediately clear.
Hopefully, this additional information and the example workflow can be helpful for the development team as they review the ticket.
Thanks
Thank you @CarlosEnrique84 for the additional clarification! If you want to support the request, please also vote for the ticket (on the left side if you scroll up to the title).
Hi Daniela,
Thank you for the heads-up about voting. Since the ticket has only two votes so far, and I understand the KNIME team may need to prioritize issues with higher community engagement, I thought I would share a quick survey of different tools and treat their consensus as an informal voting system.
For the standard Wikipedia dataset example, here is a list of implementations that all agree the whiskers should extend to the actual data points of 57 and 79 respectively (after omitting outliers with k=1.5):
- Wikipedia itself: https://en.wikipedia.org/wiki/Box_plot#Example_with_outliers
- Box Plot (JavaScript) (legacy) node in KNIME
- Box Plot (legacy) node in KNIME
- k-Box Plot component in KNIME
- Statistics Kingdom: https://www.statskingdom.com/boxplot-maker.html
- Alcula: https://www.alcula.com/calculators/statistics/box-plot/
- Numiqo: https://numiqo.com/statistics-calculator/charts/create-boxplot
- BoxPlotR: http://shiny.chemgrid.org/boxplotr/
- Box Plot Calculator: https://boxplotcalculator.com/
The only implementation I found that defines the min/max whiskers by the theoretical limits of 52.5 and 88.5 (using k=1.5) is the standard Box Plot node in KNIME.
The conclusion is quite clear: the Box Plot node does not follow the traditional convention for rendering whiskers in a modified box plot. While other implementations that match the Box Plot node’s method might exist, this survey suggests they are likely a small minority.
Interestingly, even asking K-AI inside a Python Script node to write a script for this task produces code that correctly identifies 57 and 79 as the whisker ends.
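For reference, a script along those lines could look like the sketch below. This is my own minimal version, not the K-AI output, and it makes two assumptions: the `temps` list is the temperature series as I transcribed it from the Wikipedia example linked above, and the quartiles use the "inclusive" linear-interpolation method from Python's `statistics` module. It runs as plain Python and does not use any KNIME-specific API.

```python
from statistics import quantiles

# Temperatures (°F) transcribed from the Wikipedia "Example with outliers";
# treat the exact values as an assumption and check them against the article.
temps = [52, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 89]

k = 1.5
q1, median, q3 = quantiles(temps, n=4, method="inclusive")
iqr = q3 - q1

# Theoretical fences: where the current Box Plot node draws its whiskers
fences = (q1 - k * iqr, q3 + k * iqr)

# Tukey-style whiskers: most extreme observed values inside the fences
inside = [t for t in temps if fences[0] <= t <= fences[1]]
whiskers = (min(inside), max(inside))

# Values beyond the fences are shown as individual outlier points
outliers = [t for t in temps if not fences[0] <= t <= fences[1]]

print("fences:  ", fences)    # (52.5, 88.5)
print("whiskers:", whiskers)  # (57, 79)
print("outliers:", outliers)  # [52, 89]
```

With Q1 = 66 and Q3 = 75 the IQR is 9, so the fences land at 52.5 and 88.5, while the whiskers end at the observed values 57 and 79.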
I hope this additional context is helpful for the conversation and the team’s review of the ticket.
Cheers,
Thank you for the detailed overview! Given that the behaviour is even different to the “old” KNIME nodes, I think we can assume it is a bug, rather than a feature. So hopefully, we don’t need too many votes to still have this implemented. Adding @DanielBog to comment.