Hey folks,
I’m relatively new to KNIME and this is my first forum post, so bear with me. I’m trying to assign quartile rankings to a dataset and I’m struggling to get the binning logic I want. As a first step, I used a GroupBy node to calculate the following aggregated stats for my (numeric) dataset:
Minimum
0.25-quantile (Q1)
Median
0.75-quantile (Q3)
Max
I want to use these calculated values with the Binner (Dictionary) node to assign a Quartile to each row in my dataset, using the following bounds:
Quartile 1: [Min, Q1)
Quartile 2: [Q1, Median)
Quartile 3: [Median, Q3)
Quartile 4: [Q3, Max]
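The bin layout above amounts to four consecutive intervals cut at the five statistics, so any two equal consecutive statistics produce a bin whose lower and upper bounds coincide. A minimal Python sketch of that, assuming the statistics are available as plain numbers; `quartile_dictionary` and `collapsed_bins` are hypothetical helper names, and which end of each bin is inclusive is not encoded in the rows. The usage line plugs in the statistics from the sample dataset further down (1, 1, 1, 6.25, 10).

```python
from typing import List, Tuple

# One dictionary row per bin: (label, lower bound, upper bound).
# Inclusive/exclusive handling of the bounds is deliberately not stored here.
Row = Tuple[str, float, float]


def quartile_dictionary(minimum: float, q1: float, median: float,
                        q3: float, maximum: float) -> List[Row]:
    """Turn the five GroupBy statistics into four consecutive bins."""
    edges = [minimum, q1, median, q3, maximum]
    return [(f"Quartile {i + 1}", edges[i], edges[i + 1]) for i in range(4)]


def collapsed_bins(rows: List[Row]) -> List[str]:
    """Labels of bins whose lower and upper bounds are identical."""
    return [label for label, lower, upper in rows if lower == upper]


rows = quartile_dictionary(1, 1, 1, 6.25, 10)
print(collapsed_bins(rows))  # ['Quartile 1', 'Quartile 2']
```

Flagging collapsed bins up front makes it easy to see which rows of the dictionary table are affected before they ever reach the Binner.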
I’m running into problems with the quartile assignments when one or more bins have identical lower and upper bounds, i.e. when two consecutive statistics are equal. For example, consider the following sample dataset:
Data
1
1
1
1
5
10
Min: 1
0.25-quantile (Q1): 1
Median: 1
0.75-quantile (Q3): 6.25
Max: 10
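These values can be checked by hand. Python’s `statistics.quantiles` with its default `method="exclusive"` places the k-th quartile at position k(n+1)/4 in the sorted data, which reproduces the Q1, Median and Q3 above (Q3 sits at position 5.25, a quarter of the way from 5 to 10). The `"inclusive"` method, position 1 + k(n-1)/4, gives Q3 = 4 instead, so the bin edges depend on the quantile definition used. This is only a sketch for verifying the numbers, not a claim about how KNIME computes them internally.

```python
import statistics

data = [1, 1, 1, 1, 5, 10]

# Default method="exclusive": the k-th quartile sits at position k*(n+1)/4
q1, median, q3 = statistics.quantiles(data, n=4)
print(min(data), q1, median, q3, max(data))  # 1 1.0 1.0 6.25 10

# method="inclusive": position 1 + k*(n-1)/4, which moves Q3 down to 4.0
print(statistics.quantiles(data, n=4, method="inclusive"))  # [1.0, 1.0, 4.0]
```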
Quartile 1: [Min, Q1) = [1, 1)
Quartile 2: [Q1, Median) = [1, 1)
Quartile 3: [Median, Q3) = [1, 6.25)
Quartile 4: [Q3, Max] = [6.25, 10]
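One thing worth checking: read literally, a half-open interval with equal bounds, such as [1, 1), contains no values at all, because no x satisfies 1 ≤ x < 1. The sketch below tests the value 1 against the four bins exactly as written (lower bound inclusive, upper bound exclusive, except for the closed last bin), assuming the node honours the bounds that way; only Quartile 3 matches, which on its own would also explain the result described next. The helper name `contains` is hypothetical.

```python
# Bins exactly as listed above: [lower, upper) for Quartiles 1-3, [Q3, Max] for Quartile 4
bins = [
    ("Quartile 1", 1.0, 1.0, False),
    ("Quartile 2", 1.0, 1.0, False),
    ("Quartile 3", 1.0, 6.25, False),
    ("Quartile 4", 6.25, 10.0, True),
]


def contains(value, lower, upper, upper_inclusive):
    """Lower bound is always inclusive; upper bound only if flagged."""
    if upper_inclusive:
        return lower <= value <= upper
    return lower <= value < upper


matches = [label for label, lo, hi, inc in bins if contains(1, lo, hi, inc)]
print(matches)  # ['Quartile 3']
```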
With these bounds, the Binner (Dictionary) node assigns a test value of 1 to Quartile 3. My guess is that the node checks every bin in turn and lets each match overwrite the previous one, like so:
Is 1 within the bounds of Quartile 1? Yes => Assign “Quartile 1” as Quartile value
Is 1 within the bounds of Quartile 2? Yes => Reassign Quartile value to “Quartile 2”
Is 1 within the bounds of Quartile 3? Yes => Reassign Quartile value to “Quartile 3”
Is 1 within the bounds of Quartile 4? No => End
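The sequential logic above (every matching bin overwrites the previous assignment, so the last match wins) can be mimicked as follows. To reproduce the "Yes" answers for Quartiles 1 and 2, the membership test here is closed on both ends; that is an assumption about how equal bounds are treated, not documented KNIME behaviour. The function names `last_match` and `first_match` are hypothetical.

```python
bins = [
    ("Quartile 1", 1.0, 1.0),
    ("Quartile 2", 1.0, 1.0),
    ("Quartile 3", 1.0, 6.25),
    ("Quartile 4", 6.25, 10.0),
]


def last_match(value, bins):
    """Scan every bin and overwrite the label on each hit (last match wins)."""
    label = None
    for name, lower, upper in bins:
        if lower <= value <= upper:  # assumed closed test on both ends
            label = name
    return label


def first_match(value, bins):
    """Return the label of the first bin that contains the value."""
    for name, lower, upper in bins:
        if lower <= value <= upper:  # same assumed closed test
            return name
    return None


print(last_match(1, bins), first_match(1, bins))  # Quartile 3 Quartile 1
```

Note that with this closed test, `first_match(6.25, bins)` returns Quartile 3, so a value exactly equal to Q3 would no longer land in Quartile 4 as the [Q3, Max] bound intends.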
Instead, I want the Binner to stop at the first quartile that matches (in the example above, a test value of 1 should yield Quartile 1, not Quartile 3). Is this possible? Thanks for your help!
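Outside of KNIME, the requested behaviour can be sketched as a first-match lookup. To keep [Q3, Max] for a value equal to Q3 while still letting a collapsed bin such as [1, 1) catch its own bound, the sketch treats a bin whose lower and upper bounds are equal as containing exactly that value; this rule is an assumption about the intended semantics, and `first_quartile` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# (label, lower, upper); Quartile 4 is closed on both ends, the others are [lower, upper)
Bin = Tuple[str, float, float]


def first_quartile(value: float, bins: List[Bin]) -> Optional[str]:
    """Return the label of the first bin that accepts `value`, or None.

    Assumed rule: a collapsed bin (lower == upper) accepts exactly that value;
    otherwise bins are [lower, upper), except the last one, which is [lower, upper].
    """
    last = len(bins) - 1
    for i, (label, lower, upper) in enumerate(bins):
        if lower == upper:
            hit = value == lower
        elif i == last:
            hit = lower <= value <= upper
        else:
            hit = lower <= value < upper
        if hit:
            return label  # stop at the first match
    return None


bins = [
    ("Quartile 1", 1.0, 1.0),
    ("Quartile 2", 1.0, 1.0),
    ("Quartile 3", 1.0, 6.25),
    ("Quartile 4", 6.25, 10.0),
]

for v in [1, 5, 6.25, 10]:
    print(v, first_quartile(v, bins))
```

With the sample bins this maps 1 to Quartile 1, 5 to Quartile 3, and both 6.25 and 10 to Quartile 4; values outside [Min, Max] return None.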