I noticed that the spec of a column doesn’t change as expected/intended after a Row Filter on, e.g., a numeric value.
Say a table has an Int column with values 1 - 3000, and I filter rows with a max value of 250.
You (well, I at least) would expect the new spec to be 1 - 250. But it still is 1 - 3000.
I vaguely remember that KNIME uses wrappers and other internals that I don’t get to see, but in this case the behavior doesn’t feel intuitive.
I tested the Cache node, which made no difference. I had to duplicate the column; then it was OK. My use case was the Color Manager node that follows, which would otherwise give me an unwanted color range.
So, my question is: is this intentional behavior?
The column domain (which is what you are referring to) isn’t recalculated on every node, I assume because doing so costs time/processor power/memory. Most of the time that’s not much of an issue, but for filters/splitters it is annoying. You can force the recalculation using the Domain Calculator node.
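As a rough illustration of the behavior described above, here is a minimal Python sketch. It is a toy model and not KNIME code: `IntColumn`, `row_filter_max` and `recalculate_domain` are hypothetical names made up for this example. It assumes only what the thread states, namely that a filter passes the domain through unchanged and that a recalculation scans the data for tight bounds.

```python
from dataclasses import dataclass


@dataclass
class IntColumn:
    """Toy stand-in for a column: the values plus domain metadata (hypothetical)."""
    values: list
    lower: int
    upper: int


def row_filter_max(col, max_value):
    # Keep only rows <= max_value; the domain bounds are copied unchanged.
    kept = [v for v in col.values if v <= max_value]
    return IntColumn(kept, col.lower, col.upper)


def recalculate_domain(col):
    # Scan the remaining values and set tight bounds
    # (the role the Domain Calculator plays in the workflow).
    return IntColumn(col.values, min(col.values), max(col.values))


original = IntColumn(list(range(1, 3001)), 1, 3000)
filtered = row_filter_max(original, 250)
recalculated = recalculate_domain(filtered)

print((filtered.lower, filtered.upper))          # (1, 3000)
print((recalculated.lower, recalculated.upper))  # (1, 250)
```

The scan in `recalculate_domain` touches every remaining value, which is the cost that is presumably being avoided by not doing it after every node.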
Incidentally, if you find the default limit of 60 values for String column domains frustrating, there is a fix here:
Interesting vote; before I vote, is there a reason for this to be a feature that I don’t see? If so, there is perhaps no need to change this.
There are work-arounds as described above (I for example duplicate the column), so it is more a question of being sufficiently aware of this?
Well, think of it as a kind of “data lineage”: you can “see” the original range of values from before any manipulation, which should be useful in some cases. Whenever you need to “see” the actual range, no problem, use a Domain Calculator.
But… rethinking it… maybe this implementation is counterintuitive. I mean, I want to see the actual range of values independently of the original ones, without being forced to “refresh” the range with a Domain Calculator… mmmm… the default behaviour should show the actual range, with the option of a pseudo Domain Calculator to see the original range.
It was a design decision to not change the domain of a column when you filter out rows. So the contract is: There is no value in the column that violates the domain information. However, the domain may be too ‘wide’ and may contain more values than present in the column or have bounds that are smaller/larger than the minimum/maximum found in the column.
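To make the contract concrete, here is a small Python sketch. The function name `satisfies_contract` is invented for this illustration and is not part of any KNIME API. It checks only the one-directional guarantee stated above: every value lies inside the declared bounds, while the bounds themselves are allowed to be wider than the data.

```python
def satisfies_contract(values, lower, upper):
    """Return True if no value falls outside the declared bounds [lower, upper]."""
    return all(lower <= v <= upper for v in values)


values = list(range(1, 251))  # the rows left after filtering to max 250

print(satisfies_contract(values, 1, 3000))  # True: too wide is allowed
print(satisfies_contract(values, 1, 250))   # True: a tight domain is fine too
print(satisfies_contract(values, 1, 100))   # False: too narrow violates the contract
```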
One of the use cases is… a predictor node spits out class probabilities and sets the range in the column domain to [0, 1], though the actual probabilities for the (current) data are never 0 or 1. In a downstream scatterplot node we would use the domain value to initialize the axis range and plot the data over the possible domain, rather than over the actual range of the data (e.g. [0.3, 0.7]). If passing the data through a filter node narrowed the domain to the remaining values, we would not be able to support this case.
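The scatterplot case can be sketched in a few lines of Python. This is an assumed, simplified picture of how a view might pick its axis range, not the actual node implementation; `axis_range` and the sample probabilities are made up for the example.

```python
def axis_range(values, domain=None):
    """Prefer the declared domain for the axis; fall back to the observed min/max."""
    if domain is not None:
        return domain
    return (min(values), max(values))


probabilities = [0.3, 0.45, 0.62, 0.7]  # hypothetical predictor output

print(axis_range(probabilities, domain=(0.0, 1.0)))  # (0.0, 1.0)
print(axis_range(probabilities))                     # (0.3, 0.7)
```

With the declared domain the axis spans the full possible range of a probability; without it the plot zooms in on whatever happens to be in the current data.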
Put differently: It’s very easy to determine a strict domain (see above, Domain Calculator) but it’s hard to restore the original domain.