KNIME linear sampling

Hello. I am trying to find a paper and/or code related to linear sampling. Does anyone know about this? Thanks!

Hi @jsfrunner,

can I ask why you need this? The description of this mode in the “Row Sampling” node seems quite straightforward:

Linear sampling

This mode always includes the first and the last row and selects the remaining rows linearly over the whole table (e.g. every third row). This is useful to downsample a sorted column while maintaining minimum and maximum value.

Or do you need this for a science project and have to reference this method because you used it?
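To make the quoted description concrete, here is a minimal Python sketch of how such a selection could work. It is not KNIME's actual implementation: the function name `linear_sample_indices` and the exact spacing rule (integer division over evenly spaced positions) are assumptions made for illustration only.

```python
def linear_sample_indices(n_rows: int, n_samples: int) -> list[int]:
    """Return evenly spaced row indices that include the first and last row.

    Hypothetical helper; the spacing rule is an assumption, not KNIME's code.
    """
    if n_rows <= 0 or n_samples <= 0:
        return []
    if n_rows == 1 or n_samples == 1:
        return [0]
    k = min(n_samples, n_rows)  # cannot pick more rows than exist
    # Spread k positions evenly over 0 .. n_rows - 1 using integer arithmetic.
    return [i * (n_rows - 1) // (k - 1) for i in range(k)]


# 10 rows, 4 samples: first row, last row, and every third row in between.
print(linear_sample_indices(10, 4))  # [0, 3, 6, 9]
```

If the table is sorted by the column of interest, the first and last selected rows hold the minimum and maximum, which matches the use case in the description.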

Thank you! I would like to see how people have found this sampling method to work compared to other methods. We have found it worked well with a data set characterized by a long-tailed distribution. I am curious if this is what others have found. Any references to code or published uses of it would be helpful in this regard.

Hi @jsfrunner

This is only true if your data is sorted beforehand by the variable (column) that you want to sample. If the data has been i.i.d. shuffled beforehand, then it is not true, and the sampling would be equivalent to random sampling, without favoring far samples in the tails.

The reason why you may notice that “linear sampling” is better at sampling long-tailed distributions is that, when the data is sorted, you give samples far out in the tails the same chance as samples everywhere else. The downside of such sampling is that it is not i.i.d. and hence it is biased. Moreover, it cannot be considered random sampling, and it has the effect of changing the underlying probability distribution of your data.

Having said this, it may still be useful, depending on what you need to do with the long-tailed sampled data.
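The difference between sorted and shuffled input can be checked with a small Python experiment. This is only a sketch under stated assumptions: the helper `linear_sample`, the Pareto toy data, the seed, and the sample size of 11 are all illustrative choices, not anything taken from KNIME.

```python
import random


def linear_sample(rows, n_samples):
    """Pick evenly spaced rows, always keeping the first and the last one.

    Hypothetical helper written for this illustration only.
    """
    n = len(rows)
    if n == 0 or n_samples <= 0:
        return []
    if n == 1 or n_samples == 1:
        return [rows[0]]
    k = min(n_samples, n)
    return [rows[i * (n - 1) // (k - 1)] for i in range(k)]


rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
values = [rng.paretovariate(1.5) for _ in range(10_000)]  # long-tailed toy data

# Sorted input: the sample is a set of evenly spaced ranks, min and max included.
sorted_sample = linear_sample(sorted(values), 11)

# Unsorted (i.i.d.) input: fixed positions behave like an ordinary random subset.
shuffled_sample = linear_sample(values, 11)

print("max of data:           ", max(values))
print("max of sorted sample:  ", max(sorted_sample))
print("max of shuffled sample:", max(shuffled_sample))
```

With sorted input the extreme tail value is always part of the sample, whereas with unsorted input it is included only by chance.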

Hope it helps.

Best, Ael
