Hi all,
I am working with a dataset that I need to sample from every now and then. I normally use the "Row Sampling" node with the "Stratified sampling" option to get my samples. I have recently been reading up on this and found that Wikipedia describes two different approaches that both give a sample under the term "stratified sampling".
One of them is based on the sampling fraction (proportionate allocation): assuming I sample from two groups, A and B, the sample sizes have to reflect the original distribution of the groups. The other one (optimum, or Neyman, allocation) also takes the standard deviation into account, so that each stratum's sample size is proportional to both the stratum's size and its standard deviation. My question is: which of these two approaches does the "Row Sampling" node use for its "Stratified sampling" option?
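To make the difference concrete, here is a small Python sketch of the two textbook allocation rules. The group sizes, standard deviations and function names are made up for illustration; this is not how the Row Sampling node is implemented (that is exactly what I'm asking about). Both rules split the total sample size `n` across the strata in proportion to a weight: the stratum size for proportionate allocation, and size × standard deviation for Neyman allocation. Rounding to whole rows uses the largest remainder method.

```python
import math


def _largest_remainder(weights, n):
    """Split n into integers proportional to weights (largest remainder method)."""
    total = sum(weights.values())
    exact = {k: n * w / total for k, w in weights.items()}
    alloc = {k: math.floor(v) for k, v in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining rows to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def proportional_allocation(sizes, n):
    """Proportionate allocation: sample size per stratum ~ stratum size."""
    return _largest_remainder(sizes, n)


def neyman_allocation(sizes, sds, n):
    """Optimum (Neyman) allocation: sample size per stratum ~ size * std dev."""
    return _largest_remainder({k: sizes[k] * sds[k] for k in sizes}, n)


# Hypothetical example: group A has 800 rows, group B has 200 rows
sizes = {"A": 800, "B": 200}
sds = {"A": 1.0, "B": 4.0}
print(proportional_allocation(sizes, 100))  # {'A': 80, 'B': 20}
print(neyman_allocation(sizes, sds, 100))   # {'A': 50, 'B': 50}
```

With proportionate allocation the sample keeps the 80/20 split of the data. With Neyman allocation the smaller but more variable group B gets a larger share of the sample.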
Many thanks,
Error404