New method for sampling

Hi everyone,

I have been asked to perform a new sampling method, described as follows:

The over sampling, under sampling and imbalanced sampling have been ignored. Balanced
stratified sampling is selecting the samples from m strata, but of equal size. If there are minority
or majority classes then over-sampling or under-sampling respectively need to be performed as
required based on the distribution of the classes. In this paper the first option of balancing is
through ignoring the minority classes for the ratios above 1:100. As the response variable is of
multivariate in nature, ignoring the minority class leads to minimal error. The second method
adopted in this paper is to take equal size of samples from each strata by reducing the stratum to
p<m, such that p changing as the size of the sub-sample changes gradually from 500 to 30000.
The balancing criteria has been maintained in an excel file with the required size (i.e from 500–
30000) and that file has been given as input for the already built-in stratified model. As the
survival attribute was in binary form, it was easy to pick the sample representatives. The second
option of equiv-width has been used. For minority multivariate classes like stage and metastasis
the ratio of 1:100 has been verified using excel filter and this was given as an input to the
rapidminer for classification. This method is a fixed allocation of balanced classes in the literature
of probability and statistics.
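One reading of the quoted procedure is sketched below. It is only an interpretation, not the paper's code. It drops any class that is more than 100 times smaller than the largest class (the "ratios above 1:100" rule), then draws the same number of rows from every remaining class, which is the equal ("fixed") allocation the passage mentions. The function names, the list-of-labels input and the exact threshold test are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

def drop_rare_classes(labels, max_ratio=100):
    """Return the labels to keep (assumed rule): a class is dropped when the
    largest class is more than `max_ratio` times bigger than it."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {c for c, n in counts.items() if largest <= max_ratio * n}

def balanced_stratified_sample(rows, labels, per_class, max_ratio=100, seed=0):
    """Equal allocation: draw `per_class` rows (without replacement) from
    every class that survives the ratio filter."""
    keep = drop_rare_classes(labels, max_ratio)
    strata = defaultdict(list)
    for row, label in zip(rows, labels):
        if label in keep:
            strata[label].append(row)
    rng = random.Random(seed)
    sample = []
    for label, members in strata.items():
        if len(members) < per_class:
            raise ValueError(f"class {label!r} has only {len(members)} rows")
        sample.extend((row, label) for row in rng.sample(members, per_class))
    return sample

# Made-up labels: "rare" is 1:500 against "alive", so it is ignored
labels = ["alive"] * 5000 + ["dead"] * 800 + ["rare"] * 10
rows = list(range(len(labels)))
sample = balanced_stratified_sample(rows, labels, per_class=500)
print(Counter(label for _, label in sample))  # 500 "alive", 500 "dead", no "rare"
```

Changing `per_class` (for example, so the total goes from 500 to 30000) is one way to mimic the paper's varying sub-sample size.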

I have two problems: first, honestly, I don't get the idea (probably because of my weak English), and second, how can I do this kind of sampling in KNIME?

Have you had a look at the Row Sampling node yet? It has a stratified sampling option, which sounds like what you describe.
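Note that stratified sampling in the usual sense keeps each class's share of the data. As far as I know, that is what the Row Sampling node's stratified option does, and it is not the same as the equal-size allocation in the paper. A rough sketch of proportional allocation for comparison (the function name is hypothetical, and rounding per class means the total can differ slightly from the requested fraction):

```python
import random
from collections import defaultdict

def proportional_stratified_sample(rows, labels, fraction, seed=0):
    """Take `fraction` of each class, so class proportions are preserved."""
    strata = defaultdict(list)
    for row, label in zip(rows, labels):
        strata[label].append(row)
    rng = random.Random(seed)
    sample = []
    for label, members in strata.items():
        k = round(len(members) * fraction)  # per-class size, rounded
        sample.extend((row, label) for row in rng.sample(members, k))
    return sample

labels = ["alive"] * 900 + ["dead"] * 100
rows = list(range(len(labels)))
strat_sample = proportional_stratified_sample(rows, labels, fraction=0.1)
# 90 "alive" and 10 "dead": still 9:1, so the result is NOT balanced
```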

The paper you highlighted (in an email) appears to be doing equal class size sampling. They do some filtering around this to remove classes that are too poorly represented, and they vary the sample size.

I would say you just need a combination of equal size sampling followed by row sampling. Equal size sampling will give you an initial balanced set, and you can then randomly sample from this set to get your final sample, which should be roughly balanced. If you want to remove classes with low representation (but then you can't predict them!), you can follow the instructions I noticed in your other posts to identify these.
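A minimal sketch of that two-step idea, assuming the data fits in memory as plain lists. Step one downsamples every class to the size of the smallest class (my understanding of what KNIME's Equal Size Sampling node does). Step two draws a plain random subset of the requested size. The function names and the class counts are made up for illustration.

```python
import random
from collections import defaultdict

def equal_size_sample(rows, labels, seed=0):
    """Downsample every class to the size of the smallest class."""
    strata = defaultdict(list)
    for row, label in zip(rows, labels):
        strata[label].append(row)
    rng = random.Random(seed)
    smallest = min(len(members) for members in strata.values())
    balanced = []
    for label, members in strata.items():
        balanced.extend((row, label) for row in rng.sample(members, smallest))
    return balanced

def random_subsample(pairs, size, seed=0):
    """Unstratified random sample from an already balanced set."""
    rng = random.Random(seed)
    return rng.sample(pairs, min(size, len(pairs)))

labels = ["alive"] * 40000 + ["dead"] * 20000
rows = list(range(len(labels)))
balanced = equal_size_sample(rows, labels)  # 20000 rows per class
for size in (500, 5000, 30000):
    subset = random_subsample(balanced, size)
    share_dead = sum(1 for _, label in subset if label == "dead") / size
    print(size, share_dead)  # near 0.5, but only approximately balanced
```

Because step two is plain random sampling, the final sample is only roughly balanced, as described above; sampling step two per class instead would make it exact.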

The wording is a bit ambiguous, though, so maybe I just didn't follow.

Thanks.

What do you think this means:

"The second method
adopted in this paper is to take equal size of samples from each strata by reducing the stratum to
p<m, such that p changing as the size of the sub-sample changes gradually from 500 to 30000."

I do not understand how to reduce the stratum to p<m.
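One possible reading, offered only as a guess at what the authors meant: with m strata and a target sample size n, equal allocation needs about n/p rows from each of the p strata used. Once n grows, small strata can no longer supply that many rows, so they are left out and p falls below m as n rises from 500 to 30000. The sketch picks, for each n, the largest p such that the p biggest strata each have at least ceil(n/p) rows. The stratum labels and counts are hypothetical (loosely styled after "stage").

```python
import math

def strata_to_keep(counts, n):
    """Largest p such that the p biggest strata can each give ceil(n/p) rows.

    counts: dict mapping stratum label -> number of rows (m = len(counts)).
    Returns (p, per_stratum, kept_labels); (0, 0, []) if no p works.
    Note p * per_stratum can exceed n slightly because of the ceiling.
    """
    ranked = sorted(counts, key=counts.get, reverse=True)
    for p in range(len(ranked), 0, -1):
        per_stratum = math.ceil(n / p)
        if counts[ranked[p - 1]] >= per_stratum:
            return p, per_stratum, ranked[:p]
    return 0, 0, []

# Hypothetical stratum sizes, m = 4
counts = {"I": 40000, "II": 16000, "III": 3000, "IV": 400}
# n=500 -> p=4 (125 each); n=2000 -> p=3 (667 each);
# n=10000 -> p=2 (5000 each); n=30000 -> p=2 (15000 each)
for n in (500, 2000, 10000, 30000):
    print(n, *strata_to_keep(counts, n))
```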