I'd like to undersample the majority class before applying SMOTE, but I can't find a node that undersamples to a given percentage. I know that EqualSizeSampling exists, but I don't want equal class sizes here; I want to choose the share of the majority class myself.
For example, if the majority class is 80% of the dataset and the minority class is 20%, I'd like to make the majority class 60% of the dataset and the minority class 40%.
I have solved the problem using a sorter (sorting on the target class) followed by a row sampling node.
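To make the arithmetic behind that target concrete, here is a minimal Python sketch. It assumes the minority class is left untouched and only majority rows are dropped; the function name `majority_keep_fraction` and the 800/200 row counts are made up for illustration and are not a KNIME node or setting.

```python
def majority_keep_fraction(n_major: int, n_minor: int, target_major_share: float) -> float:
    """Fraction of majority rows to keep so that the majority class makes up
    target_major_share of the undersampled data (minority rows untouched)."""
    if not 0 < target_major_share < 1:
        raise ValueError("target_major_share must be strictly between 0 and 1")
    # Solve keep / (keep + n_minor) = target_major_share for keep.
    keep = target_major_share * n_minor / (1 - target_major_share)
    if keep > n_major:
        raise ValueError("target share would require oversampling the majority class")
    return keep / n_major


# Hypothetical 80/20 dataset of 1000 rows, target 60/40.
fraction = majority_keep_fraction(n_major=800, n_minor=200, target_major_share=0.6)
print(f"keep about {fraction:.1%} of the majority rows")
```

With these numbers, roughly 300 of the 800 majority rows are kept, which together with the 200 minority rows gives the 60/40 split.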
Also see my suggestion on your topic regarding cbo sampling.
My guess is that you'll have to build your own sampling algorithm from the available nodes (optionally you can wrap them in a metanode). For example, using a group by, you can calculate the relative frequency of each class. Then derive the sampling rate for each class from that, using e.g. a Math node or a rule engine (whatever fits). Convert this table into a flow variable (e.g. row to flow var). Finally, create a group loop with your microdata as input and a sampling node inside (you could even choose the sampling node type with a case node, etc.), connect the flow variable to the loop (that part is a bit fiddly), and make the sampling rate depend on the flow variable. Maybe it's necessary to use two loop start nodes (one for the microdata, another for the sampling rates) and to connect them to each other; I'm not sure, as I haven't tried it myself.
At least it's something to explore.
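Outside KNIME, the same logic (count each class, derive a per-class sampling rate, then sample within each class) can be sketched in plain Python. This is only an illustration under stated assumptions, not a KNIME workflow: the names `sampling_rates` and `undersample`, the tuple-shaped rows, and the 800/200 example are all hypothetical, and no class is ever oversampled.

```python
import random
from collections import Counter, defaultdict


def sampling_rates(counts, target_shares):
    """Per-class keep rates that produce target_shares without oversampling."""
    # Largest total size for which every class still has enough rows.
    total = min(counts[c] / target_shares[c] for c in counts)
    return {c: target_shares[c] * total / counts[c] for c in counts}


def undersample(rows, label_of, target_shares, seed=0):
    """Group rows by class, then randomly keep rate * size rows per class."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    counts = {c: len(g) for c, g in groups.items()}
    rates = sampling_rates(counts, target_shares)
    sampled = []
    for c, group in groups.items():
        k = min(len(group), round(rates[c] * len(group)))
        sampled.extend(rng.sample(group, k))
    return sampled


# Hypothetical data: 800 majority rows and 200 minority rows.
rows = [("maj", i) for i in range(800)] + [("min", i) for i in range(200)]
result = undersample(rows, lambda r: r[0], {"maj": 0.6, "min": 0.4})
print(Counter(label for label, _ in result))
```

The rates play the role of the per-class flow variable, and the loop over `groups` corresponds to the group loop with a sampling node inside.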