H2O Random Forest with imbalanced data

Hi @malik,

There are basically two ways to deal with the imbalance.

The first way is to balance the data before converting it to an H2O data frame. You can easily find more information about this in the forum; see, e.g., the threads “Unbalanced data - good practice” and “SMOTE” for further details.
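If you want to try the first way in a script, the simplest variant is random oversampling: copy rows of the smaller classes until every class is as large as the biggest one. The sketch below is plain Python and only meant as an illustration; the function name `random_oversample` and the toy data are made up, and it is *not* SMOTE (SMOTE creates new synthetic rows by interpolating between neighbouring minority rows instead of duplicating existing ones).

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label_key, seed=0):
    """Plain random oversampling (not SMOTE): duplicate randomly chosen
    rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)

    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # draw (target - len(group)) extra rows with replacement
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced


# Toy data set: 90 "no" rows vs. 10 "yes" rows (made up for illustration)
data = [{"x": i, "label": "no"} for i in range(90)]
data += [{"x": i, "label": "yes"} for i in range(10)]

balanced = random_oversample(data, "label")
print(Counter(row["label"] for row in balanced))  # 90 rows of each class
```

Note that duplicated rows carry no new information, so this mainly shifts how much weight the model gives each class; that is one reason SMOTE-style approaches are often suggested instead.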

The second way is the “Balance classes” option shown in your screenshot. If you check this option, H2O balances the classes automatically. With the setting “Define max relative number of rows after balancing”, you control how much balancing is done, see also here: the value limits the size of the training data after balancing, relative to its original size. If you set this number very high, the classes will be balanced equally no matter how skewed they are. The setting at the very bottom even lets you specify the balancing factors manually, but you can simply leave it empty, as it is now, and H2O will calculate the factors itself.
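To make the “max relative number of rows” setting a bit more concrete, here is a toy model of how such per-class sampling factors could be derived. This is *not* H2O's actual implementation: the function `sampling_factors` and its capping rule are my own assumption, written to mirror the behaviour described above (a high limit gives full balance, a low one only partial balance), and it only models oversampling. For reference, H2O's own APIs call these options `balance_classes`, `max_after_balance_size` and `class_sampling_factors`.

```python
def sampling_factors(counts, max_relative_size):
    """Hypothetical helper: per-class oversampling factors that push every
    class towards the size of the largest class, while keeping the total
    at most max_relative_size * (original number of rows).
    Only oversampling is modelled; a limit of 1.0 or less yields factor 1."""
    n = sum(counts.values())
    budget = max_relative_size * n
    largest = max(counts.values())

    def total_rows(level):
        # rows after lifting every class smaller than `level` up to `level`
        return sum(max(c, level) for c in counts.values())

    if total_rows(largest) <= budget:
        level = float(largest)  # enough room: full balance
    elif budget <= n:
        level = 0.0  # no room for extra rows at all
    else:
        # binary search for the highest common level that fits the budget
        lo, hi = 0.0, float(largest)
        for _ in range(100):
            mid = (lo + hi) / 2
            if total_rows(mid) <= budget:
                lo = mid
            else:
                hi = mid
        level = lo
    return {label: max(c, level) / c for label, c in counts.items()}


counts = {"no": 900, "yes": 100}  # made-up class counts
print(sampling_factors(counts, 5.0))  # {'no': 1.0, 'yes': 9.0}
print(sampling_factors(counts, 1.5))  # 'yes' factor of about 6.0: only partly balanced
```

With a limit of 5.0 the 100 “yes” rows can be oversampled ninefold to match the 900 “no” rows; with 1.5 the total may not exceed 1,500 rows, so “yes” only grows to about 600. Manually specified factors would take the place of the computed dictionary: each class's row count is multiplied by its factor.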

Hope this helps you,
Simon