This is a typical predictive analytics problem with both categorical and numeric variables. It is a regression problem. We use RandomForest. At data exploration stage, we explore if it would benefit performance if one or more of numeric columns are discretized. To minimize uncertainity in the results, we loop over the partitioning, modeling, predicting and scoring multiple times. We then calculate confidence interval of mean RMSE.
This is a companion discussion topic for the original entry at https://kni.me/w/Pddz9fi5grMOG70A