I have used an H2O model to predict the difficulty of intubation during surgery. In that workflow I optimise the model's parameters and also use the cross-validation nodes.
The “Loop End” node collects the accuracy statistics for every cross-validation iteration. Then the GroupBy node calculates the mean accuracy and the mean sensitivity, and the “Parameter Optimisation Loop End” node records the mean accuracy obtained for each set of parameters.
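For reference, the same loop logic could be sketched in Python. This is only a minimal sketch, using scikit-learn in place of the H2O learner nodes and a made-up dataset and parameter grid, not the actual workflow:

```python
# Minimal sketch of the loop logic outside KNIME, using scikit-learn in
# place of the H2O learner and a made-up, imbalanced toy dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical data: roughly 85% "easy" and 15% "difficult" patients.
X, y_num = make_classification(n_samples=400, weights=[0.85], random_state=0)
y = np.where(y_num == 1, "difficult", "easy")

# Hypothetical parameter grid (the "Parameter Optimisation" loop).
param_grid = [{"max_depth": d, "learning_rate": lr}
              for d in (3, 5) for lr in (0.05, 0.1)]

results = []
for params in param_grid:
    fold_acc, fold_sens = [], []
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    for train_idx, test_idx in cv.split(X, y):      # cross-validation loop
        model = GradientBoostingClassifier(**params).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_acc.append(accuracy_score(y[test_idx], pred))
        # sensitivity = recall of the chosen target class
        fold_sens.append(recall_score(y[test_idx], pred, pos_label="difficult"))
    # GroupBy step: mean over the folds; Loop End step: one row per parameter set
    results.append({**params,
                    "mean_accuracy": np.mean(fold_acc),
                    "mean_sensitivity": np.mean(fold_sens)})

best = max(results, key=lambda r: r["mean_accuracy"])   # objective to maximise
print(best)
```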
The binomial scorer lets me get the accuracy and the sensitivity depending on which target class I choose for my target column. In our case, the main goal is to predict whether a patient is easy or difficult to intubate. Our focus is to have a high sensitivity for easy patients, but also to increase the specificity for difficult patients as much as possible. In the binomial scorer I can choose the target class (“difficult” or “easy”). If I choose “difficult”, the model gives very good results for difficult patients, but the sensitivity for easy patients is very low (so that is not what we would like to achieve). However, if I choose “easy” as the target class, the model classifies all the patients as “easy”. The global accuracy it reports is still very high, but only because there are more “easy” patients than “difficult” ones; the actual performance of the model is not good.
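A toy example with made-up counts shows why the global accuracy is misleading here, and how switching the target class just swaps the two rates:

```python
# Toy example with made-up counts: 90 "easy" and 10 "difficult" patients,
# and a degenerate model that labels everyone "easy".
from sklearn.metrics import accuracy_score, recall_score

y_true = ["easy"] * 90 + ["difficult"] * 10
y_pred = ["easy"] * 100

print(accuracy_score(y_true, y_pred))                       # 0.90 -> looks great
print(recall_score(y_true, y_pred, pos_label="easy"))       # 1.00  sensitivity for "easy"
print(recall_score(y_true, y_pred, pos_label="difficult"))  # 0.00  sensitivity for "difficult"
# Switching the target class in the scorer swaps the two rates:
# the sensitivity for "easy" is the specificity for "difficult", and vice versa.
```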
That is why I thought it would be better to choose “difficult” as the target class, but then the “Parameter Optimisation Loop End” should maximise the sensitivity for “easy” patients (which is our focus). The problem is that if I choose “difficult” as my target class, I cannot get the sensitivity for “easy” patients in the accuracy statistics. I would like to get the sensitivity, the specificity and the accuracy, because I think that missing any one of them does not give an overall picture of the model’s performance.
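In case it helps to illustrate what I am after, here is a small Python sketch (again outside KNIME, with made-up labels) of how sensitivity, specificity and accuracy can all be derived from one confusion matrix, and combined into a single value such as balanced accuracy that an optimisation loop could maximise:

```python
# Sketch (with made-up labels) of getting sensitivity, specificity and
# accuracy from one confusion matrix, plus balanced accuracy as a single
# value that the optimisation loop could maximise instead.
from sklearn.metrics import confusion_matrix

def scores(y_true, y_pred, positive="difficult", negative="easy"):
    tn, fp, fn, tp = confusion_matrix(
        y_true, y_pred, labels=[negative, positive]).ravel()
    sensitivity = tp / (tp + fn)            # recall for "difficult"
    specificity = tn / (tn + fp)            # same number as sensitivity for "easy"
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "balanced_accuracy": balanced_accuracy}

# 90 "easy" / 10 "difficult" patients, with a few mistakes in each class.
y_true = ["easy"] * 90 + ["difficult"] * 10
y_pred = ["easy"] * 85 + ["difficult"] * 5 + ["easy"] * 4 + ["difficult"] * 6
print(scores(y_true, y_pred))
```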