I watched the presentation on YouTube about these new nodes. Interestingly, like the example Greg showed, my interest is in molecules. I pretty much took the example workflow, made some minor adjustments, and used my own dataset (binary classification).
I played around with the results a bit, as I’m not used to the new metrics and have no intuition about them. I tried an error rate of 0.05 and then wanted to verify whether that error rate actually holds. There were 0 null predictions, and a prediction containing both labels cannot be an error, so all one needs to look at are the single-label predictions. And that is where I found a concerning issue, at least for my use case. Simply put, basically all of the error is on the minority class, which in terms of molecules is almost always the class of interest. The total error was actually 7% (I’m OK with that small difference), but for the minority class this meant a 43% error rate.
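For anyone who wants to run the same check outside the workflow, here is a minimal Python sketch. It assumes an error means the true label is missing from the prediction set, and it groups the errors by true class. The function name `error_rates` and the toy data are made up for illustration and are not the data from this post.

```python
from collections import defaultdict


def error_rates(true_labels, prediction_sets):
    """Return (overall_error, per_class_error) for conformal prediction sets.

    A prediction counts as an error when the true label is not in the set.
    Null (empty) sets are therefore always errors, and sets containing
    every label are never errors.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, pred_set in zip(true_labels, prediction_sets):
        totals[truth] += 1
        if truth not in pred_set:
            errors[truth] += 1
    n_samples = sum(totals.values())
    overall = sum(errors.values()) / n_samples
    per_class = {label: errors[label] / totals[label] for label in totals}
    return overall, per_class


# Hypothetical toy data: 18 "inactive" molecules and 2 "active" ones.
y_true = ["inactive"] * 18 + ["active"] * 2
# Every inactive molecule is predicted correctly; one active one is missed.
pred_sets = [{"inactive"}] * 18 + [{"active"}, {"inactive"}]

overall, per_class = error_rates(y_true, pred_sets)
print(overall)    # overall error rate across all 20 samples
print(per_class)  # error rate for each true class
```

On this toy data the overall error is 1 in 20 (0.05), yet the error for the "active" class is 1 in 2 (0.5), which is the same pattern as described above. Grouping by predicted label instead of true label would give a precision-like figure, which is a different quantity.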
I’m not sure what my actual question is, but I consider this an important observation that was not intuitive to me. One really needs to remind oneself that the error rate applies to the whole prediction set and is not uniformly distributed within it.
My question is more about what to make of this. What would actually be interesting in terms of molecules (e.g. suggesting to a chemist what to make in the lab) is having a reliable error rate on the class of interest, i.e. something like precision, but in this more statistically robust way and not just calculated from CV. Or what am I missing?
Hello @beginner. Is it possible to take a look at your workflow, or at least at some of your results and the class distribution of your data set?
It is a bit hard to understand your actual question, but as far as I understand, you would like to know what kind of value you can get from the estimates produced by conformal prediction. If so, you can think of this error threshold as a trade-off between the desired accuracy and prediction certainty. If you set a very low error threshold (0.05 or less), you may get a very high level of uncertainty in the predictions (you will get more multi-class predictions), but you can be confident that all the samples belong to the applicability domain.
Or vice versa: if you set a very loose threshold (e.g. 0.3 or more), you can get very certain predictions (most of the predictions will be single-class), but the algorithm might flag some anomalous samples (null predictions) that may be outside the domain of your samples, which means they might belong to another class.
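The trade-off described above can be sketched in a few lines of Python. This is only an illustration of the general inductive conformal idea, not the implementation used by the nodes. It assumes the nonconformity score is 1 minus the predicted class probability, and the calibration scores and example probabilities are hypothetical.

```python
def p_value(calibration_scores, test_score):
    """Share of calibration scores at least as nonconforming as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, class_probs, significance):
    """Keep every label whose p-value exceeds the error threshold."""
    return {
        label
        for label, prob in class_probs.items()
        if p_value(calibration_scores, 1.0 - prob) > significance
    }


# Hypothetical calibration scores (1 - probability assigned to the true class).
calibration = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.45, 0.60]

# Two hypothetical test samples with model probabilities per class.
confident = {"active": 0.12, "inactive": 0.88}
ambiguous = {"active": 0.48, "inactive": 0.52}

for threshold in (0.05, 0.3):
    print(threshold,
          prediction_set(calibration, confident, threshold),
          prediction_set(calibration, ambiguous, threshold))
```

With the strict threshold of 0.05 both samples get both labels. With the loose threshold of 0.3 the confident sample becomes a single-label prediction, while the ambiguous one ends up with no label at all (a null prediction).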
I really hope my answer is helpful to you. Please feel free to ask any other questions in this thread.
Best regards, Artem.
Hi Artem, you are right in that I wasn’t really asking a very good question.
Right now I would say that the documentation should state, somewhere rather obvious, that there is no guarantee about how the error is distributed over the classes. As in my case with unbalanced data, most of the error can actually be concentrated in one class, usually the minority class, which is usually the class of interest.
This is probably clear to you, but it wasn’t to me and might not be to other users. In fact, the Conformal Scorer should output the actual error rate and the error rate per class.
The risk I see is that someone overlooks this issue and simply presents users with predictions from just one class, the class of interest. Put simply: you want to suggest to chemists which molecules to make, you only suggest the ones with a “positive” prediction, and you tell them there is a 0.05 error rate, while in fact the error rate for the ones you suggest is more like 0.5. That would be a good way to destroy all trust in models and ML.
So again, I really think the Conformal Scorer should output the total error rate (which in my case was 0.07 vs. the set 0.05) and the error rate per class.
Having said that, if the Conformal Scorer gives me a 0.5 error rate for the minority class, is there a guarantee that this also applies to the predictions? That is, can any guarantees be made for the class of interest?
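One known technique aimed at exactly this is class-conditional (Mondrian) conformal prediction, where each label is calibrated only against calibration samples of that same class, so the error threshold is meant to hold per true class rather than only overall. Whether the nodes discussed here work this way is not stated in this thread. The sketch below is a minimal illustration under that assumption, again using 1 minus the predicted probability as the nonconformity score; all names and numbers are hypothetical.

```python
def class_conditional_p_value(calibration_by_class, label, test_score):
    """p-value computed only from calibration samples whose true class is `label`."""
    scores = calibration_by_class[label]
    n_at_least = sum(1 for s in scores if s >= test_score)
    return (n_at_least + 1) / (len(scores) + 1)


def mondrian_prediction_set(calibration_by_class, class_probs, significance):
    """Keep every label whose class-conditional p-value exceeds the threshold."""
    return {
        label
        for label, prob in class_probs.items()
        if class_conditional_p_value(calibration_by_class, label, 1.0 - prob)
        > significance
    }


# Hypothetical calibration scores, grouped by the true class of each sample.
# The model is assumed to be much less sure about the minority "active" class.
calibration_by_class = {
    "inactive": [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.12, 0.20, 0.40],
    "active": [0.30, 0.45, 0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90],
}

# A hypothetical molecule the model leans towards calling "inactive".
molecule = {"active": 0.35, "inactive": 0.65}

print(mondrian_prediction_set(calibration_by_class, molecule, 0.1))
print(mondrian_prediction_set(calibration_by_class, molecule, 0.25))
```

Note that a guarantee of this kind is conditional on the true class (of all truly active molecules, at most the chosen fraction are missed). It is not a guarantee on the predicted class, so it is still not the same thing as precision on the molecules you would suggest.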