I’ve spent the last 6 months developing a workflow for predicting patient no-shows for a clinic in Denmark, with pretty good success. I’ve now run into a problem explaining my results to a doctor who sees everything from a diagnostics point of view.
After looking at my results and converting them into a 2x2 table for true positives, he is wondering what happened to 50 % of the test data. I’m arguing that the algorithm uses it on the two scenarios, which he doesn’t understand.
Number of observations after leaving out bad data: 2440
Number of observations for training data (random draw): 1952
Number of observations for test data (random draw): 488
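To make the numbers above concrete, here is a minimal Python sketch of a random 80/20 split and a 2x2 table built on the test part. This is not my actual workflow: the 0.2 test fraction is only inferred from 488 / 2440, the function names `split_indices` and `two_by_two` are made up for illustration, and the labels and predictions are random placeholders rather than real clinic data.

```python
import random


def split_indices(n_total, test_fraction, seed=0):
    """Randomly split row indices into a training part and a test part."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    n_test = round(n_total * test_fraction)
    return indices[n_test:], indices[:n_test]


def two_by_two(actual, predicted):
    """Count the four cells of a 2x2 table, treating class 1 (no-show) as positive."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            table["TP"] += 1
        elif a == 0 and p == 1:
            table["FP"] += 1
        elif a == 1 and p == 0:
            table["FN"] += 1
        else:
            table["TN"] += 1
    return table


# Assumption: 20 % of the 2440 cleaned observations are held out for testing.
train_idx, test_idx = split_indices(2440, 0.2)
print(len(train_idx), len(test_idx))  # 1952 488

# Placeholder labels and predictions for the test rows (not real data).
rng = random.Random(1)
actual = [1 if rng.random() < 0.2 else 0 for _ in test_idx]
predicted = [1 if rng.random() < 0.2 else 0 for _ in test_idx]

table = two_by_two(actual, predicted)
print(table)
print(sum(table.values()))  # 488
```

In a table built this way, every test observation lands in exactly one of the four cells, so the cells add up to the full 488 test rows.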
Why is it using 50 % of the observations for RowID 0 and 50 % for RowID 1? Can someone please elaborate or help me explain this in a different way?