Why is recall for test and train different in each case?

Attached are three KNIME workflows exported from my workspace with the reset option unchecked. All of them are tuned random forests with the same split criterion, tree depth, minimum node size, and max models = 100. Why is the recall on the test and training sets different in each workflow? I think it should be the same. Which model is best, and why?

Thanks in advance.

Please help with this question.
Hotel_Booking_Random_Forest_VP.knwf (1.1 MB)
Hotel_Booking_Random_Forest_VP_2.knwf (1.1 MB)
Hotel_Booking_Cancellation_Prediction_Tunned_Random_Forest_VP.knwf (1.1 MB)
Comparison of performance.docx (53.1 KB)

Hi @booramaravind -

I didn’t do a deep dive here, but I suspect it’s because you are not holding the seed constant in the Partitioning node. Thus you are actually using a different training split in each workflow.
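To illustrate the point outside of KNIME, here is a minimal Python sketch of a seeded versus unseeded partition. It is not what the Partitioning node does internally; the `partition` helper, the row count of 1000, the 70/30 ratio, and the seed values are all assumptions made up for the example.

```python
import random


def partition(n_rows, train_fraction, seed=None):
    """Shuffle row indices and split them into (train, test) index lists.

    Hypothetical helper for illustration only. With seed=None the shuffle
    is seeded from the OS or the clock, so every run gives a new split.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(n_rows * train_fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


# Same seed in two "workflows": the split is reproduced exactly.
train_a, test_a = partition(1000, 0.7, seed=42)
train_b, test_b = partition(1000, 0.7, seed=42)
same_seed_identical = (train_a == train_b) and (test_a == test_b)

# A different seed: the model is trained and scored on different rows,
# so train and test recall will generally differ as well.
train_c, test_c = partition(1000, 0.7, seed=7)
different_seed_identical = (train_a == train_c)

print("same seed, identical split:     ", same_seed_identical)
print("different seed, identical split:", different_seed_identical)
```

With identical rows going into each learner, any remaining difference in recall would have to come from somewhere else, for example the random forest's own seed.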



The “best” model is probably the one that scores highest on your evaluation metric, and which metric to use depends on your goal.
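As a rough sketch of why the metric matters: the confusion counts below are invented purely for illustration (they are not taken from the attached workflows), and `model_a` and `model_b` are hypothetical names. The same two models rank differently depending on whether you care more about recall or precision.

```python
def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Share of predicted positives that were correct."""
    return tp / (tp + fp)


# Hypothetical test-set confusion counts, made up for this example.
models = {
    "model_a": {"tp": 80, "fp": 40, "fn": 20},
    "model_b": {"tp": 70, "fp": 10, "fn": 30},
}

best_by_recall = max(
    models, key=lambda m: recall(models[m]["tp"], models[m]["fn"])
)
best_by_precision = max(
    models, key=lambda m: precision(models[m]["tp"], models[m]["fp"])
)

print("best by recall:   ", best_by_recall)
print("best by precision:", best_by_precision)
```

Whichever metric you pick, compare the models on the test set rather than the training set, and make sure all of them were scored on the same rows.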
br
