Different Results between Partitioned Test Data and New Test Data

Hello fellow KNIME users,

I have the following question:

I am using a flow for image classification (I will post a picture of the flows) and have found that I get different results depending on how the test data is added.

When I take a large dataset and split it into training and test data, I get very precise results.

However, when I use a new test dataset, I get 25% worse results.

Can anyone here explain to me why?

I tried training the models for a longer time, but it didn’t really help.

On the surface this sounds like a classic case of overfitting. Or possibly, your new data and your training data aren’t distributed the same way.
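If you want a rough way to check the second possibility outside of KNIME, something like the Python sketch below might help. It compares one simple per-image statistic and the class balance between the two datasets. Everything in it is an assumption for illustration: the function names are made up, the brightness values and labels are invented sample data, and mean brightness is only one of many statistics you could compare.

```python
from collections import Counter
from statistics import mean, stdev


def standardized_mean_difference(a, b):
    """Difference of the two means, in units of the pooled standard deviation."""
    pooled = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
    if pooled == 0:
        return 0.0
    return (mean(b) - mean(a)) / pooled


def label_shares(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical data: mean pixel brightness per image (scaled to 0..1).
train_brightness = [0.52, 0.48, 0.50, 0.51, 0.49]
new_brightness = [0.70, 0.68, 0.72, 0.69, 0.71]

# Hypothetical class labels for the same two datasets.
train_labels = ["cat", "cat", "dog", "dog"]
new_labels = ["cat", "dog", "dog", "dog"]

smd = standardized_mean_difference(train_brightness, new_brightness)
print(f"Standardized mean difference in brightness: {smd:.2f}")
print("Training class shares:", label_shares(train_labels))
print("New data class shares:", label_shares(new_labels))
```

A large standardized difference, or clearly different class shares, would suggest that the new images do not come from the same distribution as the training data. In that case the drop in accuracy is not only a matter of overfitting.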

Using deep learning for this case might be overkill. Have you tried training a simpler algorithm, like a Random Forest, to see what the results are like?
