Different Results between Partitioned Test Data and New Test Data

BerryB70 · September 28, 2022, 7:42pm

Hello fellow KNIME users,

I have the following question:

I am using a workflow for image classification (I will post a picture of the workflows) and have found that I get different results depending on how the test data is supplied.

When I take a large dataset and split it into training and test data, I get very accurate results.

However, when I use a new test data set, the results are 25% worse.

Can anyone here explain to me why?

I tried training the models for a longer time, but it didn’t really help.

ScottF · October 14, 2022, 4:02pm

On the surface, this sounds like a classic case of overfitting. Or possibly your new data and your training data aren’t distributed the same way.

Using deep learning for this case might be overkill. Have you tried training a simpler algorithm, like a Random Forest, to see what the results are like?
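One rough way to tell these two explanations apart is to compare accuracy on three sets (training, partitioned test, new test) and to check whether some simple image statistic differs between the old and new data. The snippet below is only a sketch in plain Python, outside of KNIME. The function names, the accuracy figures, and the "mean brightness" feature are all hypothetical and only there for illustration.

```python
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_gaps(train_acc, partition_acc, new_acc):
    """Return the two accuracy gaps worth looking at.

    A large train_minus_partition gap points towards overfitting.
    A small train_minus_partition gap combined with a large
    partition_minus_new gap points towards the new data differing
    from the data the model was trained on.
    """
    return {
        "train_minus_partition": train_acc - partition_acc,
        "partition_minus_new": partition_acc - new_acc,
    }


def standardized_mean_difference(reference, new):
    """Shift of the mean of `new`, in units of the reference standard deviation."""
    return (mean(new) - mean(reference)) / stdev(reference)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the workflow in question.
    print(accuracy_gaps(train_acc=0.99, partition_acc=0.97, new_acc=0.72))

    # Hypothetical per-image feature, e.g. mean brightness scaled to 0..1.
    old_brightness = [0.4, 0.5, 0.6]
    new_brightness = [0.7, 0.8, 0.9]
    print(standardized_mean_difference(old_brightness, new_brightness))
```

If the partitioned test accuracy is close to the training accuracy but the new data scores much lower, and a basic statistic like brightness, image size, or class balance is clearly shifted, a data mismatch is the more likely explanation. In that case training for longer would not be expected to help.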
