Random seed number in Partitioning node

The random seed in the KNIME Partitioning node is set to “1639924778212” by default in my case. When I change the seed to “42”, which is commonly used with train_test_split in scikit-learn, the regression result is quite different; in my case it becomes worse.

Q1: How is the seed number set by default?
Q2: Why can the difference be so big when the seed is changed to 42?

“The purpose of the seed is to allow the user to ‘lock’ the pseudo-random number generator, to allow replicable analysis.” [source]

  1. The seed’s value is not meaningful. It is just to reproduce your results.

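  2. The difference can be large if your data set is tiny or if your data has different distributions, because each seed then puts a noticeably different mix of rows into the training and test sets.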


Thanks for the explanation. If the difference is due to the data having different distributions (even after [...]), are there any measures that can be taken to improve the result?

The dataset I am using is ordered in time. For now I am trying a random tree regression model (not a time series model).

Hello,

If the distribution is different, you will need to separate out the distributions into their own data sets.

If you are dealing with time series data, you may need to use a time series model. We have just written a book for this: https://www.amazon.com/Codeless-Time-Analysis-KNIME-implementing/dp/1803232064
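Independently of the model choice, a random partition of time-ordered data mixes past and future rows, so the score can depend heavily on which rows the seed happens to pick. A minimal sketch of a chronological split, assuming the rows are already sorted by time (the function name `chronological_split` and the sample rows are hypothetical):

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows without shuffling: earliest rows train, latest rows test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical (timestamp, value) rows, already sorted by timestamp.
series = [(t, 2.0 * t) for t in range(10)]
train, test = chronological_split(series)

# Every training timestamp precedes every test timestamp,
# so no future information leaks into training.
assert max(t for t, _ in train) < min(t for t, _ in test)
print(len(train), len(test))
```

No seed is involved here, so the result is the same on every run. In KNIME, the “Take from top” option of the Partitioning node on a table sorted by time should give a comparable split.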
