Regression Model :monthly volume forecast model is not satisfied

Hi Team,
I am trying to do monthly volume forecast by customer with kind of regression models by using historical monthly vol (2022-07 ~ 2023.12). The model test and predict result is: R^2 is 0.8-0.9 but MAPE is always 0.56-0.9. However, when i implemented the trained model with new data (2024-01), the predicted volume vs 2024-01 actual volume has quite huge gap, which is not acceptable.

I have preprocessed my historical data to get customer monthly volume, past one month, past 2 month, past 3 month till past 6 month vol, also get past 1 year vol. Besides, i sum holiday# per month as an impacted feature

I am not sure where i can improve and how i can improve my model. Can anyone have such experience to improve my model accuracy.

Here is my KNIME flow (trained data).


Train ML Model.knwf (137.3 KB)

@Joey666 you sample workflow does not contain any data. Maybe you can save it again with the data in a /data/ subfolder or when saving it uncheck the Reset option:

@Joey666 concerning regression Machine Learning tasks. I can offer this collection of nodes to compare models

Thanks a lot for your guidance ! I have reexported the workflow without reset. But when i am going to upload here, it is saying the flow is over itā€™s max size (4M), anything i can do to successfully upload to community?

Thanks for sharing this. i have question how i can save this workflow to my local spaceļ¼Ÿ when i drag to my local workspace, seems the flow is not shown there still.

Also, regarding the partitioning, it is a must to use partitioning to split data to train& test data set in a ML model? can i use row splitter node to split test/train data if i have to choose some specific data as a test data (actually i managed to split exact same data with Row Splitter and Partitioning, but the scorer shows very strange result if i use row splitter to split data)

To download a workflow from the Hub, click the link above and then the download button.


The downloaded file is knfw file (KNIME ā€œzipā€ file.) Youā€™ll need to install it.

2 Likes

Thanks a lot for your guidanceļ¼

Thanks for your guidance on downloading workflow. I have question about Data split for test and train data below. Could you pls help me ? Thanks so much

Also, regarding the partitioning, it is a must to use partitioning to split data to train& test data set in a ML model? can i use row splitter node to split test/train data if i have to choose some specific data as a test data (actually i managed to split exact same data with Row Splitter and Partitioning, but the scorer shows very strange result if i use row splitter to split data)
[/quote]

I donā€™t know anything about your dataset, i.e. how balanced it is. I would recommend that you use the Partitioning node. Using the row splitter can result in unintended ā€œdata drift.ā€ If your data is very unbalanced check out resources such as SMOTE to manage it.

1 Like

You can use the KNIME Hub. Or you could see if the data as a KNIME .table would be smaller.