Regression model: monthly volume forecast is not satisfactory

Joey666 · March 1, 2024, 1:14pm

Hi Team,
I am trying to forecast monthly volume by customer with regression models, using historical monthly volume (2022-07 to 2023-12). On the test set, R^2 is 0.8-0.9 but MAPE is always 0.56-0.9. However, when I applied the trained model to new data (2024-01), the gap between the predicted volume and the actual 2024-01 volume was very large, which is not acceptable.
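A high R^2 together with a high MAPE is possible when customers differ a lot in size: R^2 is dominated by the large-volume customers, while MAPE weights every row equally, so small-volume customers with large relative errors push it up. Below is a minimal Python sketch of both metrics. The volumes are made-up numbers for illustration only, not data from this workflow.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.5 means 50%).

    Undefined when an actual value is zero.
    """
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical volumes: two large customers, three small ones.
actual = [10000.0, 8000.0, 10.0, 20.0, 5.0]
predicted = [10200.0, 7900.0, 25.0, 5.0, 12.0]

r2_value = r_squared(actual, predicted)
mape_value = mape(actual, predicted)
print(f"R^2 = {r2_value:.4f}, MAPE = {mape_value:.4f}")
```

If this pattern applies, it may help to look at the error per customer, or per volume band, rather than only at the overall scores.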

I have preprocessed my historical data to get each customer's monthly volume, the volume of each of the past 1 to 6 months, and the past-year volume. In addition, I count the number of holidays per month and use it as a feature.
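The lag features described above could be built roughly as in the following Python sketch. The field names (`customer`, `month`, `volume`), the function `add_lag_features`, and the reading of "past-year volume" as the volume 12 months earlier are assumptions made for illustration. Lags are looked up by calendar month rather than by row position, so a missing month gives `None` instead of silently picking up the wrong month.

```python
def shift_month(month, back):
    """Return the 'YYYY-MM' string that lies `back` months before `month`."""
    year, mon = map(int, month.split("-"))
    index = year * 12 + (mon - 1) - back
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def add_lag_features(rows, lags=(1, 2, 3, 4, 5, 6, 12)):
    """Add vol_lag_<n> columns to rows of {'customer', 'month', 'volume'}."""
    volume_by_key = {(r["customer"], r["month"]): r["volume"] for r in rows}
    featured = []
    for r in rows:
        out = dict(r)
        for lag in lags:
            key = (r["customer"], shift_month(r["month"], lag))
            out[f"vol_lag_{lag}"] = volume_by_key.get(key)  # None if missing
        featured.append(out)
    return featured


# Hypothetical customer "A" with 18 months of history (2022-07 to 2023-12).
months = [f"{2022 + (6 + i) // 12:04d}-{(6 + i) % 12 + 1:02d}" for i in range(18)]
rows = [{"customer": "A", "month": m, "volume": float(i + 1)}
        for i, m in enumerate(months)]

featured = add_lag_features(rows)
complete = [r for r in featured if r["vol_lag_12"] is not None]
print(len(complete), "of", len(featured), "rows have a 12-month lag")
```

One thing this makes visible: with 18 months of history, a 12-month lag exists only for the last 6 months of each customer, so rows with a complete feature set are scarce.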

I am not sure where and how I can improve my model. Does anyone have experience with this kind of problem and can help me improve the model's accuracy?

Here is my KNIME workflow (training data).

Train ML Model.knwf (137.3 KB)

mlauber71 · March 1, 2024, 2:43pm

@Joey666 your sample workflow does not contain any data. Maybe you can save it again with the data in a /data/ subfolder, or uncheck the Reset option when saving it.

mlauber71 · March 1, 2024, 2:48pm

@Joey666 concerning regression machine learning tasks, I can offer this collection of nodes to compare models.

Joey666 · March 2, 2024, 5:02am

Thanks a lot for your guidance! I have re-exported the workflow without resetting it. But when I try to upload it here, it says the workflow is over the maximum size (4M). Is there anything I can do to upload it to the community successfully?

Joey666 · March 2, 2024, 5:16am

Thanks for sharing this. I have a question: how can I save this workflow to my local space? When I drag it to my local workspace, the workflow still does not show up there.

Also, regarding the partitioning: is it a must to use the Partitioning node to split the data into train and test sets for an ML model? Can I use the Row Splitter node instead if I have to choose some specific data as test data? (I actually managed to split off exactly the same data with Row Splitter and with Partitioning, but the Scorer shows very strange results when I use Row Splitter.)

rfeigel · March 2, 2024, 4:44pm

To download a workflow from the Hub, click the link above and then the download button.

The downloaded file is a .knwf file (a KNIME “zip” file). You’ll need to import it into KNIME.

Joey666 · March 4, 2024, 3:52am

Thanks a lot for your guidance!

Joey666 · March 4, 2024, 11:13am

Thanks for your guidance on downloading the workflow. I have a question about splitting the data into test and train sets, quoted below. Could you please help me? Thanks so much.

Also, regarding the partitioning: is it a must to use the Partitioning node to split the data into train and test sets for an ML model? Can I use the Row Splitter node instead if I have to choose some specific data as test data? (I actually managed to split off exactly the same data with Row Splitter and with Partitioning, but the Scorer shows very strange results when I use Row Splitter.)

rfeigel · March 4, 2024, 3:45pm

I don’t know anything about your dataset, i.e. how balanced it is. I would recommend that you use the Partitioning node. Using the Row Splitter can result in unintended “data drift.” If your data is very unbalanced, check out resources such as SMOTE to manage it.
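To make the two kinds of split concrete outside KNIME, here is a rough Python sketch: a rule-based split that picks specific rows (comparable to what a Row Splitter does) next to a random split (one of the sampling options a Partitioning node offers). The function names, field names, and sample rows are hypothetical. The rule used here, holding out the most recent months, is just one example of "specific data as test data".

```python
import random


def split_by_month(rows, first_test_month):
    """Rule-based split: earlier months train, first_test_month onward test.

    'YYYY-MM' strings compare correctly as plain text.
    """
    train = [r for r in rows if r["month"] < first_test_month]
    test = [r for r in rows if r["month"] >= first_test_month]
    return train, test


def split_randomly(rows, train_fraction=0.8, seed=42):
    """Random split: shuffle with a fixed seed, then cut at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows: two customers, six months each.
months = ["2023-07", "2023-08", "2023-09", "2023-10", "2023-11", "2023-12"]
rows = [{"customer": c, "month": m} for c in ("A", "B") for m in months]

train_rule, test_rule = split_by_month(rows, "2023-11")
train_rand, test_rand = split_randomly(rows)

print(len(train_rule), len(test_rule))  # rule-based sizes
print(len(train_rand), len(test_rand))  # random sizes
```

With the rule-based split the test rows differ systematically from the training rows (they are all later in time), so scores from the two kinds of split are not expected to match.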

mlauber71 · March 4, 2024, 4:00pm

You can use the KNIME Hub. Or you could check whether the data would be smaller when saved as a KNIME .table file.

system · June 2, 2024, 4:00pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.