Random Forest Regression Predict

jose_giron · November 28, 2023, 7:22pm

Hello how are you?
I worked in a flow that allows us to see the price-demand elasticity of the products of the company where I work. Use two paths to have the option of two different models, the first was a logarithmic regression and the second was a random forest regression.
The model that worked the best for my data set was the random forest regression, I am working with a history of units sold and prices from 2021 to date, however now the objective is for the model to predict how many units will be sold at the new price that I indicate and with this I can verify if that price would suit me or not. But I don’t know how to inform the model of that new price and using the random forest predictor to tell me the units that will be sold.

These are my predictions and my r2 went very well, now I just need the step of adding a new price and having it predict the units that are going to be sold.

Thank you for your help.

rfeigel · November 29, 2023, 2:51am

It appears that the only attribute (predictor) you’re using is price. Write your model out from the Learner node and feed it and a new set of price data to a new Predictor node. Is there any possible seasonality that a regression model won’t account for well?

jose_giron · November 29, 2023, 3:51pm

Hello! This is the final part of my model, I have already made the partition to train it and do the respective tests, before it only included the price and it went well, now I included everything else and it improved a lot. However, I don’t know how to make the model indicate how many units will be sold at a price of 250, for example. I want it to predict the units that will be sold at a new price with the information I am giving it.

They are images of the final part of the flow, the random forest learner (regression) and the table that returns the random forest predictor (regression). Now I want the model to predict how many units will be sold at a price of 250, for example.

Daniel_Weikert · November 29, 2023, 6:14pm

@jose_giron
You can save your model using Model writer node and load it into a new workflow. (Model reader)
To make predictions you need new data which has the same columns as your training data. So if your model predicts units then price is one of the features I guess so you have new data where the price column contains e.g. 250
br

rfeigel · November 29, 2023, 10:24pm

Why are using the year and week# as attributes?

jose_giron · November 29, 2023, 10:38pm

Sorry, the image is wrong, I took those two out and it only uses the number of restaurants and price.

jose_giron · November 29, 2023, 10:44pm

Thank you for your help and your comments. The model improved according to what I have been told, however, now I want it to predict how many units will be sold at a price of 250, I am going to create a new data set but obviously I will not be able to include the units sold, since that is what I want to predict , instead of having five columns my new data set would have four columns, correct?

Sorry, it’s my first machine learning.

rfeigel · November 30, 2023, 12:02am

You need a new data set with the same attribute columns as used in the model learner. In your case the number of restaurants and price. You don’t need a target column as this will be provided by the learned model. As both @Daniel_Weikert and I previously said write out the model with the a Model Writer node fed from the model port in the Model Learner. Then feed your new data to the data port of a new predictor node. Use a Model Reader node to read the learned model you previously saved and feed this to the model input port of the the new predictor. These three nodes are not connected to your original model and can be an entirely separate workflow.

mlauber71 · November 30, 2023, 5:52am

@jose_giron there are a lot of good resources out there to learn about machine learning with knime

jose_giron · December 8, 2023, 8:41pm

Thank you all for your help, it was very useful. Could I ask you why, even if I considerably increase or decrease the price, the quantity that the model predicts remains the same?

and this is the prediction with the new data set.

rfeigel · December 8, 2023, 10:17pm

You didn’t include the number of restaurants. Can you uplaod your worfklow(s) and data? Its hard to decipher the screenshots.

jose_giron · December 8, 2023, 10:37pm

Sorry, I’m working with a DB Connector and a DB Query Reader and the head of data governance would have to authorize me permissions to be able to view the data warehouse tables.

I don’t know if it’s too much to ask, but I could explain what I did in some virtual meeting just if you had the time.

rfeigel · December 8, 2023, 10:41pm

In the meantime add # restaurants to the second predictor node. Your screenshot shows you’re using a learner node. Reread my earlier instructions post.

jose_giron · December 8, 2023, 10:48pm

Thanks, this is my workflow:

This is the configuration of my learner node:

This is my predictor:

And this is my predictor with the new data set:

Sorry and thanks for your time.

mlauber71 · December 8, 2023, 11:18pm

@jose_giron may recommendation would be to think about what it is you want to predict and what data you have that might be relevant for thec task.

The the question is what method would be suitable. Just a regression model or is it some sort of simulation or is it the willingness to pay prices. These might require different settings and methods. Also some relations might not just be linear.

Then in the screenshot you seem just to use the same data for training and validation which is not a good setup.

If your business is dependent on seasons you might have to include this information. During holidays more people might visit (or not).

My recommendation would be to invest more in research understanding and planning.

What could help is find a dataset or Kaggle challenge that matches your case.

rfeigel · December 9, 2023, 12:07am

Can’t tell whether you’ve properly partitioned the original data set. @mlauber71 appears to be correct, i.e. you’re using the same data for training and testing, but without the complete workflow its hard to tell what’s going on. Study some models on the Hub to learn how to construct a proper model. Like @mlauber71 I’m skeptical that a linear regression without considering other relevant factors will produce useful results. Its dangerous to believe a model simply because it runs and seems to “make sense.”

system · March 8, 2024, 12:08am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.