Unsupervised machine learning (gradient boosting regression)

Hello everyone,

I’m currently building a workflow to predict the sales coefficient of the next month, but I’m facing a problem.

To keep it simple, my data set has these features:

  • Region
  • Business sector
  • Client ID
  • Invoice
  • Coefficient first month
  • Coefficient second month
  • Coefficient third month

A coefficient is a numeric value, so I use an unsupervised algorithm to predict the coefficient of the next month.
I trained my model to predict the third month based on the other coefficients, but there is a problem.
Now that I know my model is working well, how can I predict the fourth month? The learner node asks for a target column to predict, but I want the next month, so I don’t have the target column yet.

Do you see my point?

Thank you for your help.

You might want to think about your setting and how your data setting represents your business question, especially what information would be available at the time of the prediction, and how long you would want your model to work.

You might benefit from this debate and the accompanying example workflow.

BTW: If you use a target (your coefficient), it would be called supervised learning. And to predict a new month you would use a predictor, not a learner.

A very basic example could be seen here:

Thank you for your response and the example you gave me. Unfortunately, I don’t see the link with my case, and I don’t see how to proceed in KNIME to predict the coefficient of the next month.
I thought about time series, but it’s difficult to understand and none of the examples I saw fit my case.

In the example the employment level from 2019 is predicted using data from the years before. Your description sounded like that might be a similar question (coefficient of third month) and also in the example a Gradient Boosted Tree (Regression) is used.

But maybe you could provide us with a more detailed description of what you want to do or already have, since you asked how to apply a model you have seemingly already developed.

If you cannot share data or a workflow, maybe you could give us a screenshot or create a small dataset that represents your problem without giving away any secrets. You could also point us to an article or example that uses the kind of model you described.

Good examples of time series predictions are not so easy to come by, especially if you want to include additional features such as attributes.

Hello mlauber71,

Thank you for your answer.
Indeed, you guessed it: I’m not allowed to share my data, but I’m posting a screenshot of my workflow to explain my problem more precisely. I still have some questions remaining despite the example you gave me (by the way, thanks for the example).

Here is my workflow

So, like I said, I’ve got my segmentation and my coefficients for January, February, March and April.
I tried to predict April because I had the data, so I could compare the real values with the predicted ones to check the accuracy of the model. But now I’m wondering, first: how do I predict May? In the workflow I built, I targeted April, but when I use the predictor, it gives me the prediction for April (the target feature), doesn’t it? And second: is it necessary to use time series when time is a parameter in my data?

Well, we are in January, so my manager wants me to predict the data for February. How do I do that in KNIME? To give you a description of this coefficient: it is the sales coefficient, which is the ratio between what the product costs to produce and the price at which we sell it to the customer (the price differs from one customer to another, and so does the production cost).

Thank you very much for your help.

Adrien.

I still think you might benefit from studying and trying the mentioned sample workflows and the accompanying debate.

That being said: a model always predicts the target it was trained on. So if your target is April, it will predict April. That is why the data preparation went along like:

March becomes month_0
February month_1 (minus 1)
January month_2 (minus 2)

and so on. So your target would be month_0. In this example, if you now have new data, March becomes month_1, February month_2 and so on. Then the prediction would be April (the new month_0).
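The relabelling described above can be sketched in a few lines of Python. This is only an illustration, not the workflow itself: the function names `training_row` and `scoring_row` and the coefficient values are made up, and only the `month_k` column naming follows the post.

```python
from typing import Dict, List


def training_row(history: List[float], n_lags: int) -> Dict[str, float]:
    """Latest known month is the target month_0; month_k lies k months before it."""
    if len(history) < n_lags + 1:
        raise ValueError("need at least n_lags + 1 months of history")
    return {f"month_{k}": history[-1 - k] for k in range(n_lags + 1)}


def scoring_row(history: List[float], n_lags: int) -> Dict[str, float]:
    """The unknown next month is month_0, so the latest known month becomes month_1."""
    if len(history) < n_lags:
        raise ValueError("need at least n_lags months of history")
    return {f"month_{k}": history[-k] for k in range(1, n_lags + 1)}


# Hypothetical coefficients for January, February and March of one customer.
history = [1.10, 1.15, 1.20]

# Training: March is the target (month_0), February is month_1, January is month_2.
train = training_row(history, n_lags=2)

# Scoring: March moves to month_1, February to month_2; month_0 (April) is what
# the model is asked to predict, so it is simply absent from the row.
score = scoring_row(history, n_lags=2)

print(train)
print(score)
```

The scoring row has the same input columns as the training row, only shifted by one month, which is why no target column is needed when the predictor is applied.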

Of course, this construction would rest on certain assumptions, especially that there is no strong seasonality, since the model does not know that April is April. If there is seasonality, you would have to tell the model, and you would need data from a previous April so the model would have a chance to assess what an April looks like in comparison to other months. In such a setting, data like holidays or vacations would play a role (and think about moving holidays like Easter).

Maybe you could tell us more about your data and what you want to accomplish.
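One simple way to "tell the model" about the season is to attach the calendar month of the target as an extra input column. The sketch below is an assumption about how that could look; the column name `target_calendar_month` and both helper functions are hypothetical, and the feature only helps if the training data contains the same calendar month from an earlier year.

```python
from typing import Dict


def next_month(month: int) -> int:
    """Calendar month that follows `month`, wrapping December (12) to January (1)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return month % 12 + 1


def add_calendar_month(row: Dict[str, float], target_month: int) -> Dict[str, float]:
    """Return a copy of `row` with the calendar month of the target attached."""
    if not 1 <= target_month <= 12:
        raise ValueError("target_month must be in 1..12")
    enriched = dict(row)
    enriched["target_calendar_month"] = target_month
    return enriched


# Hypothetical scoring row: month_1 is March, month_2 is February.
score_row = {"month_1": 1.20, "month_2": 1.15}

# The target is the month after March, so the model is told it is predicting an April.
april_row = add_calendar_month(score_row, next_month(3))
print(april_row)
```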

Thank you for your time and for your answer to my first question. I also think that your workflow is far too complicated for what I want, and I didn’t really understand your methodology for predicting the next month.

Well, I’m going to try to describe my dataset as best I can.

The columns are:

  • business line
  • Region
  • Country
  • City
  • ID customer
  • invoicing
  • invoicing of past month
  • invoicing of two months ago
  • sales coefficient
  • sales coefficient of past month
  • sales coefficient of two months ago
  • sales coefficient of three months ago
    …etc

I have the sales coefficients for 2018 and 2019. We are in January, and the sales coefficients for this month will be available in 10 days.
I would like to build a workflow that predicts these January sales coefficients before knowing them. Is that possible in KNIME?
Maybe I didn’t understand your workflow, which may do exactly that, but I’m really not comfortable with the variable ports and I didn’t succeed in running your workflow to the end.

One more thing I want to add so you understand what I really want:

It’s good to train your model on a month where you know the value, because you can check the precision of your model, but the model is useless if you can’t predict the future month without already having the data, right?

I set up an example that should look like your data (if not, feel free to adapt it and let me know). We have two years of data (from one city) plus data from January, and we want to predict the coefficient for Feb 2020.

2019 serves as training data and 2018 as test data. You might use a more complex setting with cross-validation if you want. And since you have the data, you might also provide the coefficients of some more past months. Each line represents its own time series, so to speak. Of course, such a setting depends on what you expect to drive your sales coefficient. If it is influenced by external effects instead of seasonality (as in this example), you would have to add that information. Also, in this example we only have one city. The question is which of the additional information has any connection to the target. E.g., if London behaves significantly differently than Bristol, it makes sense to include that information. You could also substitute that info with the number of people living there or the number of shops that you have there. But it could be that all this is already captured within the sales coefficient.

Data preparation:

Model development:

Prediction of new data based on past information (Target= coefficient):

Of course, it would also be possible to use other time series methods that put more stress on the sequence of events (which here is provided by the coefficients of past months and the number of the month). The assumption is that a typical April would behave like an April and that past coefficients are a good forecast.

If you want to try more advanced stuff, you could try to adapt this workflow:
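For readers who think more easily in code than in nodes, the same three steps (data preparation, model development, prediction of new data) might look roughly like this in Python. This is a sketch under stated assumptions, not a translation of the workflow: scikit-learn's `GradientBoostingRegressor` stands in for the learner and predictor nodes, the data is synthetic, and the three input columns (month number, coefficient of the past month, coefficient of two months ago) as well as the helper `make_year` are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)


def make_year(n_customers):
    """Synthetic rows [month_number, coef_past_month, coef_two_months_ago] -> coef."""
    rows, targets = [], []
    for _ in range(n_customers):
        base = rng.uniform(1.0, 1.4)
        # 14 values: Nov and Dec of the previous year, then Jan..Dec of this year.
        season = 0.05 * np.sin(np.arange(14) * 2 * np.pi / 12)
        series = base + season + rng.normal(0.0, 0.01, 14)
        for month in range(1, 13):
            i = month + 1  # position of this month inside `series`
            rows.append([month, series[i - 1], series[i - 2]])
            targets.append(series[i])
    return np.array(rows), np.array(targets)


# Data preparation: one row per customer and month, target = that month's coefficient.
X_2019, y_2019 = make_year(50)  # training year
X_2018, y_2018 = make_year(50)  # held-out year used as test data

# Model development: the "learner" step needs the target column.
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, random_state=0)
model.fit(X_2019, y_2019)
mae_2018 = mean_absolute_error(y_2018, model.predict(X_2018))

# Prediction of new data: the "predictor" step only needs the inputs.
# Hypothetical customers, month number 2 = February, then the January 2020
# and December 2019 coefficients.
X_feb_2020 = np.array([
    [2, 1.21, 1.19],
    [2, 1.05, 1.07],
    [2, 1.33, 1.30],
])
feb_2020_pred = model.predict(X_feb_2020)

print("MAE on 2018:", round(float(mae_2018), 4))
print("Predicted Feb 2020 coefficients:", feb_2020_pred)
```

The point to notice is that `fit` receives a target while `predict` does not: the February 2020 rows contain only values that are already known in January.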

Hello mlauber71,

Thank you so much for your help and your patience. This is exactly the type of algorithm I needed, and sorry if I didn’t understand before, but I’m a total beginner in machine learning and haven’t covered it in my courses yet. I think I have all the information I need to succeed in my business project.

Best regards,

Adrien

Aaand my 50 cents would be a general comment.
You have your target column, also called the dependent variable or outcome, which is the sales coefficient, and it’s a number. You have the rest of the columns, most of which you’ll use as inputs, also called independent variables, features or, in the case of regression, regressors.
So you have your inputs and target for some months and you trained a model. In order to apply the model to predict a new month’s coefficient, you’ll need all your inputs available so the model can combine them into the predicted target.
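A quick sanity check before applying a model is to confirm that every input it was trained on is actually present in the new rows. The snippet below is a minimal sketch of that idea; the feature names and the helper `missing_inputs` are hypothetical and not taken from the workflow.

```python
from typing import Dict, List


def missing_inputs(feature_names: List[str], row: Dict[str, object]) -> List[str]:
    """List the model inputs that are absent or empty (None) in a scoring row."""
    return [name for name in feature_names if row.get(name) is None]


# Hypothetical inputs the model was trained on.
feature_names = ["region", "invoicing_past_month", "coef_past_month", "coef_two_months_ago"]

# Hypothetical new row for the month to predict; one input has not been filled in yet.
new_row = {
    "region": "North",
    "invoicing_past_month": 12500.0,
    "coef_past_month": 1.18,
    "coef_two_months_ago": None,
}

missing = missing_inputs(feature_names, new_row)
print(missing)
```

If the list is not empty, the row is not ready for prediction yet; note that the target column itself is never part of this check.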
