My graduation project

@AlexanderFillbrunn
I hope you understand my problem. It is actually very simple, but I can't solve it. I want to see the real values that the program predicts based on the percentage given. I want to see an estimate of the data in the Uretim_toplam table.

Hi,
let’s keep this in the forum and I would not share my phone number publicly if I were you, this is why I removed it from your post. The MLP node is a classification node, but you try to predict a double value. For this you need a regression node. I have replaced it with a Random Forest (Regression) setup and removed the normalization, because you do not need it here. This solves your problem here.
Kind regards,
Alexander

16052021.knwf (18.6 KB)

04.01.2020.xlsx (363.8 KB) 16052021.knwf (18.6 KB)

I think the problem is solved, but the predicted and actual values are not close to each other. How can I increase the accuracy and bring the values closer together? I know I owe you a great deal. Thank you very much.

@AlexanderFillbrunn
thank you for your endless support.

Hi,
This is unfortunately a complex process to find out. You can try out different regression algorithms, engineer more features, tweak the hyperparameters like the number of trees in the random forest, etc. But that requires quite some time.
Kind regards,
Alexander

You could try H2O.ai’s automated machine learning for regression problems (or an H2O node from KNIME’s repository) and see whether it produces better results or gives you an idea of where to look further. Since this seems to be an assignment, you would have to check what your own contribution to the work is (that would depend on your original task).

Another typically powerful tool would be the regression nodes from:

I have managed to get this far, but can you help me correct these inaccurate forecasting results?

WORKFLOW

Test1.knwf 1.knwf (651.3 KB)
Hello friends,

I built a nice workflow, but I have a very high error rate. I want the Uretim_Toplam, Total, and Prediction values to be close together. Can anyone point out or correct my mistakes? This KNIME project is my graduation project.

@HansS @AlexanderFillbrunn @AJABLR @lug @ipazin @datascience100 @stelfrich @rabenschlag @elsamuel

@sgilmour @izaychik63 @mlauber71 @Rich_ard @armingrudd @ScottF

@umutcankurt, I found a fellow Turk here. Please, could you help me out?

Dear @yesiloglua,

As @AlexanderFillbrunn has already pointed out, we are happy to help you with specific questions that you might have with respect to KNIME Analytics Platform. I would assume that improving the accuracy (or any other metric) of your predictor is at the center of your task/assignment and has nothing to do with a specific implementation in any software.

Please refrain from mentioning specific people who have not been involved in this or any other discussion around this topic. Thank you!

Best regards,
Stefan

Hi @stelfrich, being mentioned is no problem for me. She only asked me for help because we are from the same country.

Hi @yesiloglua,

The people in this forum are very knowledgeable, and you can be sure they are happy to help everyone. I understand you have very little time left before graduation, but I have never worked on the subject you mentioned; my experience is in data mining. If you describe the problem you are facing in detail and with examples, they will definitely help you. I’m sorry I couldn’t help you with this.

My forecast values in the yellow area are very close to the actual values, and the success and error rates in the red area are good.

The error rate in the blue area is incredibly high: the predicted values are good, but the score is very bad. I want to reach an error rate equal or close to the one in the red area.

Frankly, I want to reduce the error rate in my workflow. There are 2 weeks left until the deadline and I still haven’t managed it, so I tagged everyone to ask for urgent help. I’m sorry.

Maybe you could try and take 15 minutes to describe your task and what your specific question is.

BTW: the values Uretim_Toplam and Cekilen_Toplam seem to be very closely related. The question is whether it is OK to predict one from the other. Cekilen_Toplam accounts for about 1/3 of the ‘explaining’ power if you use H2O models.

What is the metric you are trying to optimise? In regression problems RMSE is often used. Can you tell us which metric you use, and whether it is the only one you have?

Also, you have very few values that are very high and a lot of others that are lower. You use normalization. With such large discrepancies you might have to use a Z-score transformation or log() (be careful how you handle 0 values).
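As a rough illustration of the two transformations mentioned above, here is a minimal Python sketch (outside KNIME). The sample values are made up, the function names are hypothetical, and using `log1p` (that is, log(1 + x)) is only one possible way to deal with zeros.

```python
import math
import statistics

def z_score(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score is undefined for constant data")
    return [(v - mean) / std for v in values]

def log_transform(values):
    """Apply log(1 + x), which is defined at x = 0, unlike log(x)."""
    return [math.log1p(v) for v in values]

# Made-up, skewed sample: mostly small values, one zero, one very large value
sample = [0.0, 3.0, 5.0, 8.0, 12.0, 950.0]
standardized = z_score(sample)
logged = log_transform(sample)
```

The log transform compresses the large value far more than the z-score does, which is why it is often tried on skewed targets.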

@mlauber71

Do you have a chance to help me? Seriously, I have not mastered the program and I can’t do more than this, including what you suggest. I searched but did not find anything.

Is there anyone who can help? My only hope is this forum.

Test1.knwf 1.knwf (651.3 KB) 04.01.2020.xlsx (283.7 KB)

Hi,
your error rates in the red square are only good because the data has been normalized before learning and predicting. The R^2 metric, which shows the explained variance in your data, is exactly the same. If you want lower errors, you need to find a good model type and good features. Other people spend 80% of their project on that alone, so I am not sure anyone here can help you with that, because they do not know anything about your project. Maybe it is just not possible to infer a better model from the data you have. There is noise and other factors that may play a role.
Kind regards,
Alexander
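The point about normalization can be checked with a small Python sketch. The numbers are made up and min-max scaling is assumed as the normalization; the helper names are hypothetical. Scaling the actual and predicted values with the same minimum and maximum shrinks the RMSE by the factor (max - min) but leaves R^2 unchanged.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def min_max(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    return [(v - lo) / (hi - lo) for v in values]

# Made-up actual values and predictions on the original scale
actual = [10.0, 200.0, 35.0, 900.0, 60.0]
predicted = [40.0, 150.0, 20.0, 700.0, 90.0]

# Normalize both series with the range of the actual values
lo, hi = min(actual), max(actual)
actual_n = min_max(actual, lo, hi)
predicted_n = min_max(predicted, lo, hi)

rmse_raw = rmse(actual, predicted)
rmse_norm = rmse(actual_n, predicted_n)   # smaller only because of the scale
r2_raw = r_squared(actual, predicted)
r2_norm = r_squared(actual_n, predicted_n)  # same as r2_raw
```

Comparing error values is therefore only meaningful when both are computed on the same scale.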

As @AlexanderFillbrunn said, it is because of the normalisation and especially the structure of the two values:

Uretim_Toplam
Cekilen_Toplam

They have a lot of very low numbers and a few very high ones. They are highly correlated (Pearson 0.94), but their connection is not immediately clear, so it might be challenging to actually derive one from the other (one would have to know more about what they represent).
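One possible reason a high Pearson value can coexist with an unclear relationship is that a few extreme points dominate the coefficient. The Python sketch below uses made-up numbers, not the thread's data, and only illustrates that effect; whether it applies to Uretim_Toplam and Cekilen_Toplam is an assumption that would need checking.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up data: five small, unrelated pairs plus one extreme pair
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]
ys = [5.0, 1.0, 4.0, 2.0, 3.0, 900.0]

r_all = pearson(xs, ys)            # dominated by the extreme pair
r_low = pearson(xs[:-1], ys[:-1])  # correlation among the small values only
```

Computing the correlation separately for the low and the high range is a quick way to see how much of the overall value comes from the few large numbers.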

As mentioned before, if you use H2O AutoML and take a look at the variable importance, Cekilen_Toplam accounts for a lot of the ‘explaining’ power of the whole model, and the RMSE is in line with the numbers you are seeing in your MLP model. Some tweaking might improve it further.

You would have to think about what you want to do with these numbers and think about how they come about and what their relationship is. There are several things you could explore but that might depend on your task:

  • try to use logarithm on the data in order to smooth the curve, but that is not easy to handle if you want to get precise data back
  • exclude the high numbers and think about doing two models for high and low values (if you would deploy such a model you would then have to decide which to use when)
  • further tweaking of the numbers and employing feature engineering methods (vtreat, featuretools) - but as long as it is not clear what is going on with the Target and Cekilen_Toplam this might also be misleading
  • why have you excluded some other variables from the model? Might they hold some information that might be helpful?
  • in Partitioning you choose linear sampling. Was this a deliberate decision?
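To illustrate the first point in the list above (why precise values are hard to recover after a log transform), here is a small Python sketch. It assumes log(1 + x) as the transform and a fixed, made-up prediction error of 0.1 in log space; the function names are hypothetical.

```python
import math

def to_log(value):
    """Forward transform: log(1 + x)."""
    return math.log1p(value)

def from_log(log_value):
    """Inverse transform: exp(y) - 1."""
    return math.expm1(log_value)

def absolute_error_after_back_transform(true_value, log_error):
    """Absolute error on the original scale for a given error in log space."""
    predicted = from_log(to_log(true_value) + log_error)
    return abs(predicted - true_value)

# The same log-space error gives very different errors on the original scale
small_err = absolute_error_after_back_transform(10.0, 0.1)
large_err = absolute_error_after_back_transform(10_000.0, 0.1)
```

A constant error in log space is a roughly constant relative error on the original scale, so the absolute error grows with the size of the value being predicted.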

I do not think that there is a purely ‘automatic’ model voodoo solution to your problem. If you would tell us more about your project and your assignment we might come up with new ideas.

Also, you might want to consider whether just copying something from the internet will be sufficient for your task, since you would then have to explain what you did and answer specific questions.

Building a model in two weeks is challenging, but it is doable. I think you will have to come up with an idea of what a ‘minimum viable product’ might be. In my experience it is sometimes better to ask for an extension than to try to just patch something together.

For example, I want to set limits for the success rate and the error rate, and have the workflow repeat until it reaches them. Is there a function that generates random values within the Learner section?

Maybe the parameter optimization loop is what you are looking for?
You can use it to test many different combinations of your ML parameters and then take the one with the best results.
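The idea behind such a loop can be sketched in plain Python as an exhaustive search over parameter combinations. This is not how the KNIME nodes are implemented; the k-nearest-neighbour model, the data, and all function names are assumptions chosen only to keep the example self-contained.

```python
import itertools
import math

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def validation_rmse(params, train, valid):
    """RMSE on the validation set for one parameter combination."""
    errors = [(knn_predict(train, x, params["k"]) - y) ** 2 for x, y in valid]
    return math.sqrt(sum(errors) / len(errors))

def grid_search(param_grid, train, valid):
    """Try every combination in param_grid and keep the lowest RMSE."""
    names = list(param_grid)
    best_params, best_score = None, math.inf
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = validation_rmse(params, train, valid)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Made-up data: y is roughly 2 * x
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1), (6, 12.0)]
valid = [(1.5, 3.0), (3.5, 7.0), (5.5, 11.0)]

best_params, best_score = grid_search({"k": [1, 2, 3, 6]}, train, valid)
```

Scoring on data that was not used for training matters here; otherwise the search tends to pick parameters that merely memorize the training set.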
