Hi!
I have a theoretical question:
I need to predict the purchases for each country. I have a dataset of past data with these columns:
customer id
purchase data
country of purchasing
type of product purchased
Which is the best model to implement in KNIME?
I have tried a random forest, but when I set the target column of the Random Forest Learner to country, I get an error:
Execute failed: The target column does not have possible values assigned. Most likely it has too many different distinct values (learning an ID column?) Fix it by preprocessing the table using a “Domain Calculator”.
What does this mean?
Is random forest not the right model for what I want to predict?
Many thanks
@giad it does not sound like you have very much data to base a prediction on. In general, if you want to predict the amount of purchases (money, volume, etc.), this would be a regression problem, but I am not sure you have defined your problem sufficiently yet.
If you want to read about (regression) models, I have compiled a collection, and there are also several resources to educate yourself about machine learning.
From what I see, the error message you got suggests using the Domain Calculator node, as you probably don’t have the domain calculated for the Country column. Still, this doesn’t seem the way to go if you are predicting sales per country. As @mlauber71 said, you have a regression problem and thus can’t really use a classification learner (the Random Forest Learner node) for it.
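To make the regression framing a bit more concrete, here is a minimal Python sketch (outside KNIME) of how the transaction rows could be turned into a numeric target by counting purchases per country and month. The sample rows, the assumption that the purchase column holds a date in YYYY-MM-DD form, and the helper name `purchases_per_country_month` are all made up for illustration; in KNIME a GroupBy node could play a similar role.

```python
from collections import Counter

# Hypothetical sample rows: (customer_id, purchase_date, country, product_type)
rows = [
    ("c1", "2023-01-05", "IT", "shoes"),
    ("c2", "2023-01-17", "IT", "bags"),
    ("c1", "2023-02-02", "IT", "shoes"),
    ("c3", "2023-01-09", "DE", "shoes"),
    ("c4", "2023-02-20", "DE", "bags"),
    ("c5", "2023-02-25", "DE", "bags"),
]


def purchases_per_country_month(rows):
    """Count purchases per (country, 'YYYY-MM') pair."""
    counts = Counter()
    for _customer, date, country, _product in rows:
        month = date[:7]  # assumes ISO dates, e.g. "2023-01-05" -> "2023-01"
        counts[(country, month)] += 1
    return dict(counts)


# The counts are numbers, so predicting them is a regression task,
# with country and month as inputs rather than as the target.
target = purchases_per_country_month(rows)
for (country, month), n in sorted(target.items()):
    print(country, month, n)
```

With a table like this, country becomes an input column and the purchase count becomes the target, which is the opposite of setting country as the target column.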
Regarding clustering, there are DBSCAN, Hierarchical Clustering, k-Means and others. Typing “clustering” into the Node Repository will show you more nodes, and on KNIME Hub you can find workflow examples.
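For a feel of what k-Means does, here is a small, simplified Python sketch, not the KNIME node's implementation. The per-customer features (number of purchases, number of distinct product types) are invented for illustration, and the start centroids are simply the first k points so that the run is repeatable; real implementations usually pick them randomly.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Simplification: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical features per customer: (purchases, distinct product types)
customers = [(1, 1), (2, 1), (1, 2), (9, 4), (10, 5), (8, 4)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

On this toy data the three low-activity customers end up in one cluster and the three high-activity customers in the other. Note that clustering groups similar rows; it does not predict a purchase amount.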
As @ipazin said, KNIME offers several such algorithms, including some generic ones. You might want to take a look at this example to see what the concept of clustering is about:
But I would strongly advise you to think about what kind of problem you want to solve and to familiarize yourself with the concepts behind several machine learning techniques.