I would like to build a supervised learning workflow in KNIME, but I still don’t see the difference between a supervised learning model and a simple algorithm made of multiple “if” statements. Let me explain with a binary classification example:

Suppose you have weather data such as temperature, humidity and wind. The two classes are YES, I play outside, or NO, I don’t play outside.
You give the rule on a sample of the data for training, and then you apply it to the rest of the data set.

How is it different from an algorithm that is:
if (cloudy = no) and (windy = no) and (temperature > 15°)
    then YES
else NO

I think it’s easy to look at the output of a machine learning model and say, “oh, that’s just a series of if statements”. The key is understanding how the machine learning model came up with those results, which typically involves some form of mathematical minimization or maximization to find the optimum. (E.g. how do you know that 15 is the correct number in if(temperature > 15) = YES?)
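To make the “how do you know 15 is the correct number” point concrete, here is a minimal sketch in Python. The data and the function names (misclassified, learn_threshold) are made up for illustration, and real learners use more refined criteria than a raw error count. The idea is only that the threshold is searched for, not typed in by hand.

```python
# Hypothetical toy data: (temperature in °C, played outside? 1 = yes, 0 = no).
samples = [(8, 0), (11, 0), (13, 0), (14, 0), (16, 1), (18, 1), (21, 1), (25, 1)]


def misclassified(threshold, data):
    """Count samples that the rule 'play if temperature > threshold' gets wrong."""
    return sum((temp > threshold) != bool(label) for temp, label in data)


def learn_threshold(data):
    """Try the midpoint between neighbouring temperatures; keep the one with fewest errors."""
    temps = sorted({temp for temp, _ in data})
    candidates = [(low + high) / 2 for low, high in zip(temps, temps[1:])]
    return min(candidates, key=lambda t: misclassified(t, data))


best = learn_threshold(samples)
print(f"learned rule: if temperature > {best} then YES else NO")
```

With these particular samples the search lands on 15.0, but change the labels and the learned number changes with them. That is the part an if statement written by hand cannot do.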

Additionally, the if statements work well in your example because the variables are categorical or have already been cut at a fixed threshold. However, if statements alone don’t work so nicely when you’re trying to estimate an odds ratio, or model continuous variables directly.
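As a rough illustration of the odds ratio point, here is a small logistic regression fitted by plain gradient descent in Python. Everything in it (the data, the learning rate, the number of steps, the variable names) is an assumption chosen for the sketch, not something taken from KNIME. The output is a probability that changes smoothly with temperature, which a YES/NO if statement cannot express.

```python
import math

# Hypothetical toy data: temperature in °C and whether the person played outside.
temps = [8, 11, 13, 14, 16, 18, 21, 25]
played = [0, 0, 1, 0, 1, 0, 1, 1]

mean_temp = sum(temps) / len(temps)
xs = [t - mean_temp for t in temps]  # centre the input so gradient descent is stable


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Minimise the average log loss with gradient descent (assumed settings).
w, b = 0.0, 0.0
learning_rate = 0.01
for _ in range(5000):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, played):
        error = sigmoid(w * x + b) - y
        grad_w += error * x
        grad_b += error
    w -= learning_rate * grad_w / len(xs)
    b -= learning_rate * grad_b / len(xs)

# Odds ratio: factor by which the odds of playing change per extra degree.
odds_ratio = math.exp(w)


def probability_of_playing(temp):
    return sigmoid(w * (temp - mean_temp) + b)


print(f"odds ratio per °C: {odds_ratio:.2f}")
print(f"P(play | 12°C) = {probability_of_playing(12):.2f}")
print(f"P(play | 20°C) = {probability_of_playing(20):.2f}")
```

An odds ratio above 1 here means each additional degree makes playing outside more likely, without any single cut-off temperature.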

Hope that helps… Feel free to ask clarifying questions!

I’m studying applied math, so I see exactly what you mean about machine learning being a kind of function optimization. But for a supervised learning model, the data scientist has to give the basic rules to the model. In the example, “temperature over 15” is one of the basic rules in the training data. So what I understand is that I give rules to the model and it reproduces those rules on the data set, and I don’t see the difference from an if statement.

For a supervised learning model, the data scientist does not give the rules. The data scientist only provides the dependent output, e.g. did the person go outside or not. (Typically denoted as a 1 or 0.) The supervised model then takes all other information (e.g. the exact temperature, the exact wind speed, whether or not it was raining, etc.) to create the “rules” or “breakpoints” you see in the output, through mathematical optimization.
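Here is a small Python sketch of that idea, under made-up assumptions: the rows, the column names and the learn_rule helper are all hypothetical, and the “model” is just a one-split decision stump. Note that no rule is supplied anywhere, only the observations and the 1/0 outcome. The code picks both the variable and the breakpoint.

```python
# Hypothetical labelled observations; "played" is the dependent variable (1 = went outside).
rows = [
    {"temperature": 22, "wind_speed": 5, "raining": 0, "played": 1},
    {"temperature": 12, "wind_speed": 20, "raining": 0, "played": 1},
    {"temperature": 25, "wind_speed": 8, "raining": 1, "played": 0},
    {"temperature": 10, "wind_speed": 25, "raining": 1, "played": 0},
    {"temperature": 17, "wind_speed": 12, "raining": 0, "played": 1},
    {"temperature": 19, "wind_speed": 15, "raining": 1, "played": 0},
]


def learn_rule(rows, features, target="played"):
    """Search every feature and breakpoint for the single split with the fewest errors."""
    best = None  # (errors, feature, threshold, play_when_above)
    for feature in features:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for play_when_above in (True, False):
                errors = sum(
                    ((row[feature] > threshold) == play_when_above) != bool(row[target])
                    for row in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, play_when_above)
    return best


errors, feature, threshold, play_when_above = learn_rule(
    rows, ["temperature", "wind_speed", "raining"]
)
comparison = ">" if play_when_above else "<="
print(f"if {feature} {comparison} {threshold} then YES else NO ({errors} training errors)")
```

On this toy data the search settles on the raining column, because neither temperature nor wind speed separates the outcomes cleanly. The printed result looks like an if statement, but it was derived from the labels.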

To help clarify further, the “supervised” part of a supervised model simply means the user provides the dependent variable. (In this model, whether or not the user went outside.) There is also such a thing as an “unsupervised” model, where the user simply provides the data, and the computer figures everything else out. (A lot of clustering algorithms are unsupervised.)
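For contrast, here is a minimal sketch of the unsupervised case in Python: a bare-bones one-dimensional k-means with two clusters. The temperatures and the two_means function are hypothetical, and a real clustering node does considerably more. The point is that there is no “played” column at all, yet the data still gets split into groups.

```python
# Hypothetical unlabelled temperatures in °C; there is no dependent variable here.
temps = [7, 8, 9, 10, 21, 22, 24, 25]


def two_means(values, iterations=10):
    """Minimal 1-D k-means with k = 2: assign each point to the nearest centre, then move the centres."""
    centres = [min(values), max(values)]  # simple starting guess
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            groups[nearest].append(v)
        centres = [sum(g) / len(g) for g in groups]
    return centres, groups


centres, groups = two_means(temps)
print("cluster centres:", centres)
print("cold-ish days:", groups[0])
print("warm-ish days:", groups[1])
```

The algorithm finds a “cold” and a “warm” group on its own, but it has no idea which of them means playing outside. That link only exists once someone supplies the labels, which is the supervised setting.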