Understanding C09 / S2

Hello @Brain and sorry for my late answer,

Regarding my proposed solution for JKI S02 - CH09: thank you for taking a look at it. As stated in the post, I’m not an expert in ML, nor in wine. So the answer is not about the results themselves, but about what the data is telling us.

I just tried to build my own take on the challenge, which differs completely from the official solution. I recommend the official one, because the KNIME team really knows its stuff:

OFFICIAL [ Auto ML (Regression) ]:

Trying to answer your specific questions (I am not sure whether you are looking for a dedicated wine-culture analysis or a data-related one):

This means that for this specific processing [multiple linear regression with backward elimination, without a significance-level restriction], a variable with a negative correlation value returns a negligible coefficient of determination. That is why I negated the independent variables that have a negative coefficient.
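For readers unfamiliar with the technique, here is a minimal sketch of backward elimination driven by the coefficient of determination, written in Python with NumPy. It is not the KNIME workflow from my solution: the data is synthetic, the feature names `a`, `b`, `c` are hypothetical, and the stopping rule (eliminate everything and just record the order) is an assumption chosen to mirror "without significance level restriction".

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def backward_elimination(X, y, names):
    """Repeatedly drop the feature whose removal costs the least R^2.

    Returns the names ordered from first dropped (least important)
    to last remaining (most important). No significance threshold is used.
    """
    remaining = list(range(X.shape[1]))
    dropped = []
    while len(remaining) > 1:
        # R^2 of the model refitted without feature j, for every candidate j
        scores = {
            j: r_squared(X[:, [k for k in remaining if k != j]], y)
            for j in remaining
        }
        weakest = max(scores, key=scores.get)
        dropped.append(names[weakest])
        remaining.remove(weakest)
    dropped.append(names[remaining[0]])
    return dropped

# Synthetic data (assumption): 'a' matters most, 'b' has a negative effect, 'c' barely matters
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(scale=0.5, size=n)
names = ["a", "b", "c"]  # hypothetical feature names

elimination_order = backward_elimination(X, y, names)
print(elimination_order)
```

The features that survive longest in `elimination_order` are the ones the procedure treats as most important.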

This solution is driven by the challenge statement “your goal is to see which features are the most important in predicting the quality of wine”; usually, regression algorithms are built with a significance restriction based on the coefficient of determination.

If I run the same process with the data AS IS, the result is as in the following picture. Variables with high negative correlation values, despite their expected relevance, are not considered…
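To illustrate the general idea on toy data (not the wine dataset, and not my actual workflow), here is a small Python sketch. The feature names and the synthetic `quality` column are hypothetical. It only shows that a ranking based on the signed correlation value pushes a strongly negative feature to the bottom, while negating that feature (or ranking by absolute value) brings it back up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
quality = rng.normal(size=n)  # synthetic target (assumption)

# Hypothetical features: one positively related, one negatively related, one unrelated
features = {
    "pos_feature": 0.6 * quality + rng.normal(scale=0.8, size=n),
    "neg_feature": -0.6 * quality + rng.normal(scale=0.8, size=n),
    "noise_feature": rng.normal(size=n),
}

# Pearson correlation of each feature with the target
corr = {name: float(np.corrcoef(col, quality)[0, 1]) for name, col in features.items()}

# Ranking by signed value: the negative feature ends up last
by_signed = sorted(corr, key=corr.get, reverse=True)

# Ranking by absolute value: the negative feature is as relevant as the positive one
by_abs = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)

# Negating the feature flips the sign of its correlation and nothing else
negated_corr = float(np.corrcoef(-features["neg_feature"], quality)[0, 1])

print(by_signed, by_abs, negated_corr)
```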

Google search: sulfites in wine meaning

Correlation doesn’t imply causation… so I wouldn’t say “is associated”, but rather “increased quality highly correlates with sulphates”; the answer is probably YES. To start this discussion, I would run the exercise again taking ‘sulphates’ as the dependent variable :thinking:

That said… I wouldn’t try to quench my thirst with expensive wine.

BR
