Hi,
I have an Excel sheet with 30,000 rows and several columns. One of these columns consists of different numbers, and my goal is to understand which criteria from the remaining columns influence that number. Does anyone have an idea how I can do this with KNIME? In the end, when I want to get a specific number, I want to know which of the other columns I need to fill in, and with which values, so that I get that number.
Hi @Jinni,
I think your use case is actually a supervised learning problem. Your numbers are the labels (dependent variable) and the other columns are the features (independent variables). Do the numbers signify a nominal value, i.e. some kind of category, or are they continuous numbers? In the former case you can use a classification algorithm, in the latter a regression algorithm. You could use a Decision Tree (for categories) or a Regression Tree (for continuous numbers) to learn how to infer the number from the other columns. That way you can inspect the tree structure of your model to see which rules influence your numbers.
The nodes you would need for this are:
- Simple Regression Tree Learner or Decision Tree Learner to train the model
- The corresponding Simple Regression Tree Predictor or Decision Tree Predictor to apply the model to held-out test data (for example, split off beforehand with a Partitioning node)
- Scorer or Numeric Scorer to evaluate the model quality
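If it helps to see the idea outside KNIME, here is a minimal sketch in plain Python of the train / predict / score steps listed above, for the regression case. It is not what the KNIME nodes do internally: it fits only a one-level regression tree by picking the single column and threshold that minimise the squared error. The data is made up, and the column names `size`, `unrelated` and `number` are purely hypothetical stand-ins for your Excel columns.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target):
    """Return (column, threshold, error) for the single split that
    minimises the summed squared error of the two resulting groups."""
    best = None
    columns = [c for c in rows[0] if c != target]
    for col in columns:
        # Skip the largest value so that both groups are never empty.
        for thr in sorted({r[col] for r in rows})[:-1]:
            left = [r[target] for r in rows if r[col] <= thr]
            right = [r[target] for r in rows if r[col] > thr]
            err = sse(left) + sse(right)
            if best is None or err < best[2]:
                best = (col, thr, err)
    return best


def fit_stump(rows, target):
    """Train a one-level regression tree: one rule, two predicted means."""
    col, thr, _ = best_split(rows, target)
    left = mean(r[target] for r in rows if r[col] <= thr)
    right = mean(r[target] for r in rows if r[col] > thr)
    return col, thr, left, right


def predict(stump, row):
    col, thr, left, right = stump
    return left if row[col] <= thr else right


def r_squared(actual, predicted):
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / sse(actual)


def mean_abs_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Made-up data: "number" depends on "size" only, "unrelated" is noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    size = rng.randint(1, 10)
    unrelated = rng.randint(1, 10)
    number = (10 if size <= 5 else 50) + rng.uniform(-1, 1)
    rows.append({"size": size, "unrelated": unrelated, "number": number})

# Hold out 30% of the rows as test data.
train, test = rows[:140], rows[140:]

stump = fit_stump(train, "number")
col, thr, left, right = stump
print(f"rule: if {col} <= {thr} then {left:.1f} else {right:.1f}")

actual = [r["number"] for r in test]
predicted = [predict(stump, r) for r in test]
r2 = r_squared(actual, predicted)
mae = mean_abs_error(actual, predicted)
print(f"R^2 = {r2:.3f}, mean absolute error = {mae:.3f}")
```

The printed rule is the kind of thing you would read off the tree view: it tells you which column, and which value range in it, leads to which number. A real tree repeats this splitting step inside each group, so you get a chain of such rules instead of a single one.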
I hope this helps!
Alexander