Feature Selection and a new calculated feature from given features

Hi, I'm Valentina from Italy and I'm new to KNIME.

I'm an engineering student, and this is the first time I'm approaching data mining!!

 

I have a classification problem: starting from features of earthquakes, I have to determine whether an earthquake is dangerous or not. The nominal (target) class is binary: "YES" if the magnitude is equal to or greater than 5 (Richter), "NO" otherwise. Obviously I cannot use the Magnitude feature!

These are my questions:
1) I have to add a feature to my dataset, calculated as the square root of the sum of the squares of two other features (an application of Pythagoras' theorem): is there a node in KNIME to do this?
2) About the AttributeSelectedClassifier node (in the Weka plugin), can you suggest some sensible configurations other than the default, so I can try them as examples?
3) How can I use the R plug-in to perform feature selection?

Thank you so much!
Best regards,
Valentina

For point 1. You can simply use the Misc/Math Formula node to do this.
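As a minimal sketch (assuming your two input columns are called X and Y; substitute your real column names), the expression you would type into the Math Formula node looks like this:

    sqrt($X$^2 + $Y$^2)

In the node's dialog, choose "Append Column" to add the result to your table as the new feature.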

If you want to predict the numeric Magnitude instead of YES or NO, then try out the Linear Regression nodes in the Statistics category.

 

Simon.

Point 2) You need at least one nominal class column and a number of numeric columns, and these need to be compatible with the base learner that you have assigned in the configuration.

Point 3) In the KNIME Node Repository, the Meta category contains a pre-configured meta node called Feature Elimination. If you open it and look at the contained nodes, you will see a Learner and a Predictor; these can be replaced by the R Learner and R Predictor, respectively.
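To give you an idea of what goes inside those two nodes, here is a minimal sketch. It assumes the current R scripting integration, where the input table arrives as knime.in, the model is handed over as knime.model and the output table is assigned to knime.out (the variable names may differ in older versions of the R plug-in), and it assumes your target column is called Class; the rpart decision tree is just an example model.

    # R Learner node (sketch): train a classification tree on the remaining features
    library(rpart)
    knime.in$Class <- as.factor(knime.in$Class)   # make sure the target is a factor
    knime.model <- rpart(Class ~ ., data = knime.in, method = "class")

    # R Predictor node (sketch): append a prediction column for the elimination loop
    knime.out <- cbind(knime.in,
                       Prediction = predict(knime.model, newdata = knime.in, type = "class"))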

It might be worth balancing your data set so you have a more equal number of YES and NO rows. You can do this with the Equal Size Sampling node; you'll be left with a data set of 700 rows each of YES and NO.
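If you ever want to do the same balancing in an R Snippet instead of with the node, a small sketch could look like this (again assuming the table arrives as knime.in and the target column is called Class; both names are placeholders):

    # Down-sample the majority class so both classes end up with the same row count
    set.seed(42)
    yes_rows <- which(knime.in$Class == "YES")
    no_rows  <- which(knime.in$Class == "NO")
    n        <- min(length(yes_rows), length(no_rows))   # 700 in your data set
    knime.out <- knime.in[c(sample(yes_rows, n), sample(no_rows, n)), ]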

Simon.

The Math Formula node is OK and easy to use; I downloaded it from KNIME Extensions.
 

4) Another question that is very important for my studies: do the learner and predictor nodes have to be chosen in accordance with the feature selection algorithm used? For example, if I use a J48 classifier in the configuration of the Feature Selection node, do I then have to use a decision tree learner and predictor? Or if I use a MultilayerPerceptron classifier in the configuration of the Feature Selection node, do I then have to use an MLP learner and predictor?
Please suggest some example combinations of feature selection algorithms with their related learner and predictor nodes, so I can clarify my ideas!


5) In my earthquake problem I have only 700 rows in the nominal class YES and 9000 rows in the class NO, so I think that in the learning phase it is not adequate to use k-fold Cross Validation? I think it is more critical (and important for classifying correctly) to configure the Partitioning node properly (for example, I set stratified sampling). Is that true?
 

Thank you for your kindness!!
