I'm relatively new to KNIME. I'd like to ask how to build a KNIME workflow for a specific process.
I have a dataset with a number of attributes. I want to build a classifier and its related outputs (ROC curve, AUROC, precision, etc.) for the entire dataset, then remove one attribute and build another classifier, then remove a different attribute from the original dataset and build another classifier with the relevant information, and so on.
Is there a specific node that can do this, or do I have to combine several nodes to build this workflow? If it's the latter, can somebody talk me through the process?
Welcome to the forums!
Have you seen our public examples yet? There is at least one there that may be interesting to you. It is an example of feature elimination, which can be used to find the most interesting attributes of your data. The workflow is named 002004_Feature-Elimination_with_Naive_Bayes. More information about our example workflows can be found at the link below.
As KNIME desktop is open source, it is (mostly) community supported, so there isn't anyone specifically available to provide a lot of one-on-one support. We do, however, have some great community members who often chime in on discussions like this when their expertise and time permit. To get up to speed, I would also recommend the KNIME YouTube channel as well as the training courses we periodically offer at our offices in Zurich. On the off chance that you happen to be in the Boston area, we are having a life sciences day on Friday in Cambridge that would probably be interesting as well.
Again, welcome and happy KNIMEing!
I would agree, the feature elimination metanode is what you want here. It does what you describe: it removes one different attribute at a time. Once every attribute has been left out once, it permanently removes the weakest attribute and then starts again, leaving out one of the remaining attributes at a time. This continues until you are left with just one.
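To make the procedure concrete, here is a minimal Python sketch of the elimination loop described above. It is not the metanode's actual implementation: the function names, the toy weights, and the scoring function are all hypothetical stand-ins, and "weakest" is assumed to mean the attribute whose removal leaves the best score.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Return (permanently removed feature, score without it), in removal order."""
    remaining = list(features)
    history: List[Tuple[str, float]] = []
    while len(remaining) > 1:
        # Leave out each remaining feature once and score the reduced set.
        trials = [
            (score([f for f in remaining if f != left_out]), left_out)
            for left_out in remaining
        ]
        # Assumption: the "weakest" feature is the one whose removal
        # leaves the highest score, so it is dropped for good.
        best_score, weakest = max(trials)
        remaining.remove(weakest)
        history.append((weakest, best_score))
    return history


# Hypothetical stand-in for "train a classifier and measure its quality":
# each feature contributes a fixed amount to the score.
TOY_WEIGHTS = {"a": 5, "b": 1, "c": 3}


def toy_score(columns: Sequence[str]) -> float:
    return sum(TOY_WEIGHTS[c] for c in columns)


history = backward_elimination(["a", "b", "c"], toy_score)
print(history)  # [('b', 8), ('c', 5)]
```

In a real workflow the scoring function would be replaced by a learner plus a scorer, which is where almost all of the run time goes.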
Thank you Aaron and Richard.
However, the dataset I'm working with has almost 2000 features, which would require about 150 processing days at my computer's speed.
Is there a way to do what Richard describes, removing one attribute at a time, but without going back and permanently removing the weakest attribute? That way I could determine for myself which attributes are the weakest and the strongest.
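For a rough sense of scale, the number of classifiers each approach has to train can be counted as in the sketch below. It assumes that every elimination round trains one model per remaining attribute and that rounds continue until a single attribute is left. The function names are hypothetical, and the counts say nothing about how long each individual model takes to train.

```python
def backward_elimination_fits(n_features: int) -> int:
    """Models trained when each round leaves out every remaining attribute once,
    then drops one attribute permanently, until one attribute is left."""
    # Rounds run with n, n-1, ..., 2 attributes remaining.
    return sum(range(2, n_features + 1))


def single_pass_fits(n_features: int) -> int:
    """Models trained for one leave-one-attribute-out pass with no permanent removal."""
    return n_features


n = 2000
print(backward_elimination_fits(n))  # 2000999
print(single_pass_fits(n))           # 2000
```

Under these assumptions a single pass trains about a thousandth as many models as the full elimination.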
Hi, what you need to do is generate a list of the column names by taking the column headers and transposing the table. The column names then end up in the RowID column, and the Chunk Loop Start node takes them one at a time. Each one is transposed back to give a single column name per iteration, and this is passed to the Reference Column Filter, which removes that column from the main dataset. After that you use whichever nodes you require, such as the ROC Curve node, and finish with a Loop End, which collates the results from each iteration.
Attached is an example picture of the workflow.
Note that at the first Transpose node you may wish to filter out the names of the columns that should appear in every iteration (i.e. the columns you do not want to be removed at any point).
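For anyone who finds the logic easier to follow in code, the loop above can be sketched in plain Python as follows. This only mirrors the idea of the workflow and is not what the KNIME nodes run. The table layout, the function names, and the placeholder evaluation are all hypothetical, and the `keep` argument plays the role of the filter at the first Transpose node.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def leave_one_column_out(
    rows: List[Row],
    evaluate: Callable[[List[Row]], float],
    keep: Sequence[str] = (),
) -> List[Dict[str, object]]:
    """Evaluate the table once per removable column, with that column left out."""
    # List the column names, skipping the ones that must stay in every iteration.
    candidates = [c for c in rows[0].keys() if c not in keep]
    results = []
    for left_out in candidates:  # one column name per iteration
        # Remove just this column from the full, original table.
        reduced = [{k: v for k, v in row.items() if k != left_out} for row in rows]
        # Collect one result row per iteration, as the Loop End would.
        results.append({"removed": left_out, "score": evaluate(reduced)})
    return results


# Tiny made-up table; "label" is the class column and must never be removed.
table = [
    {"x1": 1.0, "x2": 0.2, "label": "yes"},
    {"x1": 0.1, "x2": 0.9, "label": "no"},
]


def placeholder_evaluate(reduced: List[Row]) -> float:
    # Placeholder only: a real workflow would train a classifier here
    # and return something like the AUROC.
    return float(len(reduced[0]))


results = leave_one_column_out(table, placeholder_evaluate, keep=["label"])
print([r["removed"] for r in results])  # ['x1', 'x2']
```

Every iteration starts again from the full table, so nothing is removed permanently and the per-column results can be compared by hand afterwards.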
I am new to KNIME. I downloaded a package from KNIME and later could not find it.
Any help will be highly appreciated.
Not sure how much it helps, but I have questions: What was the purpose of the package? Can you download and install it again?