Hello, I am new to KNIME. I am trying to bootstrap my dataset several times in order to create an ensemble classifier. For each bootstrapped table, I need to randomly extract a fixed number of columns subject to a condition, independently of the content of the columns. In particular, I would like to take a percentage (let’s say 30% per table) of the total number of variables (columns). Do you know of any configurable node or pipeline I could use for this purpose, or do you think it would be better to integrate a Python node with a script? Thanks in advance.
Hi @matt0 and welcome to the forum.
If I’m understanding you correctly, the Tree Ensemble Learner node does this as part of its configuration, as shown below:
Or am I missing some nuance of your question?
Thanks @ScottF. I forgot to mention an important part: what I am trying to build is an ensemble classifier composed of neural networks instead of decision trees. So I am looking for a way to split the entire dataset into several bootstrapped datasets, each made up of a subset of the variables present in the original one, chosen according to a certain criterion (in this case a percentage, though it could be some other operation). NN models will then be trained on those datasets and combined through majority voting at the end.
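For the final combination step, majority voting could be sketched in plain Python roughly as follows. This is only an illustration, not a KNIME node or API: the function name `majority_vote` and the list-of-lists input format are assumptions, and each inner list is assumed to hold one model's predicted class labels for the same rows in the same order.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per row by majority vote.

    predictions_per_model: list of lists (hypothetical format), one inner
    list per model, each holding one predicted label per row.
    Ties go to the label encountered first among that row's votes.
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_rows = len(predictions_per_model[0])
    if any(len(p) != n_rows for p in predictions_per_model):
        raise ValueError("all models must predict the same number of rows")

    combined = []
    # zip(*...) regroups the predictions so that each tuple holds
    # every model's vote for a single row.
    for row_votes in zip(*predictions_per_model):
        winner, _count = Counter(row_votes).most_common(1)[0]
        combined.append(winner)
    return combined


if __name__ == "__main__":
    votes = [
        ["a", "b", "a"],  # model 1
        ["a", "b", "b"],  # model 2
        ["b", "b", "a"],  # model 3
    ]
    print(majority_vote(votes))
```

Using an odd number of models avoids ties in the two-class case; with more classes ties can still occur, so the tie-breaking rule is worth choosing deliberately.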
This workflow, sample columns.knwf (864.8 KB), takes multiple random samples of 30% of the columns from the Boston Housing dataset and calculates the performance of the resulting models.
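For anyone who prefers a script, the same sampling idea can be sketched in plain Python. This is not taken from the workflow above; the function names, the `seed` parameter, and the dictionary layout are all made up for illustration. It assumes `feature_columns` contains only the predictor columns (the target column is left out) and that the 30% is rounded to the nearest whole number of columns.

```python
import random


def sample_column_subset(columns, fraction, rng):
    """Pick a random subset of columns without replacement."""
    # Rounding to the nearest integer is an assumption; at least one
    # column is always kept.
    k = max(1, round(fraction * len(columns)))
    return rng.sample(list(columns), k)


def bootstrap_row_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def make_ensemble_inputs(feature_columns, n_rows, n_models,
                         fraction=0.3, seed=0):
    """Build one (column subset, bootstrapped rows) pair per model."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [
        {
            "columns": sample_column_subset(feature_columns, fraction, rng),
            "rows": bootstrap_row_indices(n_rows, rng),
        }
        for _ in range(n_models)
    ]


if __name__ == "__main__":
    features = [f"x{i}" for i in range(10)]  # hypothetical column names
    for spec in make_ensemble_inputs(features, n_rows=50, n_models=3):
        print(spec["columns"], len(spec["rows"]))
```

Each dictionary describes the training table for one model: select the listed columns plus the target, then take the rows at the listed indices. Replacing `sample_column_subset` with another rule is enough to switch from a fixed percentage to a different selection criterion.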
Thanks @HansS. This workflow looks like a good template to start from. I will try to adapt it to a classification problem using MLPs.