Extracting variables (columns) from a table with a mathematical condition

Hello, I am new to KNIME. I am trying to bootstrap my dataset several times in order to create an ensemble classifier. For each bootstrapped table, I need to randomly extract a fixed number of columns, independently of the columns' content. In particular, I would like that number to be a percentage (let's say 30% per table) of the total number of variables (columns) in the dataset. Do you know of a configurable node or pipeline I could use for this, or do you think it would be better to integrate a Python node with a script? Thanks in advance.

Hi @matt0 and welcome to the forum.

If I’m understanding you correctly, the Tree Ensemble Learner node does this as part of its configuration.

Or am I missing some nuance of your question?

Thanks @ScottF. I actually forgot an important part: I am trying to create an ensemble classifier composed of neural networks instead of decision trees. So I am looking for a way to split the entire dataset into several bootstrapped datasets, each made up of a subset of the variables in the original one, chosen according to a certain criterion (in this case a percentage, though it could be some other operation). NN models will then be trained on those datasets and combined through majority voting at the end.
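The final combination step could look roughly like the sketch below. It is only an illustration in plain Python, not a KNIME node configuration: the function name `majority_vote` and the input layout (one list of predicted labels per model, all of the same length) are assumptions made for the example.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample.

    predictions_per_model: list of lists, one inner list per model,
    each holding that model's predicted label for every sample.
    (Hypothetical layout, chosen for this sketch.)
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Count how many models voted for each label on sample i.
        votes = Counter(preds[i] for preds in predictions_per_model)
        # Take the most frequent label; on a tie, the label seen
        # first among the models wins.
        combined.append(votes.most_common(1)[0][0])
    return combined
```

Using an odd number of models avoids ties for binary problems; with more classes a tie-breaking rule like the one above is still needed.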

Hi @matt0

This workflow, sample columns.knwf (864.8 KB), takes multiple random samples of 30% of the columns from the Boston Housing dataset and calculates the performance of the resulting models.
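For anyone who prefers a script, the same idea (bootstrap the rows, keep a random 30% of the columns) can be sketched in plain Python. This is not what the attached workflow does internally; the function name `bootstrap_with_column_subset` and the table layout (a list of dicts keyed by column name) are assumptions for the example.

```python
import random


def bootstrap_with_column_subset(rows, columns, fraction=0.3, rng=None):
    """Return (chosen_columns, bootstrapped_rows).

    rows: list of dicts mapping column name -> value (hypothetical layout).
    columns: list of candidate column names.
    fraction: share of columns to keep, e.g. 0.3 for 30%.
    """
    rng = rng or random.Random()
    # Number of columns to keep, never fewer than one.
    k = max(1, round(fraction * len(columns)))
    # Draw k distinct columns, regardless of their content.
    chosen = rng.sample(columns, k)
    # Bootstrap: draw as many row indices as there are rows, with replacement.
    indices = [rng.randrange(len(rows)) for _ in rows]
    sample = [{c: rows[i][c] for c in chosen} for i in indices]
    return chosen, sample
```

Calling this once per ensemble member, each time with a differently seeded `random.Random`, gives every model its own rows and its own column subset. The target column should be kept out of `columns` and added back afterwards.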


gr. Hans

Thanks @HansS. This workflow looks like a good template to start with. I will try to adapt it to a classification problem using MLPs.
