Looping over a set of tables and/or nodes

Hi,

I have a set of different data tables and want to run several different classification/regression algorithms on each of them. At an abstract level, I'm thinking of a workflow with an outer loop iterating over all data tables and an inner loop iterating over all learning algorithms (represented as (meta)nodes). However, I could figure out neither how to loop over a set of tables nor how to loop over a set of nodes.

Currently, I have implemented the workflow without loops and instead copy and paste everything for each individual table/algorithm combination. But as you can imagine, a large number of such combinations makes it very tedious to introduce small modifications in the workflow, as they have to be repeated in all copies.

Does anybody know of a way to realize this within a loop structure?

In my particular case, I am trying to set up a combi-QSAR workflow. As input I read a set of chemical structures from a file (SMILES or SDF). Then I want to calculate different sets of descriptors (RDKit, CDK, Indigo, ...) and use several classification/regression algorithms to build the models (kNN, naive Bayes, SVM, ...). The goal is to build a model from each possible combination of a descriptor set and a learning algorithm, compare the performance of the models, and select the best ones to run on an external data set.
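To make the intended structure concrete, here is a minimal Python sketch of the nested loop described above (outer loop over descriptor sets, inner loop over learners, results ranked at the end). It is not KNIME code. The names `run_grid`, `descriptor_sets`, and `learners` are hypothetical, and the toy "descriptors" and "learners" are stand-ins chosen only so the example runs; they are not RDKit/CDK descriptors or real models.

```python
def run_grid(descriptor_sets, learners, structures, targets):
    """Score every (descriptor set, learner) pair and rank best first."""
    results = []
    for d_name, describe in descriptor_sets.items():      # outer loop: descriptor sets
        features = [describe(s) for s in structures]      # computed once per set
        for l_name, evaluate in learners.items():         # inner loop: learners
            results.append((d_name, l_name, evaluate(features, targets)))
    # Higher score is better, so sort in descending order.
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy stand-ins (assumptions for illustration only).
descriptor_sets = {
    "length": lambda smi: [float(len(smi))],
    "carbon_count": lambda smi: [float(smi.count("C"))],
}


def identity_score(features, targets):
    """Predict target = first feature; return negative mean absolute error."""
    errors = [abs(f[0] - t) for f, t in zip(features, targets)]
    return -sum(errors) / len(errors)


def mean_baseline_score(features, targets):
    """Ignore the features and predict the mean target; return negative MAE."""
    mean = sum(targets) / len(targets)
    return -sum(abs(mean - t) for t in targets) / len(targets)


learners = {"identity": identity_score, "mean_baseline": mean_baseline_score}

structures = ["CCO", "CCCC", "C"]
targets = [2.0, 4.0, 1.0]

ranking = run_grid(descriptor_sets, learners, structures, targets)
for d_name, l_name, score in ranking:
    print(d_name, l_name, round(score, 3))
```

The point of the sketch is only the shape: any change to the evaluation step is made once inside the loop body instead of once per copied branch.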

Thanks for any advice!
Andreas

Hi Andreas, 

Your approach sounds close, but you need to think about how your data looks to start with. Is it in a number of different files, or is it already condensed into a single master table? If the former, you can try the node sequence {List Files > TableRow to Variable Loop Start > (a branch for each model) > Loop End (collecting model accuracies)}. If the latter, you do the same, but likely use a Group Loop Start to separate out your initial tables.
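As a rough analogy for the file-based node sequence, the following Python sketch lists the files in a folder, loops over them, applies every model to each table, and collects the scores at the end. It only illustrates the loop logic and is not how KNIME implements these nodes. The function name `loop_over_files`, the CSV input format, and the `models` mapping (name to scoring function) are assumptions made for the example.

```python
import csv
from pathlib import Path


def loop_over_files(folder, models, pattern="*.csv"):
    """Run every model on every table file and collect one result row per pair."""
    collected = []
    for path in sorted(Path(folder).glob(pattern)):      # "list files" + loop start
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))          # read the current table
        for model_name, score_fn in models.items():      # one branch per model
            collected.append(
                {"file": path.name, "model": model_name, "score": score_fn(rows)}
            )
    return collected                                     # "loop end": collected results
```

Each entry in `models` is expected to take the list of row dictionaries and return a single number, for example an accuracy.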

Regards,

Aaron

Hi Aaron,

Thanks for your help!

I start from a single file containing only the compound structures and the target property to build a model for. After parsing the structures, I currently branch the workflow to calculate the different descriptor sets available within KNIME. So I end up with one table for each set of descriptors.

I could of course join them into one master table. But if I got it right, the group loop start considers another subset of rows in each iteration, based on one or more columns that define the grouping. I would need it the other way round, namely looping over all rows while considering only a subset of columns (based on the column names) such that only columns belonging to the same descriptor set are included in the same iteration. Is there a way of doing this, too?
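The column-wise iteration described here can be sketched in Python as follows: every iteration keeps all rows but only the columns of one descriptor set, plus the target column. The sketch assumes that column names carry a descriptor-set prefix such as `RDKit_` or `CDK_`; that naming scheme, the `target` column name, and the function names are hypothetical.

```python
from collections import defaultdict


def group_columns_by_prefix(column_names, separator="_"):
    """Map each prefix (text before the first separator) to its column names."""
    groups = defaultdict(list)
    for name in column_names:
        groups[name.split(separator, 1)[0]].append(name)
    return dict(groups)


def iterate_column_groups(table, keep=("target",), separator="_"):
    """Yield (group name, sub-table) pairs: all rows, one column group at a time."""
    if not table:
        return
    descriptor_columns = [c for c in table[0] if c not in keep]
    groups = group_columns_by_prefix(descriptor_columns, separator)
    for prefix, columns in groups.items():
        selected = list(keep) + columns
        yield prefix, [{c: row[c] for c in selected} for row in table]


# Example master table as a list of row dictionaries (made-up values).
table = [
    {"target": 1.0, "RDKit_a": 0.1, "RDKit_b": 0.2, "CDK_x": 3.0},
    {"target": 0.0, "RDKit_a": 0.3, "RDKit_b": 0.4, "CDK_x": 5.0},
]

for name, sub_table in iterate_column_groups(table):
    print(name, list(sub_table[0]))
```

No transposition is needed in this formulation, because the grouping is done on the column names rather than on the rows.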

Maybe I could transpose the table before looping, but then I would have to transpose it back for the model building process, and probably transpose it a third time before closing the loop. I don't think this would be a good way to handle it, would it?

Hi choc, 

Is it reasonable to calculate the properties consecutively rather than in branches? Either way, putting all of the descriptors in a single table may be the best path forward. Are you trying to see which features are most predictive? In that case, the Tree Ensemble nodes (KNIME Labs) might be interesting, as you can use them to sample your feature space many times and then use the collective results to score your models. I find this approach quite useful.

Hi,

I need to change the values of each cell in a table.

If the cell value is greater than 0.4, replace the value with 1, else replace the value with 0.

How do I do that?

Cheers,

Ken

It's fine now. I managed to achieve it using the nodes "Column List Loop Start" and "Math Formula".
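For reference, the rule from the question (values greater than 0.4 become 1, everything else becomes 0) looks like this as a plain Python sketch. It is not the KNIME node configuration; the function name `binarize` and the list-of-rows table layout are assumptions for illustration.

```python
def binarize(table, threshold=0.4):
    """Return a new table where each cell is 1 if it exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in table]


cells = [[0.1, 0.5], [0.4, 0.41]]
print(binarize(cells))
```

Note that the comparison is strict, so a cell that is exactly 0.4 maps to 0, matching "greater than 0.4" in the question.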

Hi Shevken

Sorry we missed your question. It is usually better to open a new thread; then we see that there is something new to be answered :)

Best, Iris
