looped calculations using values of a column after subsetting by different factors (categorial)

#1

I am new in KNIME and I have only very basic programming knowledge. I try to evaluate whether KNIME might be useful as a complementary tool to R and Excel.

I have to calculate values by arithmetic operations from large data sets that are hierarchally structured by different categories (being factors in R) and I need to do some statistics afterwards.

From top to bottom
Level1 categories (e.g. 3) each split into 100 categories (level2) each of which has 10 subcategories (level3) each of which has 2 subcategories (level4). The numerical values have very precise coordinates by this structuring. Unfortunately, I need not only statistical evaluation which is rather simple but I have to do arithmetric calculations with some of the numbers (different ways of normalization, addition of fived numbers etc) in this conditional framework.

I can subset the whole dataset according to the factor levels into many pieces and finally transform the column into several columns (long to wide table) which the easily enables arithmetric operations. However, I need to do this splitting and reforming of the tables by using loops. I am not very good in loop construction and I noticed that the whole procedure is rather slow and error prone.

Does anybody know a better way to fulfil this kind of task in KNIME. I tried to do this in R and found it very demanding as I am not an expert in R syntax and functions. Moreover, one can loose control whether all subset are dealt with in the required way.

1 Like

#2

Hi @kguehrs and welcome to the forum.

Based on your description, this sounds like a job for some combination of the following nodes:

  • Group Loop Start
  • Statistics
  • Math Formula (or if conditional logic is needed, perhaps Column Expressions)
  • Normalizer
  • Loop End

Here’s a sample workflow from the Hub that demonstrates some of these nodes in practice - in particular, the looping function.

If you can provide a dummy dataset and some sample calculations you’d like to perform, perhaps someone could help come up with a workflow more tailored to your needs.

1 Like

#3

Hallo Scott,

Thank you for your kind response.

I already tried some workflow with KNIME but as I mentioned I struggle with the links between the different loops to finally come up with a single table that contains all the data in one or more additional columns depending on the type of calculations that are required for different tasks.

I give you some dummy table demonstrating the structure of my data sets. I need to perform arithmetric operations with the data in col5. These operations also include things as division of two values from the same col5, addition of different integers to values of col5 etc.

I think that this is a rather difficult matter and I do not expect tailor-made solutions but some ideas how to hanle this kinds of tasks somewhat appropriate.

dummyTable_01.xlsx (9.1 KB)

0 Likes

#4

Hi @kguehrs

With this dummy_Table_01 as an input. What will the output look like? Do you have an example for the output as well? For me the info is to much “high level” to create a example workflow.

gr. Hans

0 Likes

#5

Hello Hans,

I am sorry that I did not give not enough information to explain the matter in a suited way.

As I mentioned I need to end with one table or x tables (x=number of Level1 categories) which I can use to do some statistical evaluation over all (or at least the datasets without too many missing values) datasets.

As requested I add another dummy table with some examples of arithmetric operations that might be applied. Of course, except of ratio calculation in col6 not all the operations are required in each case but for some data 1, 2, or 3 additional columns with results of arithmetric operations (ratio + ratio after addition of integer etc.) is neccesary to correctly address the problem of interest.

dummyTable_02.xlsx (9.8 KB)

0 Likes