I am new in KNIME and I have only very basic programming knowledge. I try to evaluate whether KNIME might be useful as a complementary tool to R and Excel.
I have to calculate values by arithmetic operations from large data sets that are hierarchally structured by different categories (being factors in R) and I need to do some statistics afterwards.
From top to bottom
Level1 categories (e.g. 3) each split into 100 categories (level2) each of which has 10 subcategories (level3) each of which has 2 subcategories (level4). The numerical values have very precise coordinates by this structuring. Unfortunately, I need not only statistical evaluation which is rather simple but I have to do arithmetric calculations with some of the numbers (different ways of normalization, addition of fived numbers etc) in this conditional framework.
I can subset the whole dataset according to the factor levels into many pieces and finally transform the column into several columns (long to wide table) which the easily enables arithmetric operations. However, I need to do this splitting and reforming of the tables by using loops. I am not very good in loop construction and I noticed that the whole procedure is rather slow and error prone.
Does anybody know a better way to fulfil this kind of task in KNIME. I tried to do this in R and found it very demanding as I am not an expert in R syntax and functions. Moreover, one can loose control whether all subset are dealt with in the required way.
Based on your description, this sounds like a job for some combination of the following nodes:
Group Loop Start
Statistics
Math Formula (or if conditional logic is needed, perhaps Column Expressions)
Normalizer
Loop End
Here’s a sample workflow from the Hub that demonstrates some of these nodes in practice - in particular, the looping function.
If you can provide a dummy dataset and some sample calculations you’d like to perform, perhaps someone could help come up with a workflow more tailored to your needs.
I already tried some workflow with KNIME but as I mentioned I struggle with the links between the different loops to finally come up with a single table that contains all the data in one or more additional columns depending on the type of calculations that are required for different tasks.
I give you some dummy table demonstrating the structure of my data sets. I need to perform arithmetric operations with the data in col5. These operations also include things as division of two values from the same col5, addition of different integers to values of col5 etc.
I think that this is a rather difficult matter and I do not expect tailor-made solutions but some ideas how to hanle this kinds of tasks somewhat appropriate.
With this dummy_Table_01 as an input. What will the output look like? Do you have an example for the output as well? For me the info is to much “high level” to create a example workflow.
I am sorry that I did not give not enough information to explain the matter in a suited way.
As I mentioned I need to end with one table or x tables (x=number of Level1 categories) which I can use to do some statistical evaluation over all (or at least the datasets without too many missing values) datasets.
As requested I add another dummy table with some examples of arithmetric operations that might be applied. Of course, except of ratio calculation in col6 not all the operations are required in each case but for some data 1, 2, or 3 additional columns with results of arithmetric operations (ratio + ratio after addition of integer etc.) is neccesary to correctly address the problem of interest.
Did you managed to make some progress with your calculations? I have checked your data and it is possible to get each of these columns in KNIME (col6-col10) but for each one different logic is needed. I get help you with certain column calculation if needed.
No I did not make any progress in knime so far. I concentrated to approach the analyses with R as I am somewhat more experienced with this. However, the split-apply-combine approach is rather difficult when dealing with arithmetric operations of individual elements from one column of lists or dataframes. The nested syntax of R that is required for subsetting and addressing the needed elements is rather hard to understand and time-consuming when testing all combinations. That was the reason to look into knime because operations are better described and with the modular setup much easier to test.
The most critical for my current project is the calculations of the ratios that are needed to populate column 6. As the datasets are pretty large and contain many different combinations of columns 2 and 3 the whole matter explodes in many tiny tables when subsetting using the categories/factors of columns 2 and 3. Then it is cumbersome to keep track and not to miss to calculate all the required ratios and reformat the tables in the correct coposition.
So any support would be highly welcome. I guess that I can solve the calculations of the columns 7 to 10 myself when somebody can show me how to setup this kind of workflow. At least I would be interested to try.
Best
kguehrs
Ok. Then I can help you calculate col6 and hopefully it will help you take care of other calculations. First you can use Moving Aggregation node to create new column that will copy the next value of col5 to upper row. After that simply apply Math Formula dividing two columns. This will give you ratio for every row and if I got you right you only need it for every second row so use Rule Engine node to have missings in rows you do not need ratio. Check this workflow and if any questions feel free to ask. Every other column you can get on a similar way
Thank you very much for your help. I will try to figure out how the workflow is set up and then try to apply it to my data. I will come back to you when I succeded or when I have further questions.