First: You split the dataframe based on those columns that have a different value for the substring that goes until the first “_” character.
(In the case of our toy dataset this would divide the dataframe in two: variable1 and variable2)
Second: You would substract the values of all the columns that do not have “baseline” in it’s name against the one that has it.
As I said, I can do this easily in R, but I’ve been trying to find a way of doing it in KNIME and honestly I didn’t find it.
Hi @berserkersap , that’s exactly it.
I’ve never used the flow variable node, I suppose it will allow me to extract a dataframe for each of the variables?
Si I would have one dataframe with Variable1_Day1_Rawm Variable1_Baseline_Raw compute the difference with the math expressions node; and then a loop will allow me to the same for the rest of the variables.
Thanks for your answer
Based on your comment, what you show us is a nested hierarchy in your column header name. You need to transform it in the way that time-series ($Day$) arrange in a column, then: $Variable_num$ (character), $Baseline$ (value) and $Raw$ (value).
As all your values are numeric you won’t find many hurdles to do it. Maybe unpivot separately Baseline and Raw; then work the string $Column_Names$ in the two separated tables and a final Joiner with double matching criterion ($Day$, $Vasriable_num$)
You can check this post for insights:
The final Math Formula will be just the difference of two columns. You can later rearrange back the data as your preference.
Thanks @gonhaddock for your answer. But I don’t think that fits my data, or at least I don’t see the way to do it. Baseline and raw don’t belong to the same category.
I can separate my dataframe in several dataframes, one for each variable, using the column splitter node. But what I would like is an automatized way of doing that, so that automatically, based on some gsub command (different strings until the first “_” value) separates into X number of dataframes, where X is the number of different variables that I may have.
This problem sounds easy but it is challenging because there can be many “Day” columns for the same Baseline and hence subtraction becomes complicated. Thus I believe it cannot be done without a loop in KNIME. If there were only one “Day” column per “Baseline” column, then it could have been done without the need of a loop.
Please find below a possible solution to your question:
And just for the records as @aworker already provided the solution. As I was pointing in my previous post, your only data complexity resides in the arrangement of your data; it is only a transformation issue, then displaying it in two columns by un-nesting the timeseries $Days$ simplifies the difference calculation: