Math formula, Condition data for clustering

michaellis · August 10, 2018, 4:35pm

Hello All,

I am trying to condition my data sheet for cluster analysis and am coming from R and Excel transitioning to knime.

I have a full matrix of different organism and there counts. I need to add 1 to ever cell for each column then standardize the whole data sheet (logarithmicly) relative to reach other to find significant. I am struggling to understand the the math formula node. Most of the examples are only changing 1 column and I have about 40.
What would my steps be?

Thank you,

M

armingrudd · August 10, 2018, 10:19pm

Hi,
Use “Math Formula (Multi column)” node.

michaellis · August 12, 2018, 5:27pm

Hi @armingrudd

If I use the math formula for multi-column how do I compute an average deviate for each cell?
For each cell, I want to ((Xi-Xjmean)/ Xjstanddiv) j being the whole column of values? screenshot posted below

What I am trying to do is normalize the data to relative abundances logarithmically

M

armingrudd · August 12, 2018, 6:50pm

CURRENT_COLUMN (in the column list) is what you missed. Use it as Xi. And you can calculate the column average by using COL_MEAN() as you did and using CURRENT_COLUMN as column name. And the same for COL_STDDEV().

michaellis · August 12, 2018, 7:06pm

@armingrudd
I get an error when replacing the Rowcount with CurrentColumn

Is this going to format ever cell in my table from column to column?

armingrudd · August 12, 2018, 9:10pm

Use parentheses around parts of your expression if you want it to be calculated the same as the formula you provided: (Xi-Xjmean)/ Xjstanddiv
($$CURRENT_COLUMN$$ - COL_MEAN($$CURRENT_COLUMN$$)) / COL_STDDEV($$CURRENT_COLUMN$$)
Use this expression and let me know if the error occurs again.

michaellis · August 12, 2018, 9:28pm

Thank you @armingrudd
It seems to work well simmilar to (R table to Table R ) configuration. I noticed that if I don not have the Replace Selected Columns checked, it wont Output the formatted data. Why is that?

Best,

M

armingrudd · August 12, 2018, 9:46pm

I cannot understand what you mean by formatted data. Would you please explain it?

michaellis · August 12, 2018, 9:58pm

@armingrudd
Apologizes, formatted meaning : the math formula changed the data to a the new data spread sheet.

armingrudd · August 12, 2018, 10:10pm

Still I’m not sure if I got your point perfectly, but the difference between the 2 options is that the first option (append) adds new columns for the calculations and keeps the original columns as well. On the other hand, the second option (replace) removes the original values from existing columns and overwrites the new values.

michaellis · August 13, 2018, 2:52pm

Hi @armingrudd
Thank you that makes it very clear.

Now I am trying to do another data transformation with the data table prior to the standard diviate you showed me above. I want to double relativization with

bij = Xij - (columnSigmaXij * rocSigmaXij)/ (ColumnSigmaRowSigmaXiJ)

I am not sure how to sum the rows in the data sheet outside the column

Picture Below

Does this make sense?

Thank you! I’ve never done this before

Michael

armingrudd · August 15, 2018, 8:51am

Use “Column Aggregator” node to calculate sum of rows. Then again in “Math Formula (Multi Column)” node, calculate the formula.

armingrudd · August 15, 2018, 9:13am

Here is the workflow:
math formula.knwf (19.2 KB)
Dear @Iris, when I was creating the workflow to calculate the formula, I noticed that I cannot calculate ΣΣXij (which is the sum of all values in the table) in a single formula with other calculations (Check second math formula node in my workflow - you cannot use “COL_SUM($Row Sum$)” at the end instead of $Row Sum_suffix$ which I had to calculate in the previous math formula node). Why is that?

Iris · August 15, 2018, 1:48pm

Hi Armin,

yes you are right. I just send this to our developers and will let you know when it is fixed.

Best, Iris