Hi,
I have a large number of numeric columns that are heavily skewed. Before running my analysis, I want to log-transform these skewed columns.
Here is the equivalent R code for what I'm looking for:
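library(e1071)  # provides skewness(); moments::skewness() would also work

# Loop over columns 2 to n-1 of train
for (nm in names(train)[2:(length(names(train)) - 1)]) {
  sk <- abs(skewness(train[[nm]], na.rm = TRUE))
  minval <- min(train[[nm]], na.rm = TRUE)  # only transform non-negative columns
  # isTRUE() skips constant columns, whose skewness is NaN
  if (isTRUE(sk > 5 && minval >= 0)) {
    train[[nm]] <- log1p(train[[nm]])
  }
}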
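For reference, the loop applies the following rule independently to each column $x_j$ with $n$ rows. The skewness shown is the plain moment-based estimator $g_1$. This is an assumption: `e1071::skewness()` (default `type = 3`) and `moments::skewness()` use slightly different small-sample conventions, so values near the threshold may differ between packages.

$$
g_1(x_j) = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2\right)^{3/2}},
\qquad
x_{ij} \leftarrow
\begin{cases}
\log(1 + x_{ij}) & \text{if } |g_1(x_j)| > 5 \text{ and } \min_i x_{ij} \ge 0,\\
x_{ij} & \text{otherwise.}
\end{cases}
$$

The $\min_i x_{ij} \ge 0$ guard keeps $1 + x_{ij} \ge 1$, so the logarithm is always defined and non-negative. Each column's decision depends only on that column's own values.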
In KNIME, I can identify the skewed columns with the Statistics node. However, I only seem to be able to apply a math function to one column at a time, using the Math Formula node inside a column list loop (with the column regex rename trick).
With many columns (anything over about 20), this is slow: several minutes, and up to hours once you reach hundreds of columns. The R code takes seconds.
Is there a quicker, more efficient way to do this in KNIME?