First time using R nodes in KNIME: possible error with sapply() and converting between types

Hello,

I have a script in R that does a few data manipulation steps and runs some tests on the dataset. I’m trying to integrate it into my KNIME workflow. I tried using the R Snippet node and the Table to R node.

```r
#load packages
library(tidyverse)
library(readxl)
library(data.table)
library(tableone)

knime.out = knime.in
#convert all character columns to numeric
chars = sapply(knime.in, is.character)
knime.in[ , chars] = as.data.frame(apply(knime.in[ , chars], 2, as.numeric))

#convert from numeric to nominal comparison
#(recode every value from 2 up to the maximum as 1)
most = max(knime.in)
multiple = c(2:most)

for (num in multiple) {
  knime.in[knime.in == num] = 1
}

rm(chars,most,multiple,num)
knime.out <- knime.in
```
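The `for` loop above recodes every value from 2 up to the maximum as 1. A minimal sketch of a vectorized equivalent, assuming all columns are already numeric and hold whole numbers (the toy data frame `df` is hypothetical and stands in for `knime.in`):

```r
# Hypothetical all-numeric data standing in for knime.in after conversion
df <- data.frame(a = c(0, 1, 2, 3), b = c(5, 1, 0, 2))

# Recode any value of 2 or more to 1 in a single step; for whole-number data
# this matches looping over 2:max(df), without needing max() at all
df[df >= 2] <- 1

print(df)
```

One reason to prefer the comparison: if the maximum is below 2, `2:most` counts downward (for example `2:0` is `c(2, 1, 0)`), so the loop would turn zeros into ones. The sketch still assumes numeric columns; comparing character columns with `>=` would not raise an error but would compare as text.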

I have been getting the following error on the “most = max(knime.in)” line: “Error: only defined on a data frame with all numeric-alike variables.” This suggests to me that the sapply() call may not be working properly. Both the R Snippet and the Table to R nodes give me the same error, and I’m not sure what the difference between the R nodes is. Any help is appreciated. I am aware there are some KNIME video courses; if you can point me to the right ones, that would also be much appreciated. Thank you.

Hi @ssheriff
I’d be more than happy to help you if you can provide at least a sample of your dataset.

But just by looking at the code, I think this is purely an R problem: the line most = max(knime.in) refers not to an individual column but to the entire data frame. You should only use functions like max() or min() on a whole data frame if all of its columns are numeric. Take a look at this test I’ve made, which reproduces the same error:

As you can see, there are mixed types in this data frame and the error is the same. Maybe you need to refer to a specific column.
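A minimal way to reproduce the same error in plain R, using a hypothetical two-column data frame with one numeric and one string column, might look like this:

```r
# Mixed-type data frame: one numeric column, one string column
df <- data.frame(x = c(1, 5, 3), y = c("a", "b", "c"))

# max() over the whole data frame fails because not every column is numeric
msg <- tryCatch(max(df), error = function(e) conditionMessage(e))
print(msg)

# Restricting max() to a single numeric column works
print(max(df$x))
```

The first `print()` shows the same “only defined on a data frame with all numeric-alike variables” message, while the single-column call returns 5.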

PS: You should always test your code in your R environment to ensure it works, then import it into KNIME. That’s always helpful :ok_hand:

Cheers.

Hello, thank you for your reply! All of the code worked in R; it was only when I copied and pasted it into KNIME and replaced my df with knime.in that I started getting this error.

The code:

```r
chars = sapply(df, is.character)
df[ , chars] = as.data.frame(apply(df[ , chars], 2, as.numeric))
```

was sufficient in R, but I’m not sure it does the same thing when executed in KNIME. My df had two chr columns that this code converted to numeric. Since my df should be the same as knime.in, I’m not sure why knime.in would have any columns that are neither character nor numeric…
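One possible explanation, which this thread does not confirm, is that string columns reach the script as factors rather than character vectors (whether this happens depends on the KNIME R integration and R version; treat it as an assumption). A factor is neither character nor numeric, so `sapply(knime.in, is.character)` would skip it and `max()` would still fail. A sketch that checks the column classes, with a hypothetical data frame standing in for `knime.in`:

```r
# Hypothetical stand-in for knime.in where the string column arrived as a factor
df <- data.frame(id = c(1, 2, 3),
                 score = factor(c("10", "20", "30")))

# Inspect which type each column actually has
print(sapply(df, class))

# is.character() is FALSE for a factor, so the original conversion skips it
chars <- sapply(df, is.character)
print(chars)

# Flag columns that are neither numeric nor logical; these break max(df)
non_numeric <- !sapply(df, function(x) is.numeric(x) || is.logical(x))
print(names(df)[non_numeric])
```

Running the same `sapply(knime.in, class)` check inside the R node would show whether any columns arrive as factors or some other non-character, non-numeric type.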

I’ll try narrowing the scope of max() to sets of columns to see if that helps.
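A minimal sketch of narrowing `max()` to the numeric columns only, assuming a hypothetical data frame with mixed column types:

```r
# Hypothetical data frame with mixed column types
df <- data.frame(a = c(1, 7), b = c("x", "y"), c = c(4, 2))

# Keep only numeric columns before taking the overall maximum
num_cols <- sapply(df, is.numeric)
most <- max(df[num_cols])

print(most)
```

Here `most` is 7, taken across columns `a` and `c`; the string column `b` is ignored rather than causing an error.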

Yes, you are right. I’ve run the code in RStudio on a data frame with a couple of string columns, and it works perfectly fine.

I’m still trying a couple of things inside KNIME, but it doesn’t work. It’s weird, since it’s just a simple conversion. I’ll let you know if I find something useful.

@ssheriff
I’ve tested it like this (and with that we can confirm that sapply shows a strange behavior that I don’t understand yet):

You can see that I’ve managed to convert from string to numeric, but without using sapply. This code works perfectly fine (the sapply line is commented out) because it references the columns with an explicit vector.
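Referencing the columns by name and converting them with `lapply()` also sidesteps two pitfalls of the `apply()` line: with exactly one matching column, `knime.in[ , chars]` drops to a plain vector and `apply()` fails, and `as.numeric()` on a factor returns level codes rather than the values. A minimal sketch, assuming hypothetical column names `v1` and `v2` in place of the real ones:

```r
# Hypothetical data frame: one character column, one factor column, plus an ID
df <- data.frame(id = 1:3,
                 v1 = c("10", "20", "30"),
                 v2 = factor(c("0.5", "1.5", "2.5")),
                 stringsAsFactors = FALSE)

# Explicit vector of the columns to convert (hypothetical names)
cols <- c("v1", "v2")

# lapply() returns a list of columns even when cols has length 1, and
# as.character() first makes factors convert by label, not by level code
df[cols] <- lapply(df[cols], function(x) as.numeric(as.character(x)))

str(df)
```

Without the `as.character()` step, the factor column `v2` would become `c(1, 2, 3)` (its level codes) instead of `c(0.5, 1.5, 2.5)`.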

Whenever I try to use or access the chars vector, I get this error:


@ssheriff
As a last resort, you can do the conversion from string to numeric in KNIME itself, like this:

and let the snippet do the further, more specific transformations.


This is very helpful, thank you for narrowing down the issue to sapply(). I’ll be able to use this method instead and submit my project on time. :slight_smile:

Actually, another question: how/why does your script use kIn and rOut instead of knime.in and knime.out? I was wondering if the periods might mess up the code in the future.

@ssheriff the example by @eamendola seems to use the community integration of R nodes.

Further investigation would be needed, ideally with an example that reproduces the error and information about the R versions involved.

