Splitting strings into letters using the R Snippet node

Hi, I am using the R snippet node to take a column of strings, split them into their component letters, then output the results as a dataframe.
The code I have placed in the node is:

tmp <- knime.in # a simple table consisting of 1 column ("Column Header") of 16 strings
df1 <- as.data.frame(tmp)
df1$'splt' <- c(strsplit(df1$`Column Header`, split="")) # adding the split strings as a second column in the df
knime.out <- df1

This seems to work in R studio, which gives me the output:

> df1
   Column Header                         splt
1     Int_WC2hA1 I, n, t, _, W, C, 2, h, A, 1
2     Int_WC2hA2 I, n, t, _, W, C, 2, h, A, 2
3     Int_WC2hA3 I, n, t, _, W, C, 2, h, A, 3
4     Int_WC2hA4 I, n, t, _, W, C, 2, h, A, 4
5     Int_WD2hA1 I, n, t, _, W, D, 2, h, A, 1
6     Int_WD2hA2 I, n, t, _, W, D, 2, h, A, 2
7     Int_WD2hA3 I, n, t, _, W, D, 2, h, A, 3
8     Int_WD2hA4 I, n, t, _, W, D, 2, h, A, 4
9     Int_WC2hT1 I, n, t, _, W, C, 2, h, T, 1
10    Int_WC2hT2 I, n, t, _, W, C, 2, h, T, 2
11    Int_WC2hT3 I, n, t, _, W, C, 2, h, T, 3
12    Int_WC2hT4 I, n, t, _, W, C, 2, h, T, 4
13    Int_WD2hT1 I, n, t, _, W, D, 2, h, T, 1
14    Int_WD2hT2 I, n, t, _, W, D, 2, h, T, 2
15    Int_WD2hT3 I, n, t, _, W, D, 2, h, T, 3
16    Int_WD2hT4 I, n, t, _, W, D, 2, h, T, 4

However, the R snippet node fails when I run it.

Perhaps I am missing something basic, but does anyone have any ideas of why the node fails?

Hi @0nly4phil. I don’t use R, but I’m guessing that the question you might be asked by somebody who does is what error message you get when the node “fails”. If you cannot see the error message, hover the mouse over the red error symbol on the node. A screengrab may suffice. Alternatively, switch to “Classic UI” and open the Console. If you weren’t aware of how to find the error message, it is possible that this may help you answer your question. :wink:

One other thing: when uploading code samples, it is better to mark them as ‘preformatted text’, so that we can see exactly what code you have used rather than the modified version of it that is displayed by the forum software. You can do that by highlighting the text and pressing the “preformatted text” button.

thanks

@0nly4phil have you made sure that df1 also is a data.frame?
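
A quick way to check this is sketched below. It is only an illustration: the small `tmp` table is a hypothetical stand-in for `knime.in`, and `stringsAsFactors = TRUE` is chosen on purpose to show what a non-character column looks like. The thread does not confirm which type KNIME actually hands over.

```r
# Hypothetical stand-in for knime.in (the real table is not shown in the thread).
# stringsAsFactors = TRUE is an assumption, used to mimic a non-character column.
tmp <- data.frame(`Column Header` = c("Int_WC2hA1", "Int_WC2hA2"),
                  check.names = FALSE, stringsAsFactors = TRUE)
df1 <- as.data.frame(tmp)

is_df <- is.data.frame(df1)        # TRUE if df1 really is a data.frame
col_classes <- sapply(df1, class)  # class of every column, by column name

print(is_df)
print(col_classes)
```

If the column class is anything other than `character`, `strsplit()` will refuse it, even when `df1` itself is a proper data.frame.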

Hi and thanks. The error message was:

ERROR R Snippet            3:2493:0:2501 Execute failed: Error in R code: 
Error: non-character argument

The R script code used was:

tmp <- knime.in
df1 <- as.data.frame(tmp)
df1$'splt'[i] <- c(strsplit(df1$`Column Header`[i],split=""))
knime.out <- as.data.frame(df1)

Any ideas?

Hi, it seems to be.

df1 <- as.data.frame(tmp)
df1$'splt'<- c(strsplit(df1$`Column Header`,split=""))
knime.out <- as.data.frame(df1)

I actually added ‘as.data.frame(df1)’ later to ensure this, even though df1 is defined as a dataframe above.

In R studio, the output from df1 is:

> df1
   Column Header                         splt
1     Int_WC2hA1 I, n, t, _, W, C, 2, h, A, 1
2     Int_WC2hA2 I, n, t, _, W, C, 2, h, A, 2
3     Int_WC2hA3 I, n, t, _, W, C, 2, h, A, 3
4     Int_WC2hA4 I, n, t, _, W, C, 2, h, A, 4
5     Int_WD2hA1 I, n, t, _, W, D, 2, h, A, 1
6     Int_WD2hA2 I, n, t, _, W, D, 2, h, A, 2
7     Int_WD2hA3 I, n, t, _, W, D, 2, h, A, 3
8     Int_WD2hA4 I, n, t, _, W, D, 2, h, A, 4
9     Int_WC2hT1 I, n, t, _, W, C, 2, h, T, 1
10    Int_WC2hT2 I, n, t, _, W, C, 2, h, T, 2
11    Int_WC2hT3 I, n, t, _, W, C, 2, h, T, 3
12    Int_WC2hT4 I, n, t, _, W, C, 2, h, T, 4
13    Int_WD2hT1 I, n, t, _, W, D, 2, h, T, 1
14    Int_WD2hT2 I, n, t, _, W, D, 2, h, T, 2
15    Int_WD2hT3 I, n, t, _, W, D, 2, h, T, 3
16    Int_WD2hT4 I, n, t, _, W, D, 2, h, T, 4

@0nly4phil would it be possible to provide a complete example with your data?

@0nly4phil this seems to do the trick:

tmp <- as.data.frame(knime.in)
df1 <- as.data.frame(tmp)

# Ensure the column is a character vector
df1$'Column Header' <- as.character(df1$'Column Header')

# Apply strsplit
df1$'splt' <- strsplit(df1$'Column Header', split = "")

knime.out <- as.data.frame(df1)
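
The reason the `as.character()` step matters can be reproduced outside KNIME. The sketch below assumes the column arrives as a factor, which is one plausible cause of the "non-character argument" error but is not confirmed in this thread. The two-row table is a hypothetical stand-in for `knime.in`.

```r
# Hypothetical stand-in for knime.in with a factor column (an assumption).
df1 <- data.frame(`Column Header` = c("Int_WC2hA1", "Int_WD2hT4"),
                  check.names = FALSE, stringsAsFactors = TRUE)

# strsplit() only accepts character vectors, so a factor triggers the error.
err <- tryCatch(strsplit(df1$`Column Header`, split = ""),
                error = function(e) conditionMessage(e))
print(err)

# After converting the column to character, the same call succeeds.
df1$`Column Header` <- as.character(df1$`Column Header`)
letters_list <- strsplit(df1$`Column Header`, split = "")
print(letters_list[[1]])
```

This would also explain why the same script runs in RStudio on a CSV file: `read_csv()` reads text columns as character, so `strsplit()` never sees a non-character input there.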

string_to_character.knwf (7.3 KB)
Hi, this is an update of the workflow.
I thought it may be due to the seemingly unstructured data produced from strsplit, so the code was altered to:

df1 <- knime.in
df1 <- as.data.frame(df1)
a <- df1$`Column Header`
a <- strsplit(a,split="")
b <-matrix(unlist(a),ncol=10,byrow=T)
df1$'splt'<-b
knime.out <- df1

-but to no avail.
The error message from KNIME was:

ERROR R Snippet            8:2508     Execute failed: Error in R code: 
Error: non-character argument
Error: replacement has 2 rows, data has 16

However, the following script can be run in R without any apparent problems:

library(readr)
setwd("C:/Phil/knime-workspaces/Phil5/MISC")
df1 <- read_csv("phil.csv")
df1 <- as.data.frame(df1)
a <- df1$`Column Header`
a <- strsplit(a,split="")
b <-matrix(unlist(a),ncol=10,byrow=T)
df1$'splt'<-b

It produces the output:

   Column Header splt.1 splt.2 splt.3 splt.4 splt.5 splt.6 splt.7
1     Int_WC2hA1      I      n      t      _      W      C      2
2     Int_WC2hA2      I      n      t      _      W      C      2
3     Int_WC2hA3      I      n      t      _      W      C      2
4     Int_WC2hA4      I      n      t      _      W      C      2
5     Int_WD2hA1      I      n      t      _      W      D      2
6     Int_WD2hA2      I      n      t      _      W      D      2
7     Int_WD2hA3      I      n      t      _      W      D      2
8     Int_WD2hA4      I      n      t      _      W      D      2
9     Int_WC2hT1      I      n      t      _      W      C      2
10    Int_WC2hT2      I      n      t      _      W      C      2
11    Int_WC2hT3      I      n      t      _      W      C      2
12    Int_WC2hT4      I      n      t      _      W      C      2
13    Int_WD2hT1      I      n      t      _      W      D      2
14    Int_WD2hT2      I      n      t      _      W      D      2
15    Int_WD2hT3      I      n      t      _      W      D      2
16    Int_WD2hT4      I      n      t      _      W      D      2
   splt.8 splt.9 splt.10
1       h      A       1
2       h      A       2
3       h      A       3
4       h      A       4
5       h      A       1
6       h      A       2
7       h      A       3
8       h      A       4
9       h      T       1
10      h      T       2
11      h      T       3
12      h      T       4
13      h      T       1
14      h      T       2
15      h      T       3
16      h      T       4

I realize I could use a workaround purely in KNIME, but there are other steps I would like to run in R before returning the data frame, so any help on this would be very much appreciated.

That’s great! Thanks so much!

@0nly4phil for the second example, as.data.table() together with a conversion to character seems to do the trick.

The transfer is done with “data.table”:

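library(data.table) # provides as.data.table()

df1 <- knime.in
df1 <- as.data.frame(df1)
# strsplit() needs a character vector
a <- as.character(df1$'Column Header')
a <- strsplit(a,split="")
# one row per string, one column per letter (assumes 10 letters per string)
b <-matrix(unlist(a),ncol=10,byrow=T)

df1$'splt'<-b
knime.out <- as.data.table(df1)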

Convert data from KNIME to R and back with split of text column - KNIME Forum (76340).knwf (88.8 KB)
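
As a possible alternative to storing a matrix inside a single `splt` column, the letters can be spread into ordinary, separately named columns. This is only a sketch: the two-row table is a hypothetical stand-in for `knime.in`, the `splt.1` ... `splt.10` names simply mirror the R output shown earlier, and it assumes every string has the same number of characters. Whether KNIME handles this layout better than a matrix column has not been tested here.

```r
# Hypothetical stand-in for knime.in.
df1 <- data.frame(`Column Header` = c("Int_WC2hA1", "Int_WD2hT4"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Split each string into its letters (as.character guards against factors).
chars <- strsplit(as.character(df1$`Column Header`), split = "")

# Assumption: all strings have the same length; stop early if they do not.
stopifnot(length(unique(lengths(chars))) == 1)

# Stack the per-string letter vectors so that each string becomes one row.
m <- do.call(rbind, chars)
colnames(m) <- paste0("splt.", seq_len(ncol(m)))

# Bind the letter columns next to the original column as plain columns.
out <- cbind(df1, as.data.frame(m, stringsAsFactors = FALSE))
print(out)
```

In the R Snippet node, `out` would then be assigned to `knime.out`.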

Thanks for all your help!
