I am trying to use the parallel execution nodes to process an R Snippet node. My workflow has a Parallel Chunk Start node (configured for 4 chunks), an R Snippet node, then a Parallel Chunk End node. When I execute the Parallel Chunk End node, I see a Parallel Chunk Start and 3 Virtual Starts in the KNIME console. Running top on my Linux machine shows 4 R processes, and then I see the 4 chunks end. After this is done, the Parallel Chunk End node reports an error:
Execute failed: Runtime class of object "0.07" (index 2) in row "1_#2" is DoubleCell and does not comply with its supposed superclass StringCell
After this failure, the R Snippet node has a data output that appears to be complete, as expected, with one string column and 5 double columns. I tried passing this table through a Column Rename node before the Parallel Chunk End node. The columns get renamed properly, but the Parallel Chunk End node fails again. I also tried to convert all columns to strings, but many of the cells in my table were left with missing values.
Suggestions on how to complete this parallel execution workflow would be appreciated.
The loop's end node determines the table structure based on the first finished chunk. It seems the third column ("index 2") was a string column in that chunk. One of the other chunks created a double column, but you cannot add DoubleCells to a string column. The Rename node doesn't help either because it doesn't change the cell type; it merely applies a typecast to the column's specification. You can try Number to String (or String to Number). I would start by checking why the columns have different types in the first place, though.
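To make the mismatch concrete, here is a minimal Python sketch of that kind of check. It is only an illustration of the idea, not how KNIME implements the loop end: the function names and the sample rows (including the "n/a" placeholder in the third column) are made up for this example. It compares every chunk's column types against the first chunk and lists the column indices that disagree.

```python
def column_types(rows):
    """Return the set of Python type names seen in each column of a chunk."""
    return [{type(cell).__name__ for cell in col} for col in zip(*rows)]


def find_mismatched_columns(chunks):
    """Compare each chunk with the first one and list column indices whose types differ."""
    reference = column_types(chunks[0])
    mismatched = set()
    for chunk in chunks[1:]:
        for index, types in enumerate(column_types(chunk)):
            if types != reference[index]:
                mismatched.add(index)
    return sorted(mismatched)


# Hypothetical data: the third column holds strings in the first chunk
# but doubles in another chunk.
first_chunk = [("a", 1.0, "n/a"), ("b", 2.0, "n/a")]
other_chunk = [("c", 3.0, 0.07), ("d", 4.0, 0.11)]

mismatches = find_mismatched_columns([first_chunk, other_chunk])
print(mismatches)
```

With this sample data the only column reported is index 2, the same position named in the error message. In the real workflow, the equivalent step would be to look at the R Snippet output of each chunk and see which one produces strings in that column.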
I changed my workflow so that some of the post-calculation reformatting takes place within KNIME rather than within R. The data type error has gone away, but I get a different error message.
This time the Parallel Chunk End node reports "Encountered duplicate row ID at "1" at row number 2001". This is odd, as the output should be 2,000 rows, and the node immediately preceding the Parallel Chunk End node has all 2,000 rows in its output table.
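For anyone trying to reproduce this outside KNIME, a small Python sketch of the duplicate check is below. It is purely illustrative: the helper name and the sample IDs are hypothetical, and it assumes, without confirmation from the workflow, that the chunks each number their rows from 1. It reports the first row ID that repeats and the 1-based position where the repeat occurs.

```python
def first_duplicate(row_ids):
    """Return (row_id, 1-based position) of the first repeated ID, or None if all are unique."""
    seen = set()
    for position, row_id in enumerate(row_ids, start=1):
        if row_id in seen:
            return row_id, position
        seen.add(row_id)
    return None


# Hypothetical: two chunks that each start numbering their rows at "1".
chunk_a = ["1", "2", "3"]
chunk_b = ["1", "2"]

result = first_duplicate(chunk_a + chunk_b)
print(result)
```

Here the repeat is found at the first row of the second chunk, which is the pattern to look for if the chunk outputs do not carry unique row IDs.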