Hello,
I used the ‘Spark k-Means’ node to cluster my data, and when it finished I wanted to extract the results for further analysis using ‘Spark to Table’. However, I got an error. I checked the KNIME log but could not find a solution:
The error message suggested checking the KNIME log, but I am still confused.
Can you help me solve this problem? I want to see the output Spark data from the ‘Spark k-Means’ and ‘Spark to Table’ nodes.
@DerekJin do you have a workflow and some test data to reproduce this?
As already mentioned by mlauber71, Spark does not like missing values and sometimes produces cryptic errors because of them. You can use the Spark Missing Value node to eliminate them.
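Outside of KNIME, the same idea (drop every row that contains a missing value before clustering) can be sketched in plain Python. The column names and values here are made up for illustration; the Spark Missing Value node also offers other strategies such as imputation.

```python
# Hypothetical sketch: remove rows with missing values before clustering,
# analogous to what the Spark Missing Value node does on a Spark DataFrame.
rows = [
    {"a": 0.0, "b": 1.0},
    {"a": 1.0, "b": None},  # missing value -> would trip up Spark k-Means
    {"a": 1.0, "b": 0.0},
]

# Keep only rows where every cell has a value.
clean = [r for r in rows if all(v is not None for v in r.values())]
print(len(clean))  # 2 rows survive
```

In a real workflow you would of course do this on the Spark side (or with the node) rather than pulling the data into Python first.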
The data only contains 0 and 1 in each cell, and the columns are of type double when I checked them.
Then my workflow is also simple like this:
The data can be read with ‘Table to Spark’, and then I ran ‘Spark k-Means’, which also executed fine. But I got this error when I tried to see the output of ‘Spark k-Means’.
I can’t reproduce this. Running local Spark with k-Means on a double table with 0 and 1 values works as expected (see attached workflow). Can you share more details about your setup? Which Spark version are you running? What does the input data look like? Can you attach the data (with renamed column names if required)?
Hi sascha,
I am sorry for replying to this message so late. I found that my input dataset included strings, so I fixed that problem. However, after filtering out the strings from the input matrix, I still get an out-of-bounds error. My input matrix has 8030 columns, and the error reports index 8029 (ArrayIndexOutOfBoundsException) when I try to view the output of ‘Spark k-Means’. I also cannot use the ‘Spark to Table’ node after the ‘Spark k-Means’ node (same error).
Here is the sample data I used in my workflow. data.zip (247.3 KB)
Looks like the Spark k-Means node has trouble with tables containing a column called cluster. As a workaround, you can use a Spark Column Filter or Spark Column Rename node in front of it to remove or rename the cluster column before the Spark k-Means node. Thanks for reporting this.
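The workaround boils down to making sure no input column clashes with the name the node uses for its result. A minimal sketch of that rename step, with made-up column names and the replacement name cluster_orig chosen arbitrarily:

```python
# Hypothetical sketch of the workaround: rename an existing "cluster" column
# so it cannot collide with the output column the k-Means node writes.
columns = ["feature_1", "feature_2", "cluster"]

renamed = ["cluster_orig" if c == "cluster" else c for c in columns]
print(renamed)  # ['feature_1', 'feature_2', 'cluster_orig']
```

In KNIME the Spark Column Rename node does exactly this mapping; the sketch only illustrates why the collision goes away once the input column has a different name.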