Error when reading Spark data with the 'Spark to Table' node

#1

Hello,
I did the ‘spark k-means’ to cluster my data, then I finished, I want to extract the data for further analysis by using ‘table to spark’. However, I got error. I checked KNIME log but cannot find the solution:


I then thought I might first fetch 100 rows to see the output of the 'Spark k-Means' node, but that did not work either, as shown in the KNIME log:

The error message says to check the KNIME log, but I am still confused.
Can you help me solve this problem? I want to see my output Spark data from the 'Spark k-Means' and 'Spark to Table' nodes.

0 Likes

#2

If you could provide us with an example or more information (like the LOG file, the versions, and the data), that would be helpful.

In general, my experience with Spark (and KNIME) is that Spark is very picky when it comes to:

  • Non-double numbers like integers (yes, it sounds strange)
  • String variables (these might have to be converted via dictionary or label encoding)
  • Constant variables (the same value in all entries)
  • NaN or missing values in columns

I have had success with KNIME and Spark nodes by eliminating all of these problems first.
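The checklist above can be sketched as a small pre-flight scan over the data before handing it to Spark. This is plain Python over row dictionaries, purely illustrative (the function name and data layout are my own, not a KNIME or Spark API):

```python
import math

def preflight_issues(rows):
    """Scan a list of row dicts and report conditions that often
    trip up Spark nodes: integer (non-double) columns, string
    columns, constant columns, and NaN/missing entries."""
    issues = []
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            issues.append((col, "missing/NaN values"))
        if any(isinstance(v, str) for v in values):
            issues.append((col, "string values (consider label encoding)"))
        elif any(isinstance(v, int) and not isinstance(v, bool) for v in values):
            issues.append((col, "integer values (consider casting to double)"))
        if len({v for v in values if v is not None}) <= 1:
            issues.append((col, "constant column"))
    return issues
```

Running this on a sample of the table before the 'Table to Spark' node makes it easy to see which columns still need a cast, an encoding, or a missing-value treatment.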

1 Like

#3

Hi,

@DerekJin do you have a workflow and some test data to reproduce this?

As mlauber71 already mentioned, Spark does not like missing values and sometimes produces cryptic errors because of them. You can use the Spark Missing Value node to eliminate them.
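To illustrate what such a missing-value pass does, here is a minimal plain-Python sketch of fixed-value imputation. It only mimics the idea; it is not the Spark Missing Value node's actual implementation, and the helper name and row layout are my own:

```python
import math

def impute_missing(rows, fill=0.0):
    """Replace None/NaN entries in a list of row dicts with a fixed
    value, similar in spirit to a fixed-value replacement strategy
    in a missing-value node (illustrative only)."""
    return [
        {
            col: fill if v is None or (isinstance(v, float) and math.isnan(v)) else v
            for col, v in row.items()
        }
        for row in rows
    ]
```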

Sascha

0 Likes

#4

Hi,
Actually, there are no missing values in my data; it is a boolean matrix like this:


It only contains 0 and 1 in each cell, and the columns are of type double when I checked.
My workflow is also simple, like this:
[workflow screenshot]
The data can be read with 'Table to Spark', and the Spark k-Means node was also executable. But I got this error when I tried to see the output of Spark k-Means.

0 Likes

#5

Hi @DerekJin,

I can't reproduce this. Running local Spark with k-Means on a double table with 0 and 1 values works as expected (see attached workflow). Can you share more details about your setup? Which Spark version are you running? What does the input data look like? Can you attach the data (with renamed column names if required)?

SparkKMeansBooleans.knwf (11.9 KB)

1 Like