I am getting the error below while executing the ‘CSV to Spark’ node …
ERROR CSV to Spark 0:3 Execute failed: org.apache.spark.sql.execution.FileSourceScanExec; local class incompatible: stream classdesc serialVersionUID = 1920947604238219635, local class serialVersionUID = -3589590085483687218 (InvalidClassException)
Also attaching log file and workflow screenshots…
knime_log.txt (19.3 KB)
Looking at the knime.log (thanks for providing it), it looks like some Spark-internal data serialization between the Spark driver and executor is failing.
This really only happens if you are using different Java and/or Spark versions between your cluster nodes. These must all be the exact same versions.
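To see why the version mismatch matters, the error message itself can be unpacked: it carries two `serialVersionUID` values, one written into the stream by the sender and one computed from the class loaded locally. The illustrative Python snippet below (not part of KNIME or Spark, just a sketch over the error text from the post) extracts both and shows they disagree, which is exactly what Java deserialization rejects:

```python
import re

# The error text reported by the CSV to Spark node (copied from the post above).
error = (
    "org.apache.spark.sql.execution.FileSourceScanExec; local class incompatible: "
    "stream classdesc serialVersionUID = 1920947604238219635, "
    "local class serialVersionUID = -3589590085483687218"
)

# Pull both serialVersionUID values out of the message. A mismatch means the
# sending JVM (driver) and receiving JVM (executor) loaded the class from
# different Spark builds, so Java deserialization refuses the object.
stream_uid, local_uid = map(int, re.findall(r"serialVersionUID = (-?\d+)", error))
print(stream_uid != local_uid)  # True → the Spark jars on the two JVMs differ
```

If both UIDs were equal, deserialization would succeed; the fix is to make every node of the cluster (and the Livy session) use the identical Spark build.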
Could that be the case?
I am using Spark 2.4, Livy 0.7, and the latest KNIME version (4.2.1). If these are not stable versions of Spark and Livy for KNIME 4.2.1, please let me know the correct ones.
Spark 2.4 and Livy 0.7 should work fine with KNIME 4.2.1. What type and version of cluster are you using? (CDH/HDP/AWS EMR/Azure…)
Could you first load the data into KNIME and see what types are there and what the type mapping says? Also check whether there are strange values such as NaN (not a number).
There are only string and integer types in my data, and no missing values…
still the same error
Have you tried converting the integers to doubles (I am aware this sounds counterintuitive) and then uploading the data? Maybe start with the numbers only.
And could you check what type your integers get converted to?
I have converted all the integer columns to doubles and tried it; the same error still persists…
Did you use the node:
You could set the LOG level to DEBUG and try again; maybe we could get an idea from that. If you could share a sample of the data that is failing, that might also help.
Are you talking about using the ‘Table to Spark’ node instead of the ‘CSV to Spark’ node?
Yes, I am. This could give you more control over the formats. BTW: how did you convert the integers to doubles anyway?
With the help of the Column Rename node.
Column Rename is notorious for messing up formats; you might want to use another converter. But I am not sure this is about integers or doubles.
You might have to map integers to bigint in a big data environment.
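The int-vs-bigint point can be made concrete. Hive/Impala `int` is a 32-bit signed integer while `bigint` is 64-bit, so any column whose values exceed the 32-bit range needs `bigint` on the big-data side. An illustrative Python sketch (the helper name is made up, not a KNIME or Hive API):

```python
# Hive/Impala "int" is a 32-bit signed integer, while "bigint" is 64-bit.
# A column with values beyond the 32-bit range cannot safely map to "int".
INT_MIN, INT_MAX = -2**31, 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

def fits_hive_type(value: int) -> str:
    """Pick the narrowest Hive integer type that can hold the value (illustrative helper)."""
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if BIGINT_MIN <= value <= BIGINT_MAX:
        return "bigint"
    return "decimal"  # beyond 64 bits

print(fits_hive_type(42))             # int
print(fits_hive_type(3_000_000_000))  # bigint
```

A value like 3,000,000,000 (e.g. a large ID) already overflows `int`, which is one reason the type mapping step deserves a close look.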
OK, I’ll try that too. But for now, can you tell me how to connect the ‘Table to Spark’ node with the ‘HDFS Connection’ node and the ‘Spark Context’ node simultaneously, which is possible with the ‘CSV to Spark’ node as shown in my workflow… because the CSV file I want to use is on HDFS.
Then you could try telling Hive or Impala to treat it as an external table and load that into Spark. You could declare the formats as string, int, bigint, or double.
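A minimal sketch of what such an external-table DDL could look like, built here as a plain Python string so the pieces are visible. The table name, column names, and HDFS path are all hypothetical placeholders; you would submit the resulting statement to Hive/Impala (e.g. via a DB SQL node) with your own values:

```python
# Hypothetical example: declaring an HDFS CSV file as an external Hive table
# with explicit column types, so Spark can read it with a stable schema.
# Table name, columns, and LOCATION below are made up for illustration.
columns = {"name": "string", "age": "int", "user_id": "bigint", "score": "double"}

col_defs = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
ddl = (
    "CREATE EXTERNAL TABLE my_csv_data (\n"
    f"  {col_defs}\n"
    ")\n"
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
    "STORED AS TEXTFILE\n"
    "LOCATION '/user/knime/data/'"
)
print(ddl)
```

Because the table is declared EXTERNAL, Hive only records the schema and points at the existing HDFS directory; dropping the table later would not delete the CSV file.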
(The green block on top)
Has to be the part beneath the green block.