Stemming in Spark DataFrame Java Snippet produces NoClassDefFoundError

Hello,

I am using the KNIME Extension for Apache Spark to process large amounts of data. Unfortunately, there are currently no text-mining nodes, so I have to use the Spark DataFrame Java Snippet node. I use these libraries: spark-stemming-0.2.0.jar and spark-mllib_2.11-2.3.1.jar. When I execute the Spark DataFrame Java Snippet node, it crashes with a NoClassDefFoundError during stemming. The same code runs without problems in IntelliJ IDEA.

Can anybody help?

Best regards
Sebastian

spark_stemming.knwf (14.0 KB)

PS: You have to download these two libraries and copy them to the folder “spark_stemming/libs”. The libraries are too big to upload.

Hey @sebastianengelmann,
the libraries you use need to be present on your cluster or in the Local Big Data Environment and added to the classpath. They are not uploaded automatically!
In the Local Big Data Environment you can do this by adding the path to the jar files in the custom Spark settings with:
spark.jars: /path/to/some.jar
On a cluster you have to upload the files, either to a directory that is already on the Spark classpath, or to another location whose path you then add to the Spark settings in the same way.
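
To check whether the jars are actually visible after changing the settings, a small classpath probe can help. The following is only a minimal sketch, not part of the KNIME API: the helper class `ClasspathCheck` and its methods are hypothetical names, and the class name `org.apache.spark.mllib.feature.Stemmer` is an assumption about what spark-stemming provides, so replace it with the class named in your NoClassDefFoundError message. Note that it only reports what the JVM running the code can see, which may differ between driver and executors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for diagnosing missing jars; not part of KNIME or Spark.
final class ClasspathCheck {

    // Returns true if the class can be located without initializing it.
    static boolean isAvailable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this helper.
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError.
            return false;
        }
    }

    // Returns the subset of the given class names that cannot be loaded.
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isAvailable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed class name; take the real one from your error message.
        System.out.println("Missing classes: "
                + missing("org.apache.spark.mllib.feature.Stemmer"));
    }
}
```

If the list is empty after setting spark.jars, the jar was picked up by that JVM; if the class is still reported as missing, the path in the setting is probably not readable from where Spark runs.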

Best regards
Mareike

Hello @mareike.hoeger,

that’s the solution.

Thank you.

Best regards
Sebastian
