I am quite new to Spark. I load data from Hive into Spark and want to manipulate it: remove quotes and create new columns with calculated values using the Spark Java RDD Snippet. However, the Spark Java RDD Snippet doesn't recognize any columns as variables (the normal Java Snippet for tables works fine). What can I do?
Thanks for the help. Kind regards
Please have a look at the Modularized Spark Scripting workflow, which is available on the example server. If you are working with Spark 2.x, I would suggest using the Spark DataFrame Java Snippet node instead of the Spark RDD Java Snippet node, because it lets you work with the Spark DataFrame API, which is much easier to use.
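To illustrate the per-row logic the question asks for (strip quotes, then add a calculated column), here is a minimal plain-Java sketch. It is not the KNIME snippet template and does not use any Spark API; the class name `RowCleaner`, the row-as-`String[]` representation, and the assumption that columns 1 and 2 hold numeric text (e.g. price and quantity) are all hypothetical.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: strips double quotes from every field of a row
 * and appends one calculated column. Plain Java only, no Spark classes.
 */
public class RowCleaner {

    /** Removes every double-quote character from a field; null stays null. */
    public static String stripQuotes(String value) {
        return value == null ? null : value.replace("\"", "");
    }

    /**
     * Cleans all fields and appends a derived value.
     * Assumption (hypothetical): fields 1 and 2 contain numeric text,
     * and the new column is their product.
     */
    public static String[] cleanAndExtend(String[] row) {
        // Copy the row into an array with one extra slot for the new column.
        String[] out = Arrays.copyOf(row, row.length + 1);
        for (int i = 0; i < row.length; i++) {
            out[i] = stripQuotes(row[i]);
        }
        double price = Double.parseDouble(out[1]);
        double quantity = Double.parseDouble(out[2]);
        out[row.length] = String.valueOf(price * quantity);
        return out;
    }

    public static void main(String[] args) {
        String[] row = {"\"A-17\"", "\"2.5\"", "4"};
        // prints [A-17, 2.5, 4, 10.0]
        System.out.println(Arrays.toString(cleanAndExtend(row)));
    }
}
```

With the DataFrame API you would normally not loop over rows like this; the same steps are usually written as column expressions, for example `functions.regexp_replace` to remove the quotes and `Dataset.withColumn` to add the calculated column.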
You can also have a look at the Spark SQL node, which gives you access to Spark SQL's many built-in functions (for example for string manipulation and arithmetic on columns).