I’m trying to save data from DB2 to Hive. The DB2 database is on a separate server from the Hadoop cluster. Here’s the workflow that I used.
(Do correct me if I’m wrong.) This workflow seems to suggest that the parts of the job sent to the Spark Job Server are the green-shaded nodes, while the blue-shaded nodes are executed locally.
Is there any way to have the blue-shaded part executed on the Hadoop cluster as well?
Thanks in advance
Everything is executed in the Hadoop cluster. When you execute the DB nodes, they send SQL queries to the DB that return only the metadata, not the data itself. The Database to Spark node then receives the query, along with the connection information, and executes it in Spark on your Hadoop cluster, which is where the full result set is produced. So in short, no node in the workflow loads the data into KNIME itself.
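The “metadata only” behavior can be illustrated with a minimal, self-contained sketch. This uses SQLite instead of DB2 (an assumption for portability; the actual KNIME DB nodes and DB2 driver work differently in detail): a `WHERE 1=0` probe query lets the client learn the column structure without transferring any rows, which is similar in spirit to what the DB nodes do.

```python
import sqlite3

# Stand-in for the remote DB2 database (hypothetical example data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'acme', 9.99)")

# Metadata-only probe: the WHERE 1=0 predicate matches no rows,
# so the cursor exposes the column names but returns no data.
cur = conn.execute("SELECT * FROM orders WHERE 1=0")
columns = [desc[0] for desc in cur.description]  # column metadata
rows = cur.fetchall()                            # no rows transferred

print(columns)  # ['id', 'customer', 'total']
print(rows)     # []
```

In the actual workflow, the full `SELECT` is only evaluated later, inside Spark on the cluster, so the bulk data never passes through the client machine.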
Great, thanks for the clarification, Tobias