Database writer to Hive

Hi,

I have regularly used the Hive Connector nodes to download data into KNIME, transform the data, and write it back to Hive. For the write-back I use a Python Script node with the pywebhdfs package. This has worked fine so far when executing the workflows on the server.
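For context, the write step in the Python Script node is roughly along the lines of the sketch below (host, port, user and path are placeholders here, not our real values):

```python
# Rough sketch of the write step in the Python Script node.
# Host, port, user name and target path are placeholders.
from pywebhdfs.webhdfs import PyWebHdfsClient

hdfs = PyWebHdfsClient(host='httpfs.example.com', port='14000',
                       user_name='etl_user')

# input_table is the pandas DataFrame handed in by the KNIME Python node.
# Serialize it to CSV and upload it to HDFS, where Hive picks it up.
csv_bytes = input_table.to_csv(index=False).encode('utf-8')
hdfs.create_file('user/etl_user/exports/my_table.csv', csv_bytes,
                 overwrite=True)
```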
However, I have now noticed errors in the Python Script node which I am not able to debug.
These are errors from two different workflows that failed execution on the server around the same time.
[screenshots of the two error messages]

As a workaround, I am going to try the native KNIME nodes to write the table back to Hive. I connected a Database Writer node to the Hive Connector node.
However, I get the error below.
[screenshot of the Database Writer error]

Can anyone share how they write a Hive table back to the database?
Thanks.

Hello Shalinirs,
you can either use the new DB Loader node together with the new database framework, or, if you are still using the old database framework (nodes whose names end in "(legacy)"), you can use the Hive Loader node.
Bye
Tobias

Thanks Tobias.
I used the Hive Loader with an HttpFS Connection. It executed successfully when I tried it locally. I am trying to write this table to Hive. I also tried a Database Connection Table Writer, and it errors out saying 'table not found'. The table name is specified in the Hive Loader node, which runs fine.

One of the errors I got on the KNIME Server was a connection timed out error in the Hive Loader node. However, the Hive Loader node runs fine locally and the test connection succeeds too. The timeout is set to 15 minutes in the KNIME Server settings (but the workflow on the server fails in less than a minute).

In one of the attempts, I found the table in our Hadoop database among the Hive tables, and its name carried a suffix with some numbers (I am guessing a date/timestamp).

I am assuming that is some kind of temporary table and that KNIME should rewrite it into the table name specified in the Hive Loader node, but it did not.

I am not sure how to work around this error. Let me know if you need more information.

Thanks!

Hi,
regarding the 'table not found' error: this might be caused by missing identifier quotes, or the name might have been changed by Hive, since it is very particular about table and column names. So have a look in the database to see how the table is actually named. The identifier quoting has been improved with the new database framework, which we released with KNIME 4.0.
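If it helps, you can also check what name Hive actually stores from outside of KNIME, for example with a small script over HiveServer2. This is only a sketch and assumes the pyhive package and the default Thrift port; adjust host, schema and pattern to your setup:

```python
# Sketch: list the tables Hive actually knows about in a schema,
# to see how the written table is really named.
# Assumes pyhive is installed and HiveServer2 listens on port 10000.
from pyhive import hive

conn = hive.Connection(host='hive.example.com', port=10000,
                       username='my_user', database='my_schema')
cur = conn.cursor()
cur.execute("SHOW TABLES LIKE 'my_table*'")
for (table_name,) in cur.fetchall():
    print(table_name)
```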

When running the workflow on the server, the Hive Connector node executes successfully but the Hive Loader node then fails with a timeout. Is this a connection timeout? Maybe you can post the error message with the stack trace here if it doesn't contain any sensitive data. If it is a connection timeout, I suspect the KNIME Server has no access to the HttpFS host.

If the table already exists in Hive, KNIME uses a temporary table which is filled with the data first and is then used as the source of an INSERT INTO ... SELECT statement which finally moves the data into the destination table. This is very likely the table you have seen with the number suffix. However, once the data is loaded, the temporary table should be removed.
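The sequence corresponds roughly to the following HiveQL statements, here wrapped in a small pyhive sketch; all table names, columns and paths are just examples, not what the node generates verbatim:

```python
# Rough simulation of the Hive Loader steps (example names and paths only).
from pyhive import hive

conn = hive.Connection(host='hive.example.com', port=10000,
                       username='my_user', database='my_schema')
cur = conn.cursor()

# 1. Create a temporary staging table that matches the target columns.
cur.execute("CREATE TABLE my_table_tmp (col1 STRING, col2 STRING) "
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")

# 2. Load the file that was previously uploaded to HDFS into the staging table.
cur.execute("LOAD DATA INPATH '/user/my_user/hive-import.csv' "
            "INTO TABLE my_table_tmp")

# 3. Move the rows into the existing destination table.
cur.execute("INSERT INTO TABLE my_table SELECT col1, col2 FROM my_table_tmp")

# 4. Drop the staging table again.
cur.execute("DROP TABLE my_table_tmp")
```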

Attached you can find a workflow which simulates the different steps of the Hive Loader. You can adapt it to your environment, e.g. replace the SSH Connector with the HttpFS Connector, in order to see which step causes problems.

Bye
Tobias

hiveLoaderSimulation.knwf (28.0 KB)

Hi Tobias,

I belong to the same team as Shalini. I manage the KNIME and Hadoop servers.
Thank you for your response.

Looking at the logs, I could identify the blockers when running the workflow locally and on the server.

When we run the workflow locally, the Hive Loader gets stuck at 59% and never inserts the values from the temp table into the original table. I am pasting the log below.

2019-07-02 09:14:12,463 : DEBUG : KNIME-Worker-38 : HiveLoader : Hive Loader : 6:373 : Temporary table sucessful created
2019-07-02 09:14:12,463 : DEBUG : KNIME-Worker-38 : HiveLoader : Hive Loader : 6:373 : Load data from hdfs
2019-07-02 09:14:14,045 : DEBUG : KNIME-Worker-38 : HiveLoader : Hive Loader : 6:373 : Executing 'LOAD DATA INPATH '/user/harsingh/hive-import3211c370_a9ad_4ca0_a45c_3c5ddbbf62846119353587619459221.csv' INTO TABLE account_mapping.lead_test_knime_287102c9_58b1_4333_bc83_a43fbdcb27e1'
2019-07-02 09:14:15,911 : DEBUG : KNIME-Worker-38 : HiveLoader : Hive Loader : 6:373 : Copying data to existing table
2019-07-02 09:14:15,911 : DEBUG : KNIME-Worker-38 : HiveLoader : Hive Loader : 6:373 : Executing 'INSERT INTO TABLE account_mapping.lead_test_knime
SELECT parent_account_id,segment
FROM account_mapping.lead_test_knime_287102c9_58b1_4333_bc83_a43fbdcb27e1'

When running the workflow in server mode, the Hive Loader fails with a "timed out" error. The HttpFS host is reachable from KNIME and the test connection gives a "succeeded" result.
HttpFS sits behind an ELB which redirects the request to an internal server, and its timeout value is set to 900 seconds. I don't see any issue with Hive here. I am pasting the error below.

2019-07-02 03:46:26,656 : DEBUG : KNIME-Worker-24 : UserGroupUtil : Hive Loader : 2:373 : Returning simple authentication user: harsingh (auth:SIMPLE)
2019-07-02 03:47:26,735 : DEBUG : KNIME-Worker-24 : Node : Hive Loader : 2:373 : reset
2019-07-02 03:47:26,735 : ERROR : KNIME-Worker-24 : Node : Hive Loader : 2:373 : Execute failed: hdfs-lb-1976646374.us-west-2.elb.amazonaws.com:14000: connect timed out
2019-07-02 03:47:26,735 : DEBUG : KNIME-Worker-24 : Node : Hive Loader : 2:373 : Execute failed: hdfs-lb-1976646374.us-west-2.elb.amazonaws.com:14000: connect timed out
java.net.SocketTimeoutException: hdfs-lb-1976646374.us-west-2.elb.amazonaws.com:14000: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)

My question is: if local mode can create the temp table over the same HttpFS connection, why does it get "timed out" in server mode, and what should we do so that all records get inserted from the temp table into the original table?

Thanks,
Harpreet

Hello Harpreet,
the KNIME Server uses the same HttpFS connection as the client. However, it seems that your KNIME Server machine, or more specifically the machine the KNIME Executor of the KNIME Server is running on, has no access to the HttpFS service, whereas the KNIME client does have access to it. Unfortunately, the test connection button in the HttpFS Connector node does not always test the connection properly. Maybe you can try to execute a simple workflow on the KNIME Server with an HttpFS Connector and a List Remote Files node to see if HDFS is accessible from the KNIME Server.

Maybe you can also try to execute an 'INSERT INTO ... SELECT ... FROM' statement from HUE or another tool to see if these statements work in general on your cluster. Do you see any errors or warnings in the Hive logs of your cluster while the workflow is executing? At least in local mode it seems to get as far as the 'INSERT INTO' statement. Do you get an exception in KNIME when executing the 'INSERT INTO' statement locally?
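To rule out a pure network problem independently of KNIME, you could also run a small check directly on the machine the KNIME Executor runs on, for example a plain WebHDFS LISTSTATUS call against the HttpFS port (the host and user below are placeholders for your load balancer and account):

```python
# Minimal reachability check for the HttpFS service; run it on the machine
# that hosts the KNIME Executor. Host and user name are placeholders.
import requests

url = 'http://hdfs-lb.example.com:14000/webhdfs/v1/'
resp = requests.get(url, params={'op': 'LISTSTATUS', 'user.name': 'my_user'},
                    timeout=30)
resp.raise_for_status()
# A FileStatuses listing means the executor can reach HttpFS; a connect
# timeout here points to a firewall or security group issue instead.
print(resp.json())
```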
Bye
Tobias
