HIVE SQL

Hi Guys,

I want to create a table in Hive on a schema with my own custom SQL, something like this:

use myschema;

create table MyTable as
select * from tableWithinHive;

My flow is as attached, but I can't get the table creation nodes to work. Am I correct in having the Hive Connector as the first node, and then the Database Table Selector as the second node containing the Hive SQL?

Hi,

> Am i correct with having the hive connector as first node

yes

> then database table selector as the second node containing the hive SQL?

With the "Database Table Selector" you can make a query, but not a DDL statement (e.g. CREATE TABLE). Note also that the recommended way of building a query is to use the "Database Table Selector" only as a starting point (e.g. to do "SELECT * FROM table"), and then build a more complex query from this starting point using other database nodes for in-database grouping, filtering, joining etc. (see [1] for some examples).
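For illustration (table and column names here are made up), the database nodes effectively compose nested SQL behind the scenes, roughly like this:

```sql
-- Starting point produced by the "Database Table Selector":
--   SELECT * FROM mytable
-- After adding e.g. an in-database row filter and group-by node,
-- the statement sent to Hive looks roughly like:
SELECT category, COUNT(*) AS cnt
FROM (SELECT * FROM mytable) AS t
WHERE t.status = 'active'
GROUP BY category
```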

The "Database Table Creator" node unfortunately cannot create Hive tables. How to create a Hive table depends on where the data for that table currently resides.

If your data currently resides in KNIME Analytics Platform and you want to upload it into a Hive table, then use the "Hive Loader" node. This node will also create the table before uploading the data. See [1] for an example (but note that we now recommend using the HttpFS Connection node instead of SSH Connection to transfer the data).

If your data currently resides in another Hive table, then you can make a query for that data (as described before) and store the result of that query with the "Database Connection Table Writer" node. This will effectively run a "CREATE TABLE ... AS SELECT ..." statement.
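To make this concrete, here is a sketch of the statement that effectively gets executed (schema and table names are placeholders; the node generates the actual statement from your query):

```sql
-- Placeholder names for illustration only.
CREATE TABLE myschema.mytable AS
SELECT * FROM myschema.tablewithinhive;
```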

Hope this helps,

Björn

 

[1] https://www.knime.com/nodeguide/big-data/big-data-connectors/big-data-preprocessing-example

Hi Bjorn,

I am experiencing a similar problem: I'm trying to load data into Hive using the Hive Loader, but it creates a file at the HDFS/HttpFS location, no table is created, and the connection times out. Below is the KNIME log:

2019-04-09 13:54:01,652 : DEBUG : main : ExecuteAction : : : Creating execution job for 1 node(s)…
2019-04-09 13:54:01,652 : DEBUG : main : NodeContainer : : : Hive Loader 0:128 has new state: CONFIGURED_MARKEDFOREXEC
2019-04-09 13:54:01,652 : DEBUG : main : NodeContainer : : : Hive Loader 0:128 has new state: CONFIGURED_QUEUED
2019-04-09 13:54:01,662 : DEBUG : main : NodeContainer : : : KNIME_project 0 has new state: EXECUTING
2019-04-09 13:54:01,669 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: EXECUTING
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : WorkflowManager : Hive Loader : 0:128 : Hive Loader 0:128 doBeforePreExecution
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : Hive Loader 0:128 has new state: PREEXECUTE
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : WorkflowManager : Hive Loader : 0:128 : Hive Loader 0:128 doBeforeExecution
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : Hive Loader 0:128 has new state: EXECUTING
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : WorkflowDataRepository : Hive Loader : 0:128 : Adding handler 386db7b8-a113-4131-8b21-686de640d906 (Hive Loader 0:128: ) - 2 in total
2019-04-09 13:54:01,670 : DEBUG : KNIME-Worker-8 : LocalNodeExecutionJob : Hive Loader : 0:128 : Hive Loader 0:128 Start execute
2019-04-09 13:54:01,898 : DEBUG : KNIME-Worker-8 : AbstractLoaderNodeModel : Hive Loader : 0:128 : Start writing KNIME table to temporary file C:\Users\tk173537\AppData\Local\Temp\knime_KNIME_project27443\hive-importc3cd3bf4_0a9b_483b_b19c_8575299a1e815218807826069089374.csv
2019-04-09 13:54:01,944 : DEBUG : KNIME-Worker-8 : AbstractLoaderNodeModel : Hive Loader : 0:128 : Table structure name=default,columns=[sas_id]
2019-04-09 13:54:01,944 : DEBUG : KNIME-Worker-8 : AbstractLoaderNodeModel : Hive Loader : 0:128 : No of rows to write 100.0
2019-04-09 13:54:02,027 : DEBUG : KNIME-Worker-8 : AbstractTableStoreReader : Hive Loader : 0:128 : Opening input stream on file “C:\Users\tk173537\AppData\Local\Temp\knime_KNIME_project27443\knime_container_20190409_2498813877045765933.tmp”, 0 open streams
2019-04-09 13:54:02,056 : DEBUG : KNIME-Worker-8 : AbstractTableStoreReader : Hive Loader : 0:128 : Closing input stream on “C:\Users\tk173537\AppData\Local\Temp\knime_KNIME_project27443\knime_container_20190409_2498813877045765933.tmp”, 0 remaining
2019-04-09 13:54:02,056 : DEBUG : KNIME-Worker-8 : AbstractLoaderNodeModel : Hive Loader : 0:128 : Temporary file successful created at C:\Users\tk173537\AppData\Local\Temp\knime_KNIME_project27443\hive-importc3cd3bf4_0a9b_483b_b19c_8575299a1e815218807826069089374.csv
2019-04-09 13:54:02,056 : DEBUG : KNIME-Worker-8 : HiveLoader : Hive Loader : 0:128 : Uploading local file C:\Users\tk173537\AppData\Local\Temp\knime_KNIME_project27443\hive-importc3cd3bf4_0a9b_483b_b19c_8575299a1e815218807826069089374.csv
2019-04-09 13:54:02,057 : DEBUG : KNIME-Worker-8 : HiveLoader : Hive Loader : 0:128 : Create remote folder with URI httpfs://tanul.kala@azruvapbdm02.sephoraus.com:50070/apps/hive/warehouse/tanul_kala.db/
2019-04-09 13:54:02,128 : DEBUG : KNIME-Worker-8 : HDFSConnection : Hive Loader : 0:128 : Adding fs.defaultFS to config: webhdfs://azruvapbdm02.sephoraus.com:50070/
2019-04-09 13:54:02,129 : DEBUG : KNIME-Worker-8 : UserGroupUtil : Hive Loader : 0:128 : Retrieving user for simple authentication
2019-04-09 13:54:02,129 : DEBUG : KNIME-Worker-8 : UserGroupUtil : Hive Loader : 0:128 : Current user: tk173537 (auth:SIMPLE)
2019-04-09 13:54:02,129 : DEBUG : KNIME-Worker-8 : UserGroupUtil : Hive Loader : 0:128 : Login user: tanul.kala
2019-04-09 13:54:02,129 : DEBUG : KNIME-Worker-8 : UserGroupUtil : Hive Loader : 0:128 : Creating remote user tanul.kala using current user tk173537
2019-04-09 13:54:02,129 : DEBUG : KNIME-Worker-8 : UserGroupUtil : Hive Loader : 0:128 : Returning simple authentication user: tanul.kala (auth:SIMPLE)
2019-04-09 13:54:02,320 : DEBUG : KNIME-Worker-8 : HiveLoader : Hive Loader : 0:128 : Remote folder created
2019-04-09 13:54:02,320 : DEBUG : KNIME-Worker-8 : HiveLoader : Hive Loader : 0:128 : Create remote file with URI httpfs://tanul.kala@azruvapbdm02.sephoraus.com:50070/apps/hive/warehouse/tanul_kala.db/hive-importc3cd3bf4_0a9b_483b_b19c_8575299a1e815218807826069089374.csv
2019-04-09 13:54:02,320 : DEBUG : KNIME-Worker-8 : HiveLoader : Hive Loader : 0:128 : Remote file created. Start writing file content…
2019-04-09 13:54:14,970 : DEBUG : main : NodeContainerEditPart : : : Hive Loader 0:128 (EXECUTING)
2019-04-09 13:54:14,970 : DEBUG : main : NodeContainerEditPart : : : HttpFS Connection 0:167 (EXECUTED)
2019-04-09 13:54:17,041 : DEBUG : main : NodeContainerEditPart : : : HttpFS Connection 0:167 (EXECUTED)
2019-04-09 13:54:17,042 : DEBUG : main : NodeContainerEditPart : : : WebHDFS Connection 0:148 (EXECUTED)
2019-04-09 13:54:23,468 : DEBUG : KNIME-Worker-8 : Node : Hive Loader : 0:128 : reset
2019-04-09 13:54:23,468 : ERROR : KNIME-Worker-8 : Node : Hive Loader : 0:128 : Execute failed: azruvapbdd26:50075: Connection timed out: connect
2019-04-09 13:54:23,470 : DEBUG : KNIME-Worker-8 : Node : Hive Loader : 0:128 : Execute failed: azruvapbdd26:50075: Connection timed out: connect
java.net.ConnectException: azruvapbdd26:50075: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.&lt;init&gt;(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:597)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:554)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:622)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:472)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:502)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:498)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.create(WebHdfsFileSystem.java:1186)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:910)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:891)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:788)
at org.knime.bigdata.hdfs.filehandler.HDFSConnection.create(HDFSConnection.java:315)
at org.knime.bigdata.hdfs.filehandler.HDFSRemoteFile.openOutputStream(HDFSRemoteFile.java:115)
at org.knime.base.filehandling.remote.files.RemoteFile.write(RemoteFile.java:342)
at org.knime.bigdata.hive.utility.HiveLoader.uploadFile(HiveLoader.java:93)
at org.knime.bigdata.hive.utility.AbstractLoaderNodeModel.execute(AbstractLoaderNodeModel.java:267)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1186)
at org.knime.core.node.Node.execute(Node.java:973)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
2019-04-09 13:54:23,471 : DEBUG : KNIME-Worker-8 : WorkflowManager : Hive Loader : 0:128 : Hive Loader 0:128 doBeforePostExecution
2019-04-09 13:54:23,471 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : Hive Loader 0:128 has new state: POSTEXECUTE
2019-04-09 13:54:23,471 : DEBUG : KNIME-Worker-8 : WorkflowManager : Hive Loader : 0:128 : Hive Loader 0:128 doAfterExecute - failure
2019-04-09 13:54:23,472 : DEBUG : KNIME-Worker-8 : Node : Hive Loader : 0:128 : reset
2019-04-09 13:54:23,472 : DEBUG : KNIME-Worker-8 : Node : Hive Loader : 0:128 : clean output ports.
2019-04-09 13:54:23,472 : DEBUG : KNIME-Worker-8 : WorkflowDataRepository : Hive Loader : 0:128 : Removing handler 386db7b8-a113-4131-8b21-686de640d906 (Hive Loader 0:128: ) - 1 remaining
2019-04-09 13:54:23,472 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : Hive Loader 0:128 has new state: IDLE
2019-04-09 13:54:23,493 : DEBUG : KNIME-Worker-8 : Node : Hive Loader : 0:128 : Configure succeeded. (Hive Loader)
2019-04-09 13:54:23,494 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : Hive Loader 0:128 has new state: CONFIGURED
2019-04-09 13:54:23,494 : DEBUG : KNIME-Worker-8 : NodeContainer : Hive Loader : 0:128 : KNIME_project 0 has new state: IDLE
2019-04-09 13:54:23,494 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer : : : ROOT has new state: IDLE

-Tanul

I can think of a few things to check:

  • Does it work with a small table?
  • Make sure the table is not there in the first place (even if you have checked the “Drop existing table” option).
  • Make sure you can access the temporary folder and that it is large enough to store the data.
  • See if you have any ‘funny’ data types that might need special conversion (it does not look like that, but still).
  • If you can use the Impala Loader, try that as an alternative.

Maybe try again and attach a LOG file to your post.

Then we had a discussion where I suggested using Hive’s external table feature to create a table: basically uploading the data to HDFS and then converting it into a ‘regular’ table. You could also split the data into several parts and address a whole folder, so you might be able to get the data through a potentially slow network.
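In HiveQL the external-table approach would look roughly like this (folder name, table names and column definitions are placeholders, assuming comma-separated files uploaded to an HDFS folder):

```sql
-- Placeholder names; assumes CSV parts were uploaded to /tmp/mydata/ on HDFS.
CREATE EXTERNAL TABLE staging_mytable (
  sas_id STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/mydata/';

-- Then convert it into a 'regular' (managed) table:
CREATE TABLE mytable AS
SELECT * FROM staging_mytable;
```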

Hi,

I have tried the first 4 points you made, without any success. I will see whether I can load the data via Impala and have sent a request to my big data team. Strangely, the file gets created in the target folder but no data is actually written to it (even with a small data set).

I have read and write access to the directory:

I am going to try the same thing with the local big data environment as you suggested, but so far I have been unable to upload the file to HDFS itself, so I am not sure whether that will work either.

I am attaching the KNIME log for my latest run below.

thanks
Tanul

Knime-Log.txt (16.6 KB)

OK, a few more things:

  • There are various ways to connect to a web resource *1). You could check any of these and see if you can upload a simple small file to your folder via KNIME, and then use that connection for the Hive Loader.
  • In the logfile there is a part where two users are mentioned (“Creating remote user tanul.*** using current user tk17****”). Is it possible that there are two accounts, and one account uploads the TXT file while the other one tries to create the Hive table?
  • I have seen instances on Cloudera where there were specially designated ‘upload’ folders (not sure how to put that in a more technical term). You could upload to your own folder and to the ‘upload’ folder, and everything was OK. But the moment you tried to get a new Hive or Impala table up (via external table or something), the system would only allow the ‘upload’ folder (I am not an expert in rights management unfortunately - that would be a question for your admins).
  • Then there is “DFSClient : : : Failed to close inode 2438******”. If you google that, a lot of strange things come up, but there is a slim possibility that one of your nodes might have a problem; do you experience other problems? The error points to something that has to do with the -well- HDFS file system (again, this might have to do with some rights management).
  • Just to be sure, you could try a file with no special characters, no blanks in file names, and no blanks or special characters in the folder names and so on (I do not think that is the problem, but just to be sure).
  • And then, just to be sure: your lower screenshot shows “tanul.kala.db”, while in the logfile it tries to write to “tanul_kala.db” (with an underscore). This could mean nothing, or it could be a very strange combination of problems (dots in user names get converted to _ but another function does not get that). Maybe you could use the ‘browse’ function to point it to a directory, just to be sure.

*1)

Thanks for responding so quickly.

I mistakenly sent you the screenshot of the wrong folder (tanul.kala.db) instead of tanul_kala.db (image attached below).

I have tried executing the node with different connections, like the HttpFS and WebHDFS connection nodes, but haven’t had any positive results. Looking at the log file for the HDFS connection, it seems it is able to create a folder on HDFS but unable to write data into the target folder.

To answer your other question: tanul.kala is the remote username for the Hive server while my local machine username is tk*****, so the permissions will be mapped and hence it is able to write the data.
One other thing I noticed from the log is that the temp file is created somewhere on the local machine and then uploaded to the Hive server target folder (this is where it is failing). Also, the target folder I have specified is the location in the cluster with tables created by me within Hive using DDL commands. Could it be that I am selecting the wrong target folder, since the process is trying to copy the local file to the target folder and maybe it is not the type of file meant to be written there? Also, the big data platform is based on Hortonworks, not Cloudera.

I am also attaching the KNIME log for the execution using the HttpFS connection.

Knime-Log- httpFS.txt (11.8 KB)

thanks

Knime-Log.txt (9.0 KB)

My suspicion would be that it has to do with the rights management, where being able to write a file into HDFS and being able to create a Hive table could be two different things. I think you will have to check with your admins what the situation is. Mapping and inheriting rights between users can be a tricky business.

The question is: are you able to upload a simple TXT file to the folder in question via KNIME without the Hive Loader, just with the upload node? That would give us further ideas about the aforementioned rights situation.

Then, if that is possible, you could see if this external table thing can help you. I admit it is not ideal.

Have you tried creating a new Hive table on that machine just via HUE in the browser? Whether or not that works would give us further information.

Hi @tkala,
it looks like you are trying to upload the temporary file into the warehouse folder “/apps/hive/warehouse/tanul_kala.db/”. I would not recommend that.
Could you try another folder (e.g. /tmp/) that is not in the apps area?

The Hive Loader will store a temporary file in HDFS and afterwards use Hive to load the data from that file into the warehouse.
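In other words, after the upload the node effectively issues something along these lines (file and table names here are placeholders):

```sql
-- LOAD DATA INPATH moves the uploaded file from its temporary HDFS
-- location into the table's warehouse directory.
LOAD DATA INPATH '/tmp/hive-import.csv'
INTO TABLE tanul_kala.mytable;
```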


Hi @mlauber71, @bjoern.lohrmann

I am not able to upload any file. I tried it with a simple CSV file and also a TXT file (log attached below).

I am not sure I am even getting to the point of writing tables in Hive using the loader. The node is failing while it is still attempting to write data into the temporary file. I will have the admins take a look again and get back to you with more insights.

I have successfully created a new table using Ambari on the same machine from a CSV.

Knime-Log.txt (4.7 KB)

The rights and permissions seem to be in place for my user_id. Below is a snapshot of the files that are created by the Hive Loader and the upload node in the tmp folder without any data being written.

Am I doing something wrong with defining the connection string for the Hive connection?

Hi @mareike.hoeger,
I have tried multiple folders, including the /tmp/ folder, but it is exactly the same situation. It creates an empty temporary file at the location and gives me a connection timeout error (a different one for the HttpFS and HDFS connections).

I think you have to sort out which user has which rights and whether the connection settings to the Big Data cluster are correct. Here it seems it is connecting to an IP address; I am not sure whether this corresponds to the access string for your Hive node.

And I still think there could be something wrong with the two users referencing each other. I am by no means an expert in big data rights management, but I have seen strange things in that regard. And often there can be just one more simple right (allowing impersonation) that has to be changed. And as I have mentioned, not all folders might allow the same actions (like creating new tables from them) even if you seemingly have the correct privileges. I think you have to talk it through with your admins and try a few things. Are there people who have successfully done what you are doing? Maybe you could copy their approach.