Unable to read the data from Hive using Database Connection Table Reader

Hi :)

I am trying to use the “Database Connection Table Reader” node to read data from Hive, but it seems to be limited to a fixed number of rows. When I read 500 rows it works fine, but when I read 5000 rows it fails. I have tried both the database drivers provided by Cloudera for Hive and the built-in drivers.

The table to be read has about 820 columns. The following is my workflow:
[image: workflow screenshot]

Could someone suggest some ways to resolve this? I have attached my workflow with some images to illustrate.

Thank you in advance for the time you give to my issue.

irving

Without the code and the data it is not easy to say what is going on. Could you share the error report?

These things come to my mind:

  • with Hive queries, do not use a semicolon ";" at the end of the code (this has something to do with KNIME's implementation)
  • consider running COMPUTE INCREMENTAL STATS on your data in Impala before downloading it (it gives the Hive system a better idea of what to expect)
  • see if you can run the operation on your big data system without KNIME (to check that your table is fine in the first place)

If you share any workflows with a Hive or Impala connector, make sure you have deleted any credentials you might have saved with the workflow.
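
The first point above (no trailing semicolon) can be checked automatically when a query is pasted in from another tool. This is only a minimal Python sketch, assuming the query is available as a plain string; `strip_trailing_semicolon` and `my_table` are hypothetical names for illustration and are not part of KNIME or Hive:

```python
def strip_trailing_semicolon(sql: str) -> str:
    """Remove any trailing semicolons and whitespace from a query string."""
    # rstrip with a character set removes semicolons and whitespace
    # in any order from the right-hand end of the string.
    return sql.rstrip("; \t\r\n")


# Hypothetical example query; the table name is made up.
query = "SELECT * FROM my_table LIMIT 500;  "
cleaned = strip_trailing_semicolon(query)
print(cleaned)
```

Semicolons inside the statement are left alone; only the end of the string is touched.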

Hi @mlauber71,

Thanks for your quick response.
1. The queries have no semicolon at the end of the code.
2. When the SQL statement has a limit of 500, it works well, but when the limit is greater than 2000, it fails. So it seems that the Hive connector and the database table selector are correct, and only the database connection has something wrong.
3. The error output in the console:
ERROR Database Connection Table Reader 3:214 ainer_e137_1526794512062_16404_01_000058 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 16622 16576 16622 16622 (bash) 0 0 108707840 633 /bin/bash -c /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sde/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdh/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000058 application_1526794512062_16404 1 1>/mnt/data/sde/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058/stdout 2>/mnt/data/sde/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058/stderr
|- 16823 16622 16622 16622 (java) 11857 2083 13077819392 2290449 /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sde/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdh/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000058/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000058 application_1526794512062_16404 1

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
]], TaskAttempt 1 failed, info=[Container container_e137_1526794512062_16404_01_000481 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=25889,containerID=container_e137_1526794512062_16404_01_000481] is running beyond physical memory limits. Current usage: 8.6 GB of 8 GB physical memory used; 12.3 GB of 40 GB virtual memory used. Killing container.
Dump of the process-tree for container_e137_1526794512062_16404_01_000481 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 25899 25889 25889 25889 (java) 9818 2134 13057638400 2243103 /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sde/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000481 application_1526794512062_16404 1
|- 25889 25887 25889 25889 (bash) 0 0 108707840 623 /bin/bash -c /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sde/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000481 application_1526794512062_16404 1 1>/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481/stdout 2>/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000481/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
]], TaskAttempt 2 failed, info=[Container container_e137_1526794512062_16404_01_000333 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=14939,containerID=container_e137_1526794512062_16404_01_000333] is running beyond physical memory limits. Current usage: 8.6 GB of 8 GB physical memory used; 12.3 GB of 40 GB virtual memory used. Killing container.
Dump of the process-tree for container_e137_1526794512062_16404_01_000333 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 14939 14907 14939 14939 (bash) 0 0 108707840 603 /bin/bash -c /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdl/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdc/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000333 application_1526794512062_16404 1 1>/mnt/data/sdl/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333/stdout 2>/mnt/data/sdl/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333/stderr
|- 15132 14939 14939 14939 (java) 10979 2146 13048573952 2250908 /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdl/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdc/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000333/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000333 application_1526794512062_16404 1

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
]], TaskAttempt 3 failed, info=[Container container_e137_1526794512062_16404_01_000536 finished with diagnostics set to [Container failed, exitCode=-104. Container [pid=24184,containerID=container_e137_1526794512062_16404_01_000536] is running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical memory used; 12.2 GB of 40 GB virtual memory used. Killing container.
Dump of the process-tree for container_e137_1526794512062_16404_01_000536 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24194 24184 24184 24184 (java) 3745 1471 13034442752 2134855 /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdf/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000536 application_1526794512062_16404 1
|- 24184 24182 24184 24184 (bash) 0 0 108707840 612 /bin/bash -c /usr/jdk64/jdk1.8.0_60/bin/java -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.6.0-40 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536 -Dtez.root.logger=INFO,CLA -Djava.io.tmpdir=/mnt/data/sdf/yarn/local/usercache/mlb/appcache/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536/tmp org.apache.tez.runtime.task.TezChild 172.18.0.7 37065 container_e137_1526794512062_16404_01_000536 application_1526794512062_16404 1 1>/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536/stdout 2>/mnt/data/sdb/yarn/log/application_1526794512062_16404/container_e137_1526794512062_16404_01_000536/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
]]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1526794512062_16404_2_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

Best regards,
Irving

Hi @irving-ccc

From the error message I gather that the Hive queries are being run using Tez on YARN. The above error message indicates that YARN killed some containers of your query because they used more memory than they were allowed to. I guess the first thing to try is to increase the YARN container memory for Tez. See:

https://community.hortonworks.com/questions/66803/hive-container-is-running-beyond-physical-limits.html
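
The numbers in the error log can be cross-checked with a short Python sketch. It parses the YARN diagnostic line and the `-Xmx` flag (both copied from the log in this thread) and compares the JVM heap limit with the container's physical memory limit. The helper names, the 12288 MB example container size, and the 0.8 heap fraction are illustrative assumptions, not values from this thread; `hive.tez.container.size` and `hive.tez.java.opts` are the Hive settings commonly adjusted for this, but the right values depend on your cluster:

```python
import re

# Excerpts copied from the error log earlier in this thread.
DIAGNOSTIC = ("Current usage: 8.6 GB of 8 GB physical memory used; "
              "12.3 GB of 40 GB virtual memory used. Killing container.")
JVM_FLAGS = "-server -XX:+UseNUMA -XX:+UseParallelGC -Xmx10240m"


def parse_physical_usage(diagnostic: str) -> tuple[float, float]:
    """Return (used_gb, limit_gb) from a YARN physical memory diagnostic."""
    match = re.search(r"([\d.]+) GB of ([\d.]+) GB physical memory used",
                      diagnostic)
    if match is None:
        raise ValueError("no physical memory usage found in diagnostic")
    return float(match.group(1)), float(match.group(2))


def parse_max_heap_mb(jvm_flags: str) -> int:
    """Return the -Xmx value in MB (only the '<number>m' form is handled)."""
    match = re.search(r"-Xmx(\d+)m\b", jvm_flags)
    if match is None:
        raise ValueError("no -Xmx<number>m flag found")
    return int(match.group(1))


def suggest_heap_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Rule-of-thumb heap size: a fraction of the container size (assumption)."""
    return int(container_mb * heap_fraction)


used_gb, limit_gb = parse_physical_usage(DIAGNOSTIC)
heap_mb = parse_max_heap_mb(JVM_FLAGS)
limit_mb = int(limit_gb * 1024)
heap_exceeds_limit = heap_mb > limit_mb

print(f"container limit: {limit_mb} MB, JVM max heap: {heap_mb} MB")
print(f"heap larger than container: {heap_exceeds_limit}")

# Illustrative only: a hypothetical 12288 MB container with heap at 80 %.
example_container_mb = 12288
print(f"hive.tez.container.size={example_container_mb}")
print(f"hive.tez.java.opts=-Xmx{suggest_heap_mb(example_container_mb)}m")
```

In this log the JVM is allowed a 10240 MB heap inside an 8 GB container, so the process can exceed the physical limit before the JVM itself runs out of heap. Either a larger container or a smaller heap would bring the two back in line.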

Best,
Björn


Hi @bjoern.lohrmann,

Your help really solved all my problems.

I really appreciate it.

Regards,

Irving
