Some strange behaviour using Big Data Extensions and Spark Executor on Cloudera

Hi all,

When passing from Hive to Spark with the dedicated component, you have to prefix the table name with the database name in the query written in the component placed before it:

SELECT * FROM pov.exercice_spark and not SELECT * FROM exercice_spark

even if the "database" is declared in the Hive connection.

The "drop existing table" option in the Hive Loader doesn't work, and neither does the "drop table" node; neither of them gives any warning at all. It could be an authorization problem, BUT writing something like drop table if exists pov.exercice_spark in the "Database SQL Executor" works very well.
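To make the two workarounds above concrete, here is a minimal sketch of how the fully qualified statements could be built before pasting them into a query field. The helper names (qualify, select_all, drop_if_exists) are hypothetical and not part of KNIME or Hive; the database and table names are the ones from this post.

```python
import re

# Assumption: database and table names are plain, unquoted identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def qualify(database: str, table: str) -> str:
    """Return 'database.table', rejecting anything that is not a plain identifier."""
    for name in (database, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return f"{database}.{table}"


def select_all(database: str, table: str) -> str:
    """Query text with the database prefix written out explicitly."""
    return f"SELECT * FROM {qualify(database, table)}"


def drop_if_exists(database: str, table: str) -> str:
    """Drop statement of the kind that worked in the SQL executor."""
    return f"DROP TABLE IF EXISTS {qualify(database, table)}"


if __name__ == "__main__":
    print(select_all("pov", "exercice_spark"))
    print(drop_if_exists("pov", "exercice_spark"))
```

The point of going through one helper is only that the database prefix is never forgotten, since the database declared in the connection is apparently not applied here.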

Best regards

Fabien

Hi Fabien,

Yes, unfortunately the "Hive to Spark" node currently does not use the Hive Connector information and only executes the SQL statement. We are aware of that limitation, and it is on our list of things to address for the next release in summer. In the meantime, I recommend using the workaround of specifying the database name.

As far as the "drop existing table" option in the Hive Loader is concerned, that is indeed strange. Could you provide me with a sample workflow (so I can reproduce it) and your KNIME log file? I will send you my email address for that.

Best regards,

Björn

Hi Bjoern,

I think there is no mystery behind this. It appeared to me that dropping worked with certain nodes and not with others. I now think that this is rather due to cluster instability related to the Hive metastore. Correlation is not causality ;o)

Best regards,

Hi Bjoern,

Well, I tested it on one real Cloudera cluster and one Hortonworks VM to be sure.

The fact is that when you use the "drop existing table" option, it doesn't drop the table at all.

If you uncheck the option, it doesn't add the data to the already existing table but throws this error:

ERROR Hive Loader 3:294:291 Execute failed: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table test_chargement3 already exists)

Best regards,

Fabien

Hi all,

I have a similar problem to fabienc above. I am trying to load data into Hive (real Cloudera cluster) using the Hive Loader node. I get the error below no matter whether the drop table option is checked or unchecked.

Execute failed: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

The table on Hive is always created, but the data is not loaded. I am not trying to load any big files so far, only testing the load of a .csv file with a couple of records of integer data type.

Any suggestions?

Eeve