Excel Reader straight to DB Loader Bug?

Apologies if I missed another thread on this.

When I connect an Excel Reader straight to a DB Loader (pointed to Hive via HDFS), as below, I get the following error:

ERROR DB Loader 2:2 eaderWrapper.java:83)
at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
… 26 more
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: java.lang.IllegalArgumentException: start of message: expected ‘{’ but got ‘[Sheet1]’ at line 0: message SampleExcel.xlsx [Sheet1]
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.IllegalArgumentException: start of message: expected ‘{’ but got ‘[Sheet1]’ at line 0: message SampleExcel.xlsx [Sheet1]

I believe the error is related to the square brackets in the output port table name.

If I pass the table through a Column Filter (that filters no columns), the output port name changes and it loads successfully:


Is there a setting in Excel Reader that can change the output port table name so I don’t have to pass it through the extra node?



If this is a new table, you must first create it and then load it, as at the beginning of this example workflow.


Thanks for the suggestion - I had the table creator outside the snapshot. Here’s some slight restructuring to test, just in case:


I can provide a sample workflow, as it’s all on test data, if that helps.


Hi @LeanderQuiring -

Sorry for the trouble. Please do post your example workflow, and I will run it by one of the developers ASAP.


SampleExcel.xlsx (9.7 KB)
Excel Test - Clean.knwf (16.5 KB)

As an aside, I’m using the Cloudera driver but removed my connection details from the attached workflow:

Hello LeanderQuiring,
What cluster type and version are you connecting to, e.g. CDH 6.0 or HDP 3.0? I tried to reproduce the problem but haven’t been able to so far. Also, can you please send me the complete KNIME log file with the error message? I will send you a private message where you can reply with the log file.


Sent! Thanks. Let me know if you can use anything else.

Hello Leander,
Thanks for the log files. We could reproduce the problem and will fix it in the next KNIME release, which is planned for December. The problem was that we use the name of the KNIME table in the MessageType of the Parquet file, which is parsed by Tez. The name was invalid because of special characters such as the square brackets. We now replace those special characters with ‘_’, so the name no longer causes parsing errors.
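
For anyone who wants to see the idea, here is a minimal sketch of that kind of replacement. It is not the actual KNIME code: the class name, the method name, and the exact set of characters treated as invalid (anything other than letters, digits and ‘_’) are assumptions for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of KNIME: turns a table name into a string
// that is safe to use as the name in a Parquet schema ("message <name> { ... }").
public class ParquetNameSanitizer {

    // Assumption: only letters, digits and '_' are kept; everything else is replaced.
    private static final Pattern INVALID = Pattern.compile("[^A-Za-z0-9_]");

    public static String sanitize(String tableName) {
        // Replace each invalid character with a single underscore.
        return INVALID.matcher(tableName).replaceAll("_");
    }

    public static void main(String[] args) {
        // The table name from the error message above.
        System.out.println(sanitize("SampleExcel.xlsx [Sheet1]"));
        // prints "SampleExcel_xlsx__Sheet1_"
    }
}
```

With a name like this, the schema header no longer contains a space or brackets, so the parser does not run into ‘[Sheet1]’ where it expects ‘{’.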

