parquet-reader error: Only Supported GroupType is LIST

Hi,
I do some aggregation in Apache Spark and write the result to Parquet files. When I try to open one of them with KNIME 4.0.2, I get this error:

org.knime.bigdata.fileformats.utility.BigDataFileFormatException: Only Supported GroupType is LIST

Can you tell me what the problem is?

Hi @spider,

Sounds like you are using complex or unsupported data structures in Spark/Parquet. Do you have a sample Parquet file or more details about your Spark schema?

Here is the export of one Parquet file test.snappy.parquet.zip (981 Bytes)
with the content displayed with Apache Spark:

df = spark.read.load('/opt/data/bsp/')
df.show(5)
+--------------------+-------+-----+--------------------+-----------+----------+----------+
|              window| Signal|count|                 std|        avg|       min|       max|
+--------------------+-------+-----+--------------------+-----------+----------+----------+
|[2019-11-19 13:15...|[32:26]|    5|0.039879337247094804|2.647222232|2.60648155|2.69444442|
+--------------------+-------+-----+--------------------+-----------+----------+----------+

The window column is a struct, which the KNIME Parquet Reader does not support. You can flatten the struct into separate columns, e.g. with this snippet:

dataFrame1.select(col("window.start").as("win_start"), col("window.end").as("win_end"), col("*")).drop(col("window"));

or the Spark SQL node:

SELECT window.start AS win_start, window.end AS win_end...
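
Since the session above is PySpark, here is a rough Python sketch of the same flattening. The helper names flatten_struct_exprs and flatten_window are made up for illustration, and the field names start and end are taken from the snippets above. It only relies on DataFrame.columns and DataFrame.selectExpr, and I have not run it against your data, so treat it as a starting point:

```python
def flatten_struct_exprs(columns, struct_col, fields, prefix):
    """Build Spark SQL expressions that replace one struct column with one
    flat column per field and keep every other column unchanged.

    Backticks quote the identifiers so names like `count` or `min`
    are read as columns, not as functions.
    """
    exprs = [f"`{struct_col}`.`{field}` AS `{prefix}_{field}`" for field in fields]
    exprs += [f"`{col}`" for col in columns if col != struct_col]
    return exprs


def flatten_window(df):
    """Return df with the `window` struct replaced by win_start / win_end.

    Assumes df is a PySpark DataFrame whose `window` column has the
    fields `start` and `end`.
    """
    exprs = flatten_struct_exprs(df.columns, "window", ["start", "end"], "win")
    return df.selectExpr(*exprs)
```

Calling flatten_window(df) on the DataFrame from the session above and writing the result back to Parquet should give a file without the struct column.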

Depending on the platform running Spark, you need to select an INT96 to LocalDateTime mapping in the Type Mapping tab of the KNIME Parquet Reader. See the attached example workflow.

ParquetSparkWindow.knwf (22.1 KB)
