spider
November 19, 2019, 1:06pm
1
Hi,
I do some aggregation in Apache Spark and write the result to Parquet files. When I try to open them with KNIME 4.0.2, I get this error:
org.knime.bigdata.fileformats.utility.BigDataFileFormatException: Only Supported GroupType is LIST
Can you tell me what the problem is?
Hi @spider ,
It sounds like you are using complex or unsupported data structures in Spark/Parquet. Do you have a sample Parquet file or more details about your Spark schema?
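One way to gather those schema details is sketched below. It assumes a Spark DataFrame named `df` and relies on `DataFrame.dtypes`, which returns (column name, type string) pairs. The helper name `nested_columns` is made up for illustration and is not part of Spark or KNIME.

```python
def nested_columns(dtypes):
    """Return the names of columns whose Spark type string is nested.

    `dtypes` is a list of (name, type_string) pairs, in the shape that
    `DataFrame.dtypes` returns, e.g. ("window", "struct<start:timestamp,end:timestamp>").
    """
    nested_prefixes = ("struct<", "array<", "map<")
    return [
        name
        for name, type_string in dtypes
        if type_string.startswith(nested_prefixes)
    ]


# Usage with a Spark DataFrame `df` (assumed to exist, not executed here):
#   df.printSchema()                  # prints the full schema tree
#   print(nested_columns(df.dtypes))  # lists the columns worth a closer look
```

Columns reported here are the likely candidates for the reader error; plain numeric or string columns are usually not the cause.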
spider
November 21, 2019, 1:28pm
3
Here is the export of one Parquet file test.snappy.parquet.zip (981 Bytes)
with the content displayed with Apache Spark:
df = spark.read.load('/opt/data/bsp/')
df.show(5)
+--------------------+-------+-----+--------------------+-----------+----------+----------+
|              window| Signal|count|                 std|        avg|       min|       max|
+--------------------+-------+-----+--------------------+-----------+----------+----------+
|[2019-11-19 13:15...|[32:26]|    5|0.039879337247094804|2.647222232|2.60648155|2.69444442|
+--------------------+-------+-----+--------------------+-----------+----------+----------+
The window column is a struct, which the KNIME Parquet Reader does not support. You can flatten the struct into separate columns, e.g. with this snippet:
dataFrame1.select(col("window.start").as("win_start"), col("window.end").as("win_end"), col("*")).drop(col("window"));
or the Spark SQL node:
SELECT window.start AS win_start, window.end AS win_end...
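Since the question uses PySpark, the same flattening could be sketched in Python roughly as follows. This is an illustration, not a tested recipe for this data set: the helper names `flatten_window_exprs` and `flatten_window` are made up, and the field names `start` and `end` are assumed to match the struct produced by Spark's window aggregation. Column names are wrapped in backticks so that names like `count` or `min` are read as columns rather than functions.

```python
def flatten_window_exprs(columns, struct_col="window", fields=("start", "end"), prefix="win_"):
    """Build SQL expressions that replace one struct column with its fields.

    `columns` is the list of column names of the DataFrame (e.g. `df.columns`).
    The struct fields come first, followed by all other columns unchanged.
    """
    exprs = [f"`{struct_col}`.`{field}` AS `{prefix}{field}`" for field in fields]
    exprs += [f"`{name}`" for name in columns if name != struct_col]
    return exprs


def flatten_window(df):
    """Return a DataFrame with the `window` struct replaced by win_start / win_end."""
    return df.selectExpr(*flatten_window_exprs(df.columns))


# Usage with the DataFrame from the question (assumed, not executed here):
#   flat = flatten_window(df)
#   flat.write.parquet('/some/output/path')  # hypothetical output path
```

The result has the same column order as the snippet above: `win_start`, `win_end`, then the remaining columns without `window`.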
Depending on the platform running Spark, you need to select an INT96 to LocalDateTime mapping in the Type Mapping tab of the KNIME Parquet Reader. See the attached example workflow.
ParquetSparkWindow.knwf (22.1 KB)