the uploaded data exceeded the amount allowed by the Spark Jobserver error

#1

Hello, I want to upload my 10GB of data into Spark by using ‘Table to Spark’. My Spark Jobserver is up and running, but I got the following error: Table to Spark 3:869 - ERROR: Execute failed: Request to Spark Jobserver failed, because the uploaded data exceeded the amount allowed by the Spark Jobserver. For instructions to change this please see the installation guide (https://www.knime.org/knime-spark-executor#install).

Then I followed the link and changed the configuration of my Spark Jobserver:
spray.can.server {
  parsing {
    max-content-length = 10G  # default was 200m
  }
  request-timeout = 60 s
  idle-timeout = 120 s
  request-chunk-aggregation-limit = 10G  # default was 200m
}

But I got another error when running ‘Table to Spark’:
Caused by: java.lang.RuntimeException: Config setting ‘request-chunk-aggregation-limit’ must not be larger than 2147483647

Then I changed ‘request-chunk-aggregation-limit’ to 2GB, but I still get the same error: Caused by: java.lang.RuntimeException: Config setting ‘request-chunk-aggregation-limit’ must not be larger than 2147483647

Of course, I get the original ‘uploaded data exceeded the amount allowed by the Spark Jobserver’ error if I leave ‘request-chunk-aggregation-limit’ at its default (200m).

Since HDFS and other URL connections are not available to me, I can only use the ‘Table to Spark’ node to load my data. Any suggestions for this problem?


#2

Hello DerekJin,

the maximum value for request-chunk-aggregation-limit is just below 2GB, e.g. 1999m: the setting is parsed as a byte count that must fit into a signed 32-bit integer (at most 2147483647 bytes), so 2GB (2147483648 bytes) is already too large. However, this won’t let you upload 10GB, and it also puts a lot of strain on the driver node, since the data is kept in memory several times on the machine where the Spark Jobserver runs.
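For reference, a Jobserver configuration at that cap might look like this (a sketch based on the block in the original question; only the two size settings differ from the defaults, and 1999m is one example value safely below the 32-bit limit):

```
spray.can.server {
  parsing {
    # must stay below 2147483647 bytes (signed 32-bit limit)
    max-content-length = 1999m
  }
  request-timeout = 60 s
  idle-timeout = 120 s
  # must also stay below the 32-bit limit; 10G and 2G are rejected
  request-chunk-aggregation-limit = 1999m
}
```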
If you can upload the data via ssh to a location that is accessible via the file protocol from the machine the Spark driver runs on, you can then use the Spark DataFrame Java Snippet (Source) node to read it into Spark. I have created a small workflow that demonstrates this with a CSV file, which you can download here. You can also read any other file format that is supported by the Spark Data Source API.
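Inside the Java Snippet (Source) node, the read could be sketched roughly like this. This is an illustration only: the session variable name (`spark`) and the file path are assumptions, and the exact variables the node exposes may differ, so adapt it to the node's template.

```java
// Sketch: read a CSV that sits on the Spark driver's local file system.
// `spark` stands for the SparkSession handle provided by the snippet node;
// the path /tmp/mydata.csv is a placeholder for your uploaded file.
Dataset<Row> df = spark.read()
    .option("header", "true")        // first line contains column names
    .option("inferSchema", "true")   // let Spark guess the column types
    .csv("file:///tmp/mydata.csv");  // file:// path visible to the driver
return df;                           // hand the DataFrame back to KNIME
```

The same pattern works for other formats by swapping `.csv(...)` for e.g. `.parquet(...)` or `.json(...)`, since these go through the same Spark Data Source API.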

Bye
Tobias
