Large File CSV Reader Error

Hi. I’m posting because I’m getting an error when reading a large CSV file.

I read a CSV file with about 20 million rows in my local environment with the ‘CSV Reader’ node and deployed it to KNIME Server through the ‘CSV Writer’ node.

That step works fine, but when I reload the CSV file deployed to the server with the ‘CSV Reader’ node, I get a “Premature EOF” error message.

Can you please let me know what could be the problem?

KNIME AP version is 4.7.1.

Thanks.

Hi,
Can you try downloading the CSV file from the server using the REST API? You can access your file with the following URL: https://yourhostname.com:8080/knime/rest/v4/repository/path/to/file:data. The hostname, possibly the port 8080, and /path/to/file of course need to be adapted. If you open this URL in your browser after having logged into the WebPortal, it should download your file. If that fails, the error is on the server side; otherwise it is in the CSV node.
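If the browser route is inconvenient, you can run the same check from the command line. A sketch using curl, assuming your server accepts basic authentication (replace the username, hostname, port, and path with your own):

curl -u your_username "https://yourhostname.com:8080/knime/rest/v4/repository/path/to/file:data" -o downloaded.csv

curl will prompt for your password; if the file downloads completely, the server side is likely fine.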
Kind regards,
Alexander

Thank you for your response.

Rather than downloading the dataset via the URL in the WebPortal, I just want to connect to the server’s Mountpoint with a CSV Reader node and load the data directly.

In the process, I got an error on the CSV Reader node… Do I need to set any special options to import large data?

Thank you.

Hi,
You should not need to. A “Premature EOF” error means that KNIME received an “end of file” control character that it did not expect. This can originate in the server, in some network component in between, or possibly even in a bug in the AP. By testing it in the browser we can check whether the AP is at fault at all. It’s not too likely, but let’s rule that out first before looking at the other components.
Kind regards,
Alexander

Hi,
Another thing: if the download fails as well, you may want to increase the memory available to your KNIME Server’s Apache Tomcat. You can do that in the Tomcat directory, in bin/setenv.sh. There you’ll see a line like:

export CATALINA_OPTS="-Xmx2048M -server -Dsun.jnu.encoding=UTF-8"

Change the -Xmx value to something higher, e.g. 4096 instead of 2048, and it may help with your issue.
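The edited line would then look like this (a sketch; keep the other flags exactly as they appear in your own setenv.sh):

export CATALINA_OPTS="-Xmx4096M -server -Dsun.jnu.encoding=UTF-8"

Restart Tomcat afterwards so the new setting takes effect.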
Kind regards,
Alexander

Thank you for your kind response.

I also get the above error when loading large data with the CSV Reader node locally. What resources should I increase in such cases?

Thank you.

Hi @JaeHwanChoi,
Are you reading the file from KNIME Server or from some other source into your local AP? Your Analytics Platform also has an Xmx option, but generally a value that is too low results in heap space errors, not in a premature EOF. The latter occurs mostly when the server connection is lost, so it is more likely that this is a network or server issue than a KNIME AP issue.

I am not aware of any settings you could change to prevent this, except network settings such as the proxy. In enterprise settings I have seen very odd things, such as a proxy that capped connections only to certain IP ranges, so that an error occurred whenever someone sent a file of a certain size there. These things are very hard to debug.
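For reference, the Analytics Platform reads its Xmx setting from the knime.ini file in the installation directory. There you would change a line like

-Xmx2048m

to a higher value, e.g. -Xmx8192m (the default in your installation may differ). As noted above, though, a value that is too low typically shows up as heap space errors rather than a premature EOF.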
Kind regards,
Alexander
