The file is too big to be read by KNIME. Is there any solution or workaround?

@lichang one thing you could try is to set up a local big data environment, copy the TSV file into it, and register it as an external table. In theory Hive should be able to handle very large files, and once the data is loaded you could convert and compress it into a more compact format.
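To make that concrete, the two Hive steps could look roughly like the sketch below. This is a minimal sketch, assuming a reachable HiveServer2 and using the third-party pyhive package; the table name, columns, and data directory are placeholders, and inside KNIME you would typically issue the same statements through the DB nodes instead.

```python
# Minimal sketch: register a big TSV as a Hive external table, then
# rewrite it as compressed columnar Parquet. Host, table and column
# names, and the data directory are assumptions for illustration.
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cur = conn.cursor()

# External table: Hive records only the schema; the TSV stays in place.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS big_tsv (
        id     BIGINT,
        label  STRING,
        value  DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    STORED AS TEXTFILE
    LOCATION '/data/big_tsv/'
""")

# Convert / compress: materialize the data as a compact Parquet table.
cur.execute("""
    CREATE TABLE big_parquet STORED AS PARQUET
    AS SELECT * FROM big_tsv
""")
```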

It might take some preparation to do that:

Do you have a small sample file with exactly the same structure and encoding as your large file?

The next thing to try would be to read the data in chunks: tell the CSV Reader to skip to line number n and read only x lines, looping until the whole file has been imported. I am not sure whether the reader would internally try to cache the whole file or whether it could handle such a task. The example reads from a database, but the loop might be adapted:
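Outside KNIME, the same skip-n/read-x loop could look like this in pandas. A minimal sketch, assuming a tab-separated file with a single header row; the path and chunk size are placeholders:

```python
import pandas as pd

PATH = "big_file.tsv"   # placeholder path
CHUNK = 100_000         # rows per pass; tune to available memory

cols = pd.read_csv(PATH, sep="\t", nrows=0).columns  # read the header only
start = 1               # first data row; row 0 holds the header
while True:
    try:
        chunk = pd.read_csv(PATH, sep="\t", header=None, names=cols,
                            skiprows=start, nrows=CHUNK)
    except pd.errors.EmptyDataError:
        break           # skipped past the end of the file
    if chunk.empty:
        break
    # ... process or persist this slice here ...
    start += len(chunk)
```

Note that `skiprows` rescans the file from the top on every pass; pandas can also stream the file in a single pass with its `chunksize` argument, which is usually cheaper.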

And then you could see whether the Simple File Reader or the File Reader node could handle the data.

In all these cases, keep in mind that once the data is in a format that KNIME can read, you will still need enough resources to process the file, so you may have to work on one chunk of the data at a time, as in the sketch below. Here are some additional hints about KNIME and performance:
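To illustrate the chunk-at-a-time idea: a summary statistic can be computed without ever holding the full table in memory. A minimal pandas sketch with made-up file and column names:

```python
import pandas as pd
from collections import Counter

# Constant-memory aggregation: only one chunk plus the running totals
# are in memory at any time. File and column names are placeholders.
totals = Counter()
for chunk in pd.read_csv("big_file.tsv", sep="\t",
                         usecols=["category", "amount"], chunksize=500_000):
    # Counter.update adds the per-chunk sums onto the running totals
    totals.update(chunk.groupby("category")["amount"].sum().to_dict())

print(dict(totals))
```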
