Performance in input

Hey guys!

Loading a TXT file is taking a long time. I’m working with .txt files that hold a large amount of information: more than 40 thousand lines each, with a file size of approximately 3.52GB.

I need to read the file with the File Reader (Complex Format) node because I cannot rely on a delimiter: the file's contents are unstructured (it is a fiscal file).

I’ve already adjusted the KNIME configuration file according to some tips, but it still takes almost three minutes to load the data in the node, and there are even larger files that take longer. Can anyone suggest what can be done to improve performance?

All suggestions are welcome; I will evaluate and try them.
Thanks.

Hi @andymesmo, data size is never just about how many lines you have. In general, 40k lines of records should be nothing, but the file size puts that in perspective.

It can indeed take some time to open a txt file of over 3.5GB; 3 minutes is not so bad. At this point, it comes down to the resources you have. Consider assigning more memory to KNIME and/or increasing your system’s RAM and CPU.
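
As a rough back-of-the-envelope check, the figures quoted in this thread can be turned into an average line size and a read throughput. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes), and since the line count is a lower bound and the load time an upper bound, the results are approximate.

```python
# Figures quoted in this thread; decimal units (1 GB = 10**9 bytes) are an assumption.
file_size_bytes = 3.52e9
line_count = 40_000       # "more than 40 thousand lines", so a lower bound
load_seconds = 3 * 60     # "almost three minutes", so an upper bound

# Average bytes per line (an upper bound, because line_count is a lower bound).
avg_line_bytes = file_size_bytes / line_count

# Effective read throughput in MB/s (a lower bound, because load_seconds is an upper bound).
throughput_mb_s = file_size_bytes / load_seconds / 1e6

print(f"average line size: about {avg_line_bytes / 1e3:.0f} KB")
print(f"read throughput:   about {throughput_mb_s:.1f} MB/s")
```

Under these assumptions each line averages roughly 88 KB and the node reads at roughly 19.6 MB/s, which is why the line count alone understates the amount of data involved.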

You can wrap the File Reader node in a component and stream it. It may run 1.5 to 2 times faster.

Thanks @bruno29a

I configured that before asking for help and assigned 9GB of RAM. The machine has 16GB of RAM and a Core i5-7300 processor. But if the read time of this node is within the normal range, then thank you.

Thank you @izaychik63

I will test it that way too.

Hi @andymesmo and @izaychik63, in terms of streaming, there is no real advantage in streaming only the reading of the file. The purpose of streaming is that you can read part of the data and work on that part (manipulating it, writing it to some destination) while the read continues.

If you need the whole file to be read before you can continue to the next node, it means that you will still need to wait for the file to be fully read, which defeats the purpose of streaming.
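
To illustrate the idea only, here is a minimal sketch using Python generators. It is an analogy, not how KNIME implements streaming, and `read_lines` and `process` are hypothetical stand-ins for a reader node and a downstream node. It shows the order of events in each mode, not timings.

```python
def read_lines(lines, log):
    """Hypothetical reader: yields one line at a time and logs each read."""
    for line in lines:
        log.append(f"read {line}")
        yield line


def process(rows, log):
    """Hypothetical downstream step: handles each row as soon as it receives it."""
    for row in rows:
        log.append(f"process {row}")
        yield row.upper()


data = ["a", "b", "c"]

# Streamed: the downstream step pulls rows one by one, so reads and
# processing interleave.
streamed_log = []
streamed = list(process(read_lines(data, streamed_log), streamed_log))

# Not streamed: the whole input is materialized before processing starts.
batch_log = []
all_rows = list(read_lines(data, batch_log))
batch = list(process(all_rows, batch_log))

print(streamed_log)
print(batch_log)
```

Both modes do the same work and produce the same result; only the order differs. In the streamed case processing starts after the first read, while in the other case nothing is processed until every line has been read. Any gain therefore depends on having downstream steps that can run while the read continues.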

EDIT: I would say do try it, though, just as an experiment.

@bruno29a, just to bring your attention to the thread below.

Hi @izaychik63, from the thread you shared: “streaming helps only in case of multiple chained nodes with streaming functionality (not case here)”. That’s exactly the point I was making, or at least that’s what I believe too.

It seems from that thread that you tried it and it worked for you. As I recommended to @andymesmo, do try it to see whether there is a significant gain.

Have you enabled the columnar backend? I am not an expert on all of its benefits, but it would be interesting to see whether it can help here too.
br

Hello @Daniel_Weikert

All help is welcome and adds knowledge too.

I didn’t know about this feature.
