String to Document node - so slow

I have a list of 2.5 million company names from which I need to remove common and custom stop words. It seems that to do this I need to convert the list to documents. That's no problem, except that it takes about 14 hours. I don't think it's a resource problem: it's running on a 72-core machine with 512 GB of RAM.

Is there some workaround to speed this up? A settings change, maybe, or some other way to create the documents?

Hey jimo42,

have you already tried increasing the heap space for KNIME?

https://www.knime.com/faq#q4_2

Otherwise, you could also try the "String Replace (Dictionary)" node, but you need to provide your own dictionary.
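If the only goal is stop-word removal, it may not be necessary to build documents at all. The dictionary idea can be illustrated outside KNIME with a minimal Python sketch that works on plain strings. This is an assumption-laden illustration, not how the KNIME node works internally: the stop-word list, the function names (`remove_stop_words`, `clean_names`) and the tokenization rule (split on whitespace and common punctuation, match case-insensitively) are all hypothetical choices.

```python
import re

# Hypothetical stop-word list; replace with your common + custom stop words.
STOP_WORDS = {"inc", "ltd", "llc", "corp", "co", "the", "and", "of", "gmbh"}

# Assumed tokenization: split on runs of whitespace and common punctuation.
_TOKEN_SPLIT = re.compile(r"[\s,.&()/-]+")


def remove_stop_words(name: str, stop_words=STOP_WORDS) -> str:
    """Drop tokens found in stop_words (case-insensitive) and rejoin with single spaces."""
    tokens = [t for t in _TOKEN_SPLIT.split(name) if t]  # skip empty pieces
    kept = [t for t in tokens if t.lower() not in stop_words]
    return " ".join(kept)


def clean_names(names, stop_words=STOP_WORDS):
    """Apply remove_stop_words to every name; plain string work, no document objects."""
    return [remove_stop_words(n, stop_words) for n in names]


if __name__ == "__main__":
    sample = ["The Acme Company, Inc.", "Smith & Sons Ltd", "Bank of Springfield LLC"]
    print(clean_names(sample))
```

Because each name is processed independently with a set lookup per token, this kind of string-only approach avoids the per-row document construction cost entirely; note that it also discards the punctuation it splits on.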

Best wishes,

Julian

Hi jimo42,

In addition to increasing your heap space as Julian said, you should also increase the number of cells kept in memory before data is written to disk; see here: https://www.knime.com/faq#q18.

The default is 100,000 cells, but with a system as large as yours you can certainly set a considerably higher value.

Cheers,

Roland
