Node: Strings To Document does not work

The Strings To Document node does not work on your example. Error: GC overhead limit exceeded. I have applied all your recommendations. Version 3.5.3, 64-bit.
What should I do? Go back to an older version?

The example is 009004_NYTimesRssFeedTagCloud.

Hi @Vladimir_Savin,
I don’t know what you have tried so far. If my input data is really big and I need to convert it to the document format, I usually split it into smaller slices (Partitioning) and run one Strings To Document node on each partition. I chain the Strings To Document nodes via flow variable connections so that only one of them can run at a time. This is time-consuming, but it helps when memory would otherwise be insufficient.
Another option is to use loop nodes. But in my experience you sometimes run into problems at the Loop End node when it tries to collect the results.
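
To illustrate the idea outside of KNIME, here is a minimal Python sketch of "slice the input and convert one slice at a time". It is not KNIME code: `partition`, `convert_in_slices` and the `convert` callback are hypothetical names made up for this example, and `convert` only stands in for whatever the Strings To Document node does to a slice.

```python
from typing import Callable, Iterator, List, Sequence


def partition(rows: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `rows` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def convert_in_slices(
    rows: Sequence[str],
    size: int,
    convert: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Run `convert` on one slice at a time and concatenate the results.

    Only one slice is being converted at any moment, which mirrors
    chaining the nodes so that they cannot run in parallel.
    """
    results: List[str] = []
    for chunk in partition(rows, size):
        results.extend(convert(chunk))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for the real conversion step.
    def to_documents(chunk: Sequence[str]) -> List[str]:
        return [text.strip() for text in chunk]

    print(convert_in_slices([" a ", " b ", " c "], 2, to_documents))
```

The trade-off is the same as in the workflow: the slices are handled strictly one after another, so the run takes longer, but the conversion step only ever works on one slice at a time.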

I need the Strings To Document node for text processing of MS Word and PDF files. Unless it runs successfully, I cannot move on to the Bag of Words Creator.

Have you tried using the “PDF Parser” and/or the “Word Parser” Nodes? They read doc and pdf files and convert them to the document format straight away.

I used the Tika node.

I will try the MS Word parser. Thank you.

Hey @Vladimir_Savin,

Did you try to improve the performance of KNIME?
If not, please have a look at this blog post.

Cheers,

Julian
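
The blog post itself is not reproduced in this thread. As a hedged example of one common adjustment: "GC overhead limit exceeded" is a Java heap error, and the maximum heap size KNIME may use is set by the `-Xmx` line in the `knime.ini` file in the KNIME installation folder. The `8g` value below is only an assumption; it should be chosen to fit the RAM actually available on the machine.

```
-vmargs
-Xmx8g
```

The `-Xmx` line has to come after `-vmargs`, and KNIME needs a restart for the change to take effect.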
