Flat File Document Parser problem.

Hi experts. I have a 6.8 MB text file. After I load it into the Flat File Document Parser and execute the node, it progresses to 99% and then gets stuck there for hours without any error. Any suggestions, please?

Best wishes

Alam

Hi Alam,

Can you share the .txt file, zipped maybe?

Cheers, Kilian

Yes, of course I can.

Hi Alam,

It works for me: I can read the attached file using the Flat File Reader. However, it takes around 5 minutes to parse it. This is due to the word and sentence tokenization that is applied during parsing. How much Xmx heap space have you assigned to KNIME? You can change the Xmx setting in the knime.ini file, which is in the directory where the KNIME executable is located. Increasing the Xmx value could speed up tokenization, since the JVM then spends less time on garbage collection (GC).
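If you prefer to script the knime.ini change, here is a minimal Python sketch. It assumes the Eclipse-style launcher layout that knime.ini uses (one option per line, with JVM options such as -Xmx after a -vmargs line); set_xmx and update_knime_ini are hypothetical helpers, not part of KNIME. Back up knime.ini before rewriting it, and restart KNIME afterwards, because the heap size is only read when the JVM starts.

```python
from pathlib import Path


def set_xmx(ini_text: str, megabytes: int) -> str:
    """Return ini_text with the JVM max-heap option set to -Xmx<megabytes>m.

    Assumption: Eclipse-style launcher file, one option per line, JVM
    options following a "-vmargs" line (the layout knime.ini uses).
    """
    new_opt = f"-Xmx{megabytes}m"
    lines = ini_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("-Xmx"):
            lines[i] = new_opt  # overwrite the existing heap setting
            replaced = True
    if not replaced:
        stripped = [l.strip() for l in lines]
        if "-vmargs" in stripped:
            # Put the new option directly after -vmargs so the JVM receives it.
            lines.insert(stripped.index("-vmargs") + 1, new_opt)
        else:
            # No JVM section yet: append one.
            lines += ["-vmargs", new_opt]
    return "\n".join(lines) + "\n"


def update_knime_ini(path: str, megabytes: int) -> None:
    """Rewrite the knime.ini at `path` in place (make a backup first)."""
    ini = Path(path)
    ini.write_text(set_xmx(ini.read_text()), megabytes) if False else ini.write_text(
        set_xmx(ini.read_text(), megabytes)
    )
```

For example, set_xmx("-vmargs\n-Xmx1024m\n", 2048) returns "-vmargs\n-Xmx2048m\n". Working on the text first and writing the file in a separate step keeps the editing logic easy to check before anything on disk is touched.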

Cheers, Kilian

Hi Kilian. I already changed my Xmx heap space to 2048 MB, but it's still not working. I am also attaching my workflow; maybe I am doing something wrong. And thanks a lot for your help :)

Regards

Alam 

Hmm, I just tried your workflow and it works for me, also with a 2 GB Xmx setting. However, in that workflow I am only reading the 1500.txt file (I don't have the others). I assume one of your other files is causing the Flat File Reader trouble. Can you put each of your files in a separate directory and read each directory with its own Flat File Reader? You can concatenate the resulting data tables with the Concatenate node afterwards. That way you can see which files can be parsed and which ones cause problems.
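Setting up one directory per file by hand gets tedious with many files. A minimal Python sketch that copies each .txt file into its own subdirectory, named after the file (e.g. 1500.txt goes to <dest>/1500/1500.txt), could look like this. split_into_dirs is a hypothetical helper, and the source and destination paths are placeholders you would adapt. Each resulting directory can then be read by its own reader node before the Concatenate step.

```python
import shutil
from pathlib import Path


def split_into_dirs(src_dir: str, dest_root: str, pattern: str = "*.txt") -> list[Path]:
    """Copy every file matching `pattern` in src_dir into its own subdirectory.

    Hypothetical helper: each subdirectory under dest_root is named after
    the file's stem, so one reader node can be pointed at each directory.
    Returns the created directories in sorted file-name order.
    """
    created = []
    for f in sorted(Path(src_dir).glob(pattern)):
        target = Path(dest_root) / f.stem
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target / f.name)  # copy, leaving the originals untouched
        created.append(target)
    return created
```

Copying rather than moving keeps the original folder intact, so the existing workflow still works while you isolate the problem file.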

Cheers, Kilian

Thanks for the reply, Kilian. Actually, the other files are bigger than this one; if even this one is not working, how could the others work? I have a Core 2 laptop with 4 GB of memory running Windows 7. I don't know why it's not working. I tested a small file of about 2.7 KB and it worked.

Best Wishes

Alam

It is not necessarily the file size; it is more likely the structure of the text that causes problems. When the strings are converted into documents, word and sentence tokenization is applied, using the OpenNLP tokenizer models. If the text is not structured like natural language text, e.g. if there are no periods in the text, the sentence tokenizer model might have problems.
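To check which files lack sentence structure before feeding them to KNIME, a rough heuristic is to measure the longest stretch of text without sentence-ending punctuation. The Python sketch below is an assumption-laden diagnostic, not what the OpenNLP sentence model actually does: it treats only ".", "!" and "?" as sentence ends, assumes UTF-8 input, and the helper names are hypothetical. Files at the top of the ranking, with very long unterminated runs, are the most likely candidates for tokenizer trouble.

```python
import re
from pathlib import Path

# Heuristic only: characters treated as sentence ends. The real OpenNLP
# sentence model uses a trained classifier, not this simple rule.
SENTENCE_END = re.compile(r"[.!?]")


def longest_unterminated_run(text: str) -> int:
    """Length in characters of the longest stretch without . ! or ?"""
    return max(len(chunk) for chunk in SENTENCE_END.split(text))


def rank_files(directory: str, pattern: str = "*.txt") -> list[tuple[str, int]]:
    """Return (file name, longest run) pairs, worst offenders first."""
    scores = []
    for f in Path(directory).glob(pattern):
        # Assumes UTF-8; undecodable bytes are replaced instead of raising.
        text = f.read_text(encoding="utf-8", errors="replace")
        scores.append((f.name, longest_unterminated_run(text)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, longest_unterminated_run("Hi. There!") returns 6 (the chunk " There"). There is no known threshold at which tokenization breaks down; the ranking only tells you which file to try first.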

The file you shared works on my machine with just 2 GB of heap for KNIME, so I assume the problem is caused by another file.

Cheers, Kilian

Hi Kilian. 

Which version of KNIME are you using?

Best wishes

Alam

I could read your file using KNIME 3.0.1, and it works with 3.1 as well.

Cheers, Kilian

Thanks a lot, Kilian.

Best wishes

Alam