Flat File Document Parser problem.

alamsaqib · November 26, 2015, 5:19am

Hi experts. I have a text file that is 6.8 MB in size. After I load it into the Flat File Document Parser and execute the node, processing runs to 99% and then gets stuck there for hours without any error. Any suggestions, please?

Best wishes

Alam

kilian.thiel · November 26, 2015, 7:18pm

Hi Alam,

can you share the txt file, zipped maybe?

Cheers, Kilian

alamsaqib · November 27, 2015, 10:59am

Yes, of course I can.

1500.zip

kilian.thiel · November 27, 2015, 12:05pm

Hi Alam,

it works for me. I can read the attached file using the Flat File Reader. However, it takes around 5 minutes to parse it. This is due to the underlying word and sentence tokenization that is applied during parsing. How much Xmx heap space have you assigned to KNIME? You can change the Xmx setting in the knime.ini file, which is in the directory where the KNIME binary is located. Increasing the Xmx value could speed up tokenization, since less time is spent on garbage collection (GC).
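To check which heap limit is currently configured, something like the following Python sketch could be used. It assumes that knime.ini lists one JVM option per line (for example `-Xmx2048m`), as Eclipse-style launcher ini files usually do. The helper names `find_xmx` and `set_xmx` are made up for this illustration and are not part of KNIME.

```python
import re

# Matches a line such as "-Xmx2048m" or "-Xmx4g" (assumed one option per line).
XMX_PATTERN = re.compile(r"-Xmx(\d+[kKmMgG]?)")


def find_xmx(ini_text):
    """Return the value of the last -Xmx option (e.g. '2048m'), or None."""
    value = None
    for line in ini_text.splitlines():
        match = XMX_PATTERN.fullmatch(line.strip())
        if match:
            value = match.group(1)
    return value


def set_xmx(ini_text, new_value):
    """Return the ini text with every -Xmx line set to new_value.

    If no -Xmx line exists, one is appended at the end.
    """
    lines = ini_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        if XMX_PATTERN.fullmatch(line.strip()):
            lines[index] = "-Xmx" + new_value
            replaced = True
    if not replaced:
        lines.append("-Xmx" + new_value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample content, not a real knime.ini.
    sample = "-vmargs\n-Xms256m\n-Xmx2048m\n"
    print(find_xmx(sample))
    print(set_xmx(sample, "3g"), end="")
```

To apply it to a real installation, read knime.ini into a string, pass it through `set_xmx`, write the result back, and restart KNIME. Keep a backup of the original file.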

Cheers, Kilian

alamsaqib · November 27, 2015, 3:47pm

Hi Kilian. I already changed my Xmx heap space to 2048, but it's not working. I am also attaching my workflow; maybe I am doing something wrong. And thanks a lot for your help :)

Regards

Alam

knime_project_test.zip

kilian.thiel · December 2, 2015, 9:55am

Hmm, I just tried your workflow and it works for me, also with a 2 GB Xmx setting. However, in that workflow I am just reading the 1500.txt file (I don't have the others). I assume it is one of your other files that causes trouble for the Flat File Reader. Can you put each of your files in a separate directory and read the files from these directories with different Flat File Readers? You can concatenate the data tables with the Concatenate node afterwards. Then you will see which files can be parsed and which cause problems.
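Sorting the files into separate directories could be scripted roughly as follows. This is only a sketch in Python. The function name `split_into_directories` and the numbered folder naming scheme are assumptions made for the example, and the originals are copied rather than moved so nothing is lost.

```python
import shutil
from pathlib import Path


def split_into_directories(source_dir, target_dir):
    """Copy every file in source_dir into its own subdirectory of target_dir.

    Returns the list of created subdirectories, one per file.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    created = []
    files = sorted(p for p in source.iterdir() if p.is_file())
    for index, path in enumerate(files, start=1):
        # A numeric prefix keeps folder names unique even if stems repeat.
        sub_dir = target / f"{index:03d}_{path.stem}"
        sub_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, sub_dir / path.name)
        created.append(sub_dir)
    return created
```

Each resulting directory can then be selected in its own reader node, so a node that hangs points directly at the problematic file.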

Cheers, Kilian

alamsaqib · December 3, 2015, 2:04pm

Thanks for the reply, Kilian. Actually, the other files are bigger than this one. If even this one is not working, how could the others work? I have a Core 2 laptop with 4 GB of memory running Windows 7. I don't know why it's not working. I tested a small file of about 2.7 KB and it worked.

Best Wishes

Alam

kilian.thiel · December 4, 2015, 10:15am

It is not necessarily the file size; it is more the structure of the text that might cause problems. When the strings are converted into documents, tokenization (word and sentence) is applied. For this, OpenNLP tokenizer models are used. If the text is not structured like natural language text, e.g. there are no periods in the text, the sentence tokenizer model might have problems.
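A quick way to see whether a file contains long stretches without sentence punctuation is a small check like the one below. It is a rough heuristic sketch in Python and does not reproduce how the OpenNLP sentence model works. The function names and the 5000-character threshold are arbitrary assumptions chosen for illustration.

```python
import re


def longest_unpunctuated_run(text):
    """Length in characters of the longest stretch without '.', '!' or '?'."""
    return max(len(chunk) for chunk in re.split(r"[.!?]", text))


def looks_unstructured(text, max_run=5000):
    """True if some stretch of text exceeds max_run characters without
    sentence-ending punctuation (the threshold is an arbitrary assumption)."""
    return longest_unpunctuated_run(text) > max_run


if __name__ == "__main__":
    import sys

    # Usage: python check_text.py file1.txt file2.txt ...
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8", errors="replace") as handle:
            content = handle.read()
        print(name, longest_unpunctuated_run(content), looks_unstructured(content))
```

Files that report a very long run are good candidates to test on their own first.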

The file that you shared is working on my machine with just 2GB heap for KNIME. I assume the problem is caused by another file.

Cheers, Kilian

alamsaqib · December 6, 2015, 5:47am

Hi Kilian.

What version of KNIME are you using?

Best wishes

Alam

kilian.thiel · December 7, 2015, 9:03am

I could read your file using 3.0.1. With 3.1 it is working as well.

Cheers, Kilian

alamsaqib · December 7, 2015, 1:07pm

Thanks a lot Kilian.

Best wishes

Alam
