I have a 1 TB SSD, 32 GB RAM and an i7 CPU, so not a bad machine, but the node always gets to 29% and then makes no progress at all, even after waiting for an hour. I have to shut KNIME down. It freezes.
How much memory have you assigned to KNIME? At first glance the table doesn’t seem big enough to cause problems for KNIME on your machine. I have just successfully written out 900.000 rows and 100 columns on a less powerful machine. Have you tried some other writer node for reference, for example the Table Writer or the new CSV Writer (Labs) node?
Maybe too much, in case you are running something else on your machine.
It’s weird that it gets stuck at the same percentage. That sounds like a data format issue rather than a memory problem. Have you tried the Table Writer? You can also try splitting your data into smaller parts and then using the CSV Writer on each part, to see whether a data-format problem is causing KNIME to freeze.
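If you want to automate that split-and-test idea, here is a minimal sketch (plain Python, e.g. in a Python Script node; the chunk size and the cp1252 target encoding are just assumptions for illustration). It encodes each chunk in memory and reports which chunks fail, so you can narrow the search down to the problematic rows:

```python
import csv
import io

def locate_bad_chunks(rows, chunk_size, encoding="cp1252"):
    """Try to CSV-encode each chunk of `rows` with the given encoding;
    return the indices of chunks that fail, to narrow down bad records."""
    bad = []
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        buf = io.StringIO()
        csv.writer(buf).writerows(chunk)
        try:
            # Simulate writing the chunk to a file with this encoding.
            buf.getvalue().encode(encoding)
        except UnicodeEncodeError:
            bad.append(i // chunk_size)
    return bad

# The zero-width space (U+200B) in chunk 1 cannot be encoded as cp1252:
rows = [["ok", 1], ["ok", 2], ["bad\u200b", 3], ["ok", 4]]
print(locate_bad_chunks(rows, chunk_size=2))  # [1]
```

Once a failing chunk is found, you can re-run it with a smaller chunk size to zoom in on the individual rows.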
I have identified the portion of data that causes trouble. If I exclude it, the CSV Writer succeeds. But I do not know how to deal with the problematic data; it is roughly 80% of the total…
Well, could you tell us more about this data (maybe even provide a sample, without revealing any confidential information)? Are you able to write a small portion of it to a CSV file, and what would that look like?
Do you absolutely have to use CSV? Sometimes a format like Parquet or ORC might be better suited to handle complex files.
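For illustration, here is a small stdlib-Python sketch of why CSV gets fragile with complex values: an embedded line break can only be represented by quoting the field, and any tool that splits the file naively on newlines will tear the row apart. Typed binary formats like Parquet avoid this because values are not delimited by characters at all.

```python
import csv
import io

# A value with an embedded line break: CSV can only represent this by
# wrapping the field in quotes.
buf = io.StringIO()
csv.writer(buf).writerow(["id-1", "line one\nline two"])

# The raw output now contains a newline *inside* a quoted field; a tool
# that splits naively on newlines would see two broken rows here.
assert '"line one\nline two"' in buf.getvalue()

# A compliant parser recovers one row, not two:
rows = list(csv.reader(io.StringIO(buf.getvalue())))
assert rows == [["id-1", "line one\nline two"]]
```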
If you open the file in an editor you find these strange line breaks, or so it seems. If you export this and read it back (with Word) you find a strange little dot.
If you try to identify this character, you get a small grey dot which resolves to:
U+00B7 : MIDDLE DOT {midpoint (in typography); Georgian comma; Greek middle dot (ano teleia)}
U+200B : ZERO WIDTH SPACE [ZWSP]
So it seems you might have some strange characters in your data that some systems struggle to process. You might have to investigate further or clean your data.
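If you want to scan your data for such characters programmatically, a small sketch using only Python's standard library could look like the following (the function name is just for illustration). It flags invisible format characters like U+200B as well as anything outside printable ASCII, and reports each one with its Unicode name:

```python
import unicodedata

def suspicious_chars(text):
    """Report characters that are easy to miss in an editor: format
    characters (category Cf, e.g. U+200B ZERO WIDTH SPACE) and anything
    outside printable ASCII, together with their Unicode names."""
    report = []
    for ch in text:
        if unicodedata.category(ch) == "Cf" or ord(ch) > 126:
            report.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return report

print(suspicious_chars("price\u00b7net\u200bvalue"))
# [('U+00B7', 'MIDDLE DOT'), ('U+200B', 'ZERO WIDTH SPACE')]
```

From there you could decide per character whether to strip it or replace it (e.g. with `str.translate`) before writing the CSV.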
Thanks to all of you for the help. I played with the encoding and everything around it, and I was able to write more than before, but the problem is really the huge amount of data. At the 8% stage the output CSV file was already more than 200 GB, so this is not a good way… I have to change my structure etc.
I read the example file using different applications, including the CSV Reader node, and I cannot find any issue with it. Maybe there are specific OS settings that are responsible for the issue.
If you employ big data techniques you could write the data out in chunks to CSV or Parquet files and later access them via a Hive external table. I am not sure how KNIME would handle accessing a large number of such files with the local Big Data environment, but in general big data techniques were developed exactly for this scenario. The single files could each be written separately, and at the end they would come together as one table.
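Outside of KNIME, the same chunking idea can be sketched in a few lines of plain Python (file names and chunk size here are hypothetical): each part file is written independently, Hive-style into one directory, and they are read back as a single table:

```python
import csv
import glob
import os
import tempfile

def write_chunked(rows, header, out_dir, chunk_size=2):
    """Write `rows` into numbered CSV part files in `out_dir`
    (one directory, many part files, as a Hive external table expects)."""
    for part, start in enumerate(range(0, len(rows), chunk_size)):
        path = os.path.join(out_dir, f"part-{part:05d}.csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            w = csv.writer(f)
            w.writerow(header)
            w.writerows(rows[start:start + chunk_size])

def read_chunked(out_dir):
    """Stream all part files back as one table, skipping each file's header."""
    all_rows = []
    for path in sorted(glob.glob(os.path.join(out_dir, "part-*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            r = csv.reader(f)
            next(r)  # skip the per-file header row
            all_rows.extend(r)
    return all_rows

# Round trip: three rows split across part files come back as one table.
with tempfile.TemporaryDirectory() as d:
    rows = [["1", "a"], ["2", "b"], ["3", "c"]]
    write_chunked(rows, ["id", "val"], d)
    assert read_chunked(d) == rows
```

A Hive external table pointed at such a directory would read all part files as one table in the same way.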
But I don’t know your setup concerning the (remote?) database.
The whole idea and logic got changed: after compressing the CSVs into QVD files for QlikView I went from GBs to MBs, so the reduction is probably more than 90%.