CSV writer problem

Hello guys,

I have trouble with the CSV Writer node.

I need to write a really huge table to a CSV file.

1.8M rows and 130 columns.

I have a 1TB SSD, 32GB RAM and an i7 CPU, so not a bad machine, but the node always gets to 29% and then makes no progress at all, even after waiting for an hour, so I have to shut KNIME down. It freezes.

Any idea why? Is the table too big?

Thank you guys.

Jiri

Hello @sm0lda,

how much memory have you assigned to KNIME? At first glance the table doesn’t seem big enough to cause problems for KNIME and your machine. I have just successfully written out 900,000 rows and 100 columns on a less powerful machine. Have you tried some other writer node for reference? For example the Table Writer or the new CSV Writer (Labs) node?

Br,
Ivan

Hello @ipazin,

I tried the CSV Writer (Labs) node and got the same result: 29% and stuck.

I have 24GB allocated to KNIME in the knime.ini file: -Xmx24576m
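For reference, the setting sits in the -vmargs section of my knime.ini, roughly like this (all other lines of the file omitted):

```
-vmargs
-Xmx24576m
```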

That should be enough, shouldn’t it?

Any other idea?

It is a real shame: I finish my transformation within a few minutes, but at the end I cannot write the output file :slight_smile:

Hello @sm0lda,

Maybe even too much if you are running something else on your machine :smiley:

Weird that it gets stuck at the same percentage. That sounds like a data format issue rather than a memory problem. Have you tried the Table Writer? You can also split your data into smaller parts and run each one through the CSV Writer to see whether a data-format-related problem is causing KNIME to freeze.
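If it is easier, you can also do the splitting outside of KNIME. A rough Python/pandas sketch of the idea could look like this (the DataFrame, file names and chunk size are only placeholders, assuming the table fits into memory as a DataFrame):

```python
import numpy as np
import pandas as pd

# Placeholder data standing in for the real table you are trying to write
df = pd.DataFrame(np.random.rand(1_000_000, 10), columns=[f"c{i}" for i in range(10)])

chunk_size = 100_000  # rows per part, adjust as needed

for i, start in enumerate(range(0, len(df), chunk_size)):
    part = df.iloc[start:start + chunk_size]
    try:
        part.to_csv(f"part_{i:03d}.csv", index=False)
    except Exception as exc:
        # A failing part narrows the problem down to ~100k rows
        # that you can then inspect for odd values.
        print(f"Part {i} (rows {start}-{start + len(part) - 1}) failed: {exc}")
```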

Br,
Ivan

1 Like

The CSV Writer can append data to an existing file. Have you maybe tried to write the data in chunks that way?
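Outside of KNIME, the chunked-append idea would look roughly like this in pandas (just a sketch; the DataFrame and file name are placeholders):

```python
import numpy as np
import pandas as pd

# Demo data standing in for the real 1.8M x 130 table
df = pd.DataFrame(np.random.rand(1_000_000, 10), columns=[f"c{i}" for i in range(10)])

chunk_size = 100_000  # rows written per append

for i, start in enumerate(range(0, len(df), chunk_size)):
    part = df.iloc[start:start + chunk_size]
    part.to_csv(
        "big_output.csv",
        mode="w" if i == 0 else "a",  # overwrite on the first chunk, append afterwards
        header=(i == 0),              # write the header line only once
        index=False,
    )
```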

1 Like

Hello @mlauber71,

I have identified the portion of data that causes the trouble. If I exclude it, the CSV Writer succeeds. But I do not know how to deal with the problematic data, and it is roughly 80% of the total…

Well, could you tell us more about this data (maybe even provide a sample without giving away any confidential information)? Are you able to write a small portion of it to a CSV file, and how would that look?

Do you absolutely have to use CSV? Sometimes a format like Parquet or ORC might be better suited to handle complex files.
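Just as a rough illustration (the file name is an example and pyarrow has to be installed; in KNIME itself there is also a Parquet Writer node in the big data file formats extension), writing a table to Parquet from Python is essentially a one-liner:

```python
import numpy as np
import pandas as pd

# Placeholder DataFrame standing in for the real table
df = pd.DataFrame(np.random.rand(100_000, 10), columns=[f"c{i}" for i in range(10)])

# Parquet is columnar and compressed, so the file is usually much smaller
# than the equivalent CSV and there are no quoting or encoding pitfalls.
df.to_parquet("my_table.parquet", compression="snappy")
```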

Hello @mlauber71,

thanks for your reply.

Sample file is attached.

test.txt (413.1 KB)

These lines are part of the problematic data. The file contains only a few lines for test purposes, and with those there was no problem with the CSV Writer.

Thank you

Jiri

I found a hint of what is going on. On my Mac I could read and write your file without issue but I found one strange thing.

If you open the file in an editor, you find what look like strange line breaks. If you export the file and read it back (with Word), you find a strange little dot.

If you try to identify this character, you get a small grey dot which turns out to be:

U+00B7 : MIDDLE DOT {midpoint (in typography); Georgian comma; Greek middle dot (ano teleia)}
U+200B : ZERO WIDTH SPACE [ZWSP]

So it seems you have some strange characters in your data that some systems might struggle to process. You might have to investigate further or clean your data.
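If those characters really are the culprit, here is a small sketch of how you could strip them with pandas before writing (the character list is only what I spotted above, there may be more, and it assumes the text columns really contain strings):

```python
import pandas as pd

# Characters found in the sample: ZERO WIDTH SPACE and MIDDLE DOT
BAD_CHARS = "\u200b\u00b7"

def clean_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove the problematic invisible/odd characters from all string columns."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].str.replace(f"[{BAD_CHARS}]", "", regex=True)
    return out

# Tiny demo frame showing the effect
demo = pd.DataFrame({"name": ["abc\u200bdef", "mid\u00b7dle"], "value": [1, 2]})
print(clean_text_columns(demo))
```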

3 Likes

Hello!

Nice one @mlauber71 :+1:

Maybe try a different encoding, @sm0lda?

Br,
Ivan

1 Like

Hello guys,

thanks to all of you for the help. I played with the encoding and everything around it and was able to write more than before, but the real problem is the sheer amount of data. At 8% progress the output CSV file is already more than 200GB (extrapolated, the full file would be roughly 2.5TB, more than my 1TB SSD can hold), so this is not a good way… I have to change my structure etc.

Thanks for your help!

Jiri

2 Likes

Hello @sm0lda,

glad to hear you made some progress. Well, huge amounts of data do require proper storage like a database or the cloud…

Br,
Ivan

Hi @all,

I read the example file using different applications, including the CSV Reader node, and I cannot find any issue with it. Maybe there are specific OS settings responsible for that problem.

BR

1 Like

Hi @ipazin,

A DB is a problem because I work from home over a VPN that is not fast and stable enough.

It is impossible to push such a huge amount of data through the VPN to our DB.

Jiri

Hi @sm0lda,

in that case an external hard disk might be a solution.

Br,
Ivan

You could try to stream as much of your workflow as possible in order to lower the memory pressure.

Best
Mark

1 Like

If you employ big data techniques, you could write the data out in chunks into CSV or Parquet files and later access them through a Hive external table. I am not sure how KNIME would handle a large number of such files in the local Big Data environment, but in general big data techniques were developed exactly for this scenario. The individual files could be sent one at a time and in the end they would come together as one table.
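Just to illustrate the chunked idea without the Hive part (all names are placeholders and pyarrow has to be installed):

```python
import os

import numpy as np
import pandas as pd
import pyarrow.parquet as pq

# Placeholder data standing in for the real table
df = pd.DataFrame(np.random.rand(500_000, 10), columns=[f"c{i}" for i in range(10)])

os.makedirs("out_dir", exist_ok=True)

# Write the table out in chunks into one directory ...
chunk_size = 100_000
for i, start in enumerate(range(0, len(df), chunk_size)):
    df.iloc[start:start + chunk_size].to_parquet(f"out_dir/part_{i:03d}.parquet")

# ... and later read the whole directory back as a single logical table,
# which is essentially what a Hive external table does on a cluster.
whole = pq.read_table("out_dir").to_pandas()
print(len(whole))  # all rows are back together
```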

But I don’t know your setup concerning the (remote?) database.

Thanks all for the help. I have reorganized my flow and data model and reduced the data volume by 90%, so problem solved :slight_smile:

The thread can be closed.

Jiri

2 Likes

Hello @sm0lda,

wow! 90% not bad :+1:

You can mark any reply (including your own) as the solution and the thread will be closed automatically 7 days after the last reply :wink:

Br,
Ivan

1 Like

The whole idea and logic got changed: after compressing the CSV output into QVD files for QlikView I went from GBs down to MBs, so it is probably even more than 90%.

Jiri

2 Likes