How can I keep the output data of the CSV Reader?

Hello.

I read a 100 GB CSV file using the CSV Reader node.

It’s a huge file, so it takes a long time.

But when I change some configuration and roll back with Ctrl+Z, I need to execute the CSV Reader node again.

How can I keep the output data of the CSV Reader?

Thanks.

Hi @hhkim,

This may sound simple, and I understand it's not always possible depending on what values are in your data, but why not build your workflow based on a sample, get it right, and then apply it to the whole dataset at the end?

The option is in the Limit Rows tab of the CSV Reader configuration window.
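If it helps to see the same idea outside the node dialog, here is a minimal pandas sketch of sampling a big file (the file name `data.csv` is just a placeholder); `nrows` does the equivalent of Limit Rows:

```python
import pandas as pd

# Prototype on a sample: read only the first 100,000 rows of the
# big file (the equivalent of the CSV Reader's Limit Rows option).
sample = pd.read_csv("data.csv", nrows=100_000)
print(sample.shape)
```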

Thanks,

Matt


@hhkim it might not help you in this case, but with the Parquet file format and the Parquet Writer node you could write out large data in chunks of your own choosing (and would also have fast compression with “snappy” at hand).

You could (and should) adjust the chunk size to your needs; the 2 MB here is just an example.
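For illustration, a minimal Python sketch of the same chunk-and-compress idea (the file and folder names are placeholders, and chunking here is by row count rather than by megabytes):

```python
import os
import pandas as pd

os.makedirs("chunks", exist_ok=True)

# Stream the huge CSV in chunks and write each chunk out as its own
# Parquet file with snappy compression (adjust chunksize to taste).
reader = pd.read_csv("data.csv", chunksize=1_000_000)
for i, chunk in enumerate(reader):
    chunk.to_parquet(f"chunks/part_{i:04d}.parquet", compression="snappy")
```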

Each chunk would then be a standalone Parquet file that can be used on its own, with the additional benefit of storing the column types and being compressed.

You could later read the files back in one single step:
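In Python that single step could look like this (assuming the `chunks` folder from the sketch above; pyarrow treats the folder as one dataset):

```python
import pyarrow.parquet as pq

# Read all Parquet chunks in the folder back as one DataFrame;
# the stored schema restores the column types automatically.
df = pq.read_table("chunks/").to_pandas()
```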

This example is about R and Parquet, but you can use just the part about Parquet with folders:

