Merging 4 Parquet files (10GB in total) into a single Parquet file fails. Error message: Java heap space.

I’m having trouble merging 4 Parquet files into a single Parquet file. Together these files total 10GB, or 177 million rows. I raised the memory limit in knime.ini to 24GB, and my i7 notebook has 32GB of RAM. What can I do to complete this task?

It stops at 17GB.

I can see the row count, but not the rows with data.
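
For anyone following along: the memory limit mentioned in the question is the Java heap size, which KNIME reads from the -Xmx line in knime.ini (the file sits in the KNIME installation folder). A fragment for the 24GB setting described above would look roughly like this. The value is simply the one from the question, not a recommendation, and KNIME needs a restart before it takes effect.

```
-vmargs
-Xmx24g
```

The heap should stay below the machine's physical RAM (32GB here) so the operating system and other programs still have room.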

@Aldemir to combine Parquet files, you could simply put them in a folder and read the folder. The question is what you want to do with the data later.

You can try a few things to boost performance, but which ones make sense will depend on what you want to do.

I have a client who needs these Parquet files combined into a single Parquet file. They will use it in Power BI Desktop. You are absolutely right. I should have put the Parquet files inside a folder and read them all together before writing the combined set.

Man, you’re “brabo,” as we say in Portuguese. You easily solved the problem. The solution was just to read the folder, simple as that. I increased the memory in the knime.ini file. I believe that also helped. Thank you very much!

@Aldemir Congrats.
So how does your client now automate the Parquet data updates for Power BI? Just curious.
br

I’d like to know that too. He needs to set up an incremental update so that he doesn’t have to reload the whole mountain of data every time he refreshes it.
