Increased Excel file size with KNIME 4.3.1

Hi,

Hope everyone is doing well!

I updated my KNIME to 4.3.1 recently and started replacing legacy nodes with the new ones, beginning with the Excel Writer nodes. I really like the new nodes.

I collect information from multiple places and combine it all in one Excel file, using the append option under “file” and the overwrite option under “sheet name”. The final file has 4 sheets:

  • sheet 1 has up to 10 rows
  • sheet 2 has up to 15 rows
  • sheet 3 has up to 50 rows
  • sheet 4 has 200k rows

Previously, when I ran my flow, it produced this Excel file at a size of 38 MB. Now the size has increased to 49 MB, nearly a 30% increase. However, if I open the Excel file and save it again (Ctrl+S) without doing anything, the file size drops back to 38 MB.

Also, I have started getting “Execution failed in Try-Catch block: Java heap space” error messages. The -Xmx value in knime.ini was already set to 8g. I worked around this by increasing the value to 12g, but to be honest that doesn’t make sense when I am only working with 200k rows.
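
For reference, this is the relevant part of my knime.ini after the change; the -Xmx line sits below -vmargs, and the surrounding lines vary by installation:

    -vmargs
    -Xmx12g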

This was working fine with the previous nodes. Is anybody else having the same issue? Maybe someone can give me some advice.

Thanks in advance.

Nezir

@ngonenc ,

thanks for your feedback; we’ll look into it.

Does the “Java heap space” exception you’re experiencing occur in the workflows where you exchanged the Excel Writer node? If that’s the case, can you confirm that it only happens when you append to an existing Excel file? Are you writing xls or xlsx files?

Regarding the size difference: would it be possible to share the files with me? If so, I’ll send you a private message. That would make it easier for me to understand the underlying problem :slight_smile:

Best
Mark

2 Likes

Hi @Mark_Ortmann ,

Thanks for your reply.

Here is the full list of nodes that I changed:

  • Amazon Authentication
  • Amazon S3 Connector
  • CSV Reader (reads the text files stored in Amazon S3; maybe that’s the culprit for the file size)
  • Create Date&Time Range
  • 2 “String to Path” nodes added for the new Excel nodes
  • 3 Excel Reader nodes (xlsx)
  • 4 Excel Writer nodes (xlsx)

Those are all the changes I’ve made so far. I ran the flow this morning and, after fixing the above issues, was able to produce the file correctly.

Unfortunately, I can’t share the file since it contains PII. But I am attaching the settings that I use for each Excel Writer node.

Later this week I’ll update my other flows, which don’t contain PII, and keep an eye on the file sizes. If I see the same issue there, I can share those files with you for further analysis.

It’s just a wild guess, but the previous writer nodes had a “Memory Policy” tab, which I set to “Write tables to disc”. The new writer nodes don’t have that option. Maybe that’s why they use more memory?

Thanks,
Nezir

It is getting worse.

This morning I started getting “Execution failed in Try-Catch block: Java heap space” again on the last Excel Writer node (which is the last step of the flow). I have already assigned 12 GB of RAM, and nothing else is running in the background while the flow works. I can’t increase the memory any further.
I tried running the node many times with no luck. The Excel node tries to append a table of ~200k rows and ~50 columns to an existing Excel file. From what I observed, the failure happens after the node reaches 75%.

However, I was able to produce the Excel file when I wrote it as a new file instead. But the file size went through the roof, reaching 82.5 MB, and it decreased to 37.5 MB after opening the file, saving it without making any changes, and closing it again. These are the nodes I added to the flow recently:

  • 5 Rule Engine nodes
  • 1 Joiner
  • 1 Concatenate
  • 1 GroupBy
  • 1 Row Filter

I combined all of these nodes into a metanode.

All of their memory policies are set to “Write tables to disc”.

I hope KNIME can solve this issue swiftly.

Quick question: how many sheets, rows & columns does the file you’re appending to in your last step contain?

Best
Mark

Hi Mark,

There are 3 other sheets in the file before the last sheet of 200k rows is appended.

  • sheet 1 = 3 rows, 17 columns (including header)
  • sheet 2 = 2 rows, 22 columns (including header)
  • sheet 3 = 55 rows, 11 columns (including header)

Before the last append, the file size was around 14 KB, which is normal for those 3 sheets.

Thanks,
Nezir

And when appending your 200k-row table with 50 columns, you run out of 12 GB of RAM… wow.

Could you tell me the types of your columns? Does your table contain anything but String, Double, Int, Long, and Boolean?

Best,
Mark

Hi,

It seems I had set up the Joiner with the wrong settings, resulting in extra columns. I’ve fixed that, and the file size came down to 54.7 MB. It is still higher than normal, though: when I open and save the file, the size drops to 37 MB again.

Here are the column types and counts. It seems the table has 65 columns, not 50-something. :slight_smile:

Column type         Count
String                 51
Date and Time           7
Number (integer)        4
Number (long)           2
Number (double)         1

The new nodes changed the previous Double types to Number (long).

Thanks,
Nezir

I ran some tests to see whether the problem is related to the number of rows or the number of columns.

Again using the same table of 200k rows, I started with row filtering:

  • First 1,000 rows: created file size 294 KB; after open-and-save it decreased to 204 KB
  • First 10,000 rows: created file size 2.78 MB; after open-and-save it decreased to 1.78 MB

After that, I stopped the row-filtering tests and continued with column filtering:

  • Kept all 200k rows but removed all “Number (long)” columns: created file size 52.2 MB; after open-and-save it decreased to 35.44 MB
  • Kept all 200k rows but removed all “String” columns: created file size 9.7 MB; after open-and-save it decreased to 7.2 MB
  • Kept all 200k rows and removed all columns except one “Number (long)” column: created file size 1.8 MB; after open-and-save it actually increased to 2.1 MB :slight_smile:

This concludes my test…
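
For anyone who wants to dig further: an .xlsx file is just a zip archive, so listing the entries of the KNIME-written file next to the Excel-resaved one shows whether the extra bytes live in the sheet XML, in sharedStrings.xml, or are simply weaker compression. A minimal Java sketch (the class name is just for illustration; pass the path of the file to inspect):

    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    public class XlsxSizeInspector {
        public static void main(String[] args) throws Exception {
            // An .xlsx file is a zip archive; each entry is one XML part
            // (sheet data, shared strings, styles, ...).
            try (ZipFile zip = new ZipFile(args[0])) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry e = entries.nextElement();
                    // Uncompressed vs. compressed size per part.
                    System.out.printf("%-40s %10d -> %10d bytes%n",
                            e.getName(), e.getSize(), e.getCompressedSize());
                }
            }
        }
    }

Running it against both versions of the file should show at a glance whether sharedStrings.xml (where Excel typically puts the String cells) or the sheet XML itself accounts for the difference.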

Nezir

1 Like

Hey @ngonenc ,

regarding the file size, I’m still doing some digging, but it seems that Excel writes the files slightly differently than the library we are using and other tools such as LibreOffice Calc do. I’m going to figure out whether there is a way to come up with an identical setup.

Regarding your memory issue: I have found a way to decrease the memory pressure. However, if you were to append to, say, the file you just wrote in your final step, you’d again see a memory problem. I’m currently investigating whether we have control over that problem as well, but I have my doubts. I’ll keep you posted, and thanks for your awesome support!
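
For the technically curious: the Excel nodes are built on Apache POI, which can stream newly written rows with a small, bounded memory window, but appending to an existing .xlsx first requires parsing the entire file into an in-memory XSSFWorkbook. That load, rather than the new rows, is what eats the heap. A simplified sketch of the pattern (not the actual node code; file and sheet names are made up):

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    public class AppendSketch {
        public static void main(String[] args) throws Exception {
            // Appending: the entire existing workbook is parsed into memory.
            // This load, not the 200k new rows, is the expensive part.
            XSSFWorkbook existing = new XSSFWorkbook(new FileInputStream("report.xlsx"));

            // Wrapping it in SXSSF streams the *new* rows with a bounded
            // window (here 100 rows), so the appended data itself stays cheap.
            SXSSFWorkbook wb = new SXSSFWorkbook(existing, 100);
            Sheet sheet = wb.createSheet("data");
            for (int r = 0; r < 200_000; r++) {
                Row row = sheet.createRow(r);
                row.createCell(0).setCellValue("value " + r);
            }
            try (FileOutputStream out = new FileOutputStream("report-appended.xlsx")) {
                wb.write(out);
            }
            wb.dispose(); // clean up SXSSF temp files
        }
    }

This is also why writing to a fresh file instead of appending, as you observed, gets by with far less memory.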

Best,
Mark

3 Likes
