RAM Error with H2O AutoML Extension in KNIME During Regression Task

Description:
I’m experiencing a RAM-related issue when using the H2O AutoML extension in KNIME during a regression task. The error occurs after modifying my dataset. Below are the details:

Current Workflow:

  • I’m reading data from two different CSV files, performing operations on them, and joining them. One intensive operation involves adding features based on values from previous rows (a rough sketch of this kind of lag-feature step is shown after this list).
  • The final dataset contains approximately 60,000 rows and 300 columns (previously 60,000 rows and 30 columns).
  • My KNIME installation is set to use up to 28 GB of RAM. This worked fine for previous regression runs, but the issue arose after I expanded the dataset (by adding data from up to 295 previous rows, whereas I previously only added data from 50 rows).
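
To illustrate what that feature step does, here is a minimal pandas sketch of building lag features. The file path, the column name "value", and the exact loop are placeholders for illustration only; the real step is done with KNIME nodes:

```python
import pandas as pd

df = pd.read_csv("joined_data.csv")   # placeholder for the joined dataset
N_LAGS = 295                          # was 50 before the dataset was expanded

# Each shift(k) produces one extra column holding the value from k rows earlier,
# so the column count (and the memory footprint) grows linearly with N_LAGS.
# "value" is a hypothetical column name standing in for the real feature.
lags = {f"value_lag_{k}": df["value"].shift(k) for k in range(1, N_LAGS + 1)}
df = pd.concat([df, pd.DataFrame(lags)], axis=1)
```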

Problem:

  • After modifying the dataset, the learner node crashes with a Java Heap Memory error.
  • When using a column filter to reduce the dataset size before the H2O table creation, the learner node no longer crashes, but gets stuck at 2%.
  • I tried enabling “Write to disk” both before the column filter and inside the Learner-Node. This helped reduce RAM usage, but the learner node still gets stuck at 2%.

Attempts to Fix:

  1. Moved nodes into a new workflow to minimize RAM usage—didn’t solve the issue.
  2. Applied a column filter to remove the newly added columns—prevented crashing, but learner still stuck at 2%.
  3. Enabled “Write to disk”—reduced RAM usage but didn’t resolve the learner node being stuck.

Other Observation:
Whenever the learner node gets stuck or KNIME crashes, it happens right after the memory is dumped. This behaviour was the same before activating “Write to disk”, so the issue clearly has something to do with memory handling. After the dump, RAM usage stays below 16 GB, yet CPU usage remains high with many notable spikes; previously, while the Learner node was running, it sat persistently at 100%. The KNIME application itself barely responds to interaction.

Question:
Why does the learner node still seem affected by the new columns, even though I’ve filtered them out before loading the data into the H2O environment? What can I do to fix the issue?

@inirem Save the prepared data as a .table file and do the model calculation in a fresh workflow with nothing else running. Maybe start with just a few rows to see whether it runs at all or whether there are other issues.

In general, dimensionality reduction is always worth considering. You can try to eliminate columns that do not contribute much.


@mlauber71 Thanks a lot for taking the time to help me. Sadly, the tips didn’t fix the problem; the behaviour stays exactly the same.

That somehow makes sense to me, because I don’t think the previous nodes really affect the AutoML node much. In other words: once the Table Reader is used, essentially “the same” data is loaded into RAM again. Or am I wrong here?

Any other ideas besides dimensionality reduction? By the way, thanks for the link to the other issue; very informative.

@inirem As I said: try with a few rows and see whether it runs at all. Also try another H2O node to see whether it works. Then you could provide a fresh log file in debug mode.

Also, from the time when there was no dedicated AutoML node: you can try running the whole thing in Python in the background, as sketched below.
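
For reference, a minimal sketch of an H2O AutoML run driven from Python; the file path, target column name, memory size, and runtime limit are assumptions for illustration, not values from this thread:

```python
import h2o
from h2o.automl import H2OAutoML

# Start a local H2O cluster; max_mem_size is an assumption, adjust to your machine.
h2o.init(max_mem_size="24G")

# Hypothetical path to the prepared data exported from KNIME (e.g. as CSV).
frame = h2o.import_file("prepared_data.csv")
target = "target"                                  # placeholder target column name
features = [c for c in frame.columns if c != target]

train, test = frame.split_frame(ratios=[0.8], seed=1)

aml = H2OAutoML(max_runtime_secs=3600, seed=1)
aml.train(x=features, y=target, training_frame=train)

print(aml.leaderboard.head())
print(aml.leader.model_performance(test))
```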

I apologize for not thoroughly reviewing your previous response before asking.

At the moment, I am running a test using only 30% of the dataset. So far, it seems to be working fine, but I’ll have more details in about an hour.

However, while reducing the dataset size or dimensionality is important, simply using less data can’t always be the solution in machine learning scenarios. There must be another approach besides upgrading RAM or reducing the dataset size, right? After all, isn’t the “Write to Disk” option supposed to help with this exact kind of problem?

Edit: I also still don’t understand why having something loaded in the previous node leads to a failure in the Learner node, especially since my RAM usage was not high at that point. Could this be a bug in memory handling?

@inirem You can try editing the configuration of the AutoML node, for example reducing the run time or excluding some algorithms such as deep learning; a sketch of the equivalent settings in Python follows.
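
In the H2O Python API, the equivalent knobs look roughly like this; the specific values and excluded algorithms are just assumptions for illustration:

```python
from h2o.automl import H2OAutoML

# Tighter limits to reduce run time and memory pressure; values are assumptions.
aml = H2OAutoML(
    max_runtime_secs=1800,                              # cap the total search time
    max_models=10,                                      # cap the number of models tried
    exclude_algos=["DeepLearning", "StackedEnsemble"],  # skip heavier algorithms
    seed=1,
)
```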