Training of all ML models gets stuck at 10%

Hi All,

I am a newbie at KNIME, please help me sort out this problem. I have created an NLP workflow where I compute TF-IDF features and then train a model. I have tried all the ML models and every one of them gets stuck at 10%. The heap status shows that memory is full.

I have data with 7462 columns and 33591 rows.

My laptop has 16 GB of RAM, of which I have given 14 GB to KNIME, plus a 500 GB SSD and a 10th-generation i5.

This is my workflow. I have already cleaned the data in Jupyter; in KNIME I just want to train and test the model.

Is there any way I can divide my training data into 5 or 10 smaller training sets and then train my model on them one by one? Please help!
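In Python I imagine something like scikit-learn's partial_fit for this (a rough sketch, not my actual code; the chunk count and the model are just examples), but I don't know what the KNIME equivalent would be:

import numpy as np
from sklearn.linear_model import SGDClassifier

# Incremental training: feed the data in 5 chunks instead of all at once.
# X_train can be a scipy sparse matrix; CSR supports row slicing.
clf = SGDClassifier(loss="log_loss")
classes = np.unique(y_train)  # partial_fit needs all class labels up front
chunk = int(np.ceil(X_train.shape[0] / 5))
for start in range(0, X_train.shape[0], chunk):
    clf.partial_fit(X_train[start:start + chunk],
                    y_train[start:start + chunk],
                    classes=classes)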

Hi,
I can't see any cleaning before the TF-IDF step. I would suggest removing stopwords and punctuation upfront to reduce the vocabulary size (i.e. the number of columns).
The current 7462 × 33591 matrix seems to be too big.
With the Partitioning node you already split the data, so the training set is not the full dataset.
Have you already tried running the models individually?
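If you end up doing the TF-IDF in Python anyway, capping the vocabulary is one line of configuration. Just a sketch with scikit-learn; the max_features and min_df values are examples to tune for your data:

from sklearn.feature_extraction.text import TfidfVectorizer

# max_features keeps only the top-N terms by corpus frequency;
# min_df drops terms that appear in fewer than 5 documents.
vectorizer = TfidfVectorizer(stop_words="english", max_features=2000, min_df=5)
X = vectorizer.fit_transform(docs)  # docs: list of cleaned text strings
print(X.shape)  # (n_documents, <= 2000) instead of 7462 columns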

Hope that gets you started in the right direction.
br

2 Likes

Hi Daniel,

Thanks for your reply. I already cleaned the data and removed stopwords and punctuation using Python in a Jupyter notebook. After cleaning I exported the data to CSV and read it into KNIME, and the workflow then produces the 7462 × 33591 table. That's why I wanted to ask: in the Jupyter notebook, training takes no more than 10-20 seconds, but here I can see it taking hours, then getting stuck and showing a memory heap error.
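For reference, this is roughly what I run in Jupyter (a sketch; X and y stand in for my TF-IDF matrix and labels). scikit-learn keeps the TF-IDF output as a sparse matrix, which I suspect is why it is so much faster there:

from scipy import sparse
from sklearn.linear_model import LogisticRegression

# X comes out of TfidfVectorizer as a sparse CSR matrix, so the
# 33591 x 7462 table only stores its non-zero entries.
print(sparse.issparse(X), X.shape)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)  # finishes in seconds on sparse input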

Hello @asis4911,

and welcome to the KNIME Community!

From your screenshot I see the Gradient Boosted Trees Learner and Predictor nodes executed successfully, the Random Forest Learner is configured, and the Logistic Regression Learner is not configured. I don't really see any failing node. My suggestion is the same as what @Daniel_Weikert already gave: run the models sequentially (connect them with flow variable connections) and not all three (or more) at the same time. Also, you said you have given KNIME 14 GB of RAM out of the 16 available. It's not recommended to go that close to the maximum, since other processes on your machine also require memory, which can lead to exactly the issue you are experiencing. I would try 12 GB. And if you share the workflow (with data, if it's not confidential), others can give it a try as well.
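For example, the memory limit is the -Xmx line in knime.ini (the file sits in your KNIME installation folder; the exact location varies by OS), so 12 GB would be:

-Xmx12g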

Br,
Ivan

1 Like

Hi,

I face the same problem. It definitely depends on the number of columns, not rows: with 2500 columns it gets stuck at 10%, with 1500 columns it runs. When it runs short of memory, the screen starts to freeze or the application crashes without warning (it just disappears). But that is a separate issue; the 10% freeze is not a matter of memory. It even gets stuck at 10% with memory usage of 20% or less.

I installed KNIME Column Storage (based on Apache Parquet) but don't know how to use it (or whether it would help) ;)

So, any idea?

By the way, I found a way to work around this problem: install the Weka 3.7 extension and use its classifiers. But they seem to run on only one core, no matter what I set. And I have not compared the prediction quality yet. Also, e.g. with MLP, whatever number of cycles I set, it seems not to affect the run time; all the classifiers even show the same run time! So it looks suspicious to me at the moment.

Problem solved. I added the following lines to knime.ini (they raise the Java thread stack size, so the 10% hang was apparently a stack limit rather than heap memory):

-Xss20M
-maxstack=20m

Greetings from Germany

3 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.