Error in deep learning: ND4JIllegalStateException: Failed to allocate [810024960] bytes

Akshaykumar · March 22, 2018, 4:31pm

Greetings,

Thanks to your help I got the Image Resizer working,
but unfortunately I still cannot get my Deep Learning Network to work because of this error:
“ERROR DL4J Feedforward Learner (Classification) 0:14 Execute failed: org.nd4j.linalg.exception.ND4JIllegalStateException: Failed to allocate [810024960] bytes”

It starts to learn and logs something about a backprop epoch, but after some time it stops and gives this error.
It looks like as soon as I overcome one obstacle, another is just waiting there for me to discover.
That’s what makes it more fun and challenging though.

Also, it learns down to an epoch loss of 0.89. What is the optimum epoch loss?

With regards
Akshaykumar

nemad · March 22, 2018, 5:57pm

Hello Akshaykumar,

how large are the images you are using for training?
This error indicates that DL4J runs out of memory which is likely related to your input data size.

Greetings,

nemad

Akshaykumar · March 22, 2018, 6:32pm

Greetings,

I am using 159 MRI images of 256 x 256 x 30 (2.9 GB of DICOM images, then converted to NIfTI).
I think the data is pretty large.
Can I use a cloud GPU for this?
Is there a node for cloud GPUs?

With regards
Akshaykumar
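
The sizes in this post can be checked against the error message with a little arithmetic. The sketch below assumes each voxel is held as a 4-byte float (the thread does not state the data type) and reads the post as 159 volumes of 256 x 256 x 30 voxels; the function and variable names are made up for illustration.

```python
# Back-of-the-envelope memory estimate for the image sizes in this thread.
# Assumption (not confirmed in the thread): voxels are stored as 32-bit floats.
BYTES_PER_FLOAT32 = 4


def volume_bytes(width, height, depth, bytes_per_voxel=BYTES_PER_FLOAT32):
    """Bytes needed to hold one dense width x height x depth volume."""
    return width * height * depth * bytes_per_voxel


per_volume = volume_bytes(256, 256, 30)
all_volumes = 159 * per_volume
failed_allocation = 810024960  # byte count taken from the error message

print(per_volume)                      # 7864320
print(all_volumes)                     # 1250426880
print(failed_allocation / per_volume)  # 103.0
```

Under those assumptions the failed allocation corresponds to exactly 103 volumes, which would fit a single array holding a subset of the 159 images (for example a training partition). The thread does not confirm what DL4J was actually allocating, so treat this as a plausibility check only.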

Akshaykumar · March 22, 2018, 7:01pm

Greetings

@nemad
Thanks for the question.
It prompted me to use a smaller image matrix.
But now I am wondering what the ideal image matrix size is.
I used 64 x 64 x 15, which is a big step down from the 256 x 256 x 30 I was using before (it ran with the 64 matrix but did not run with the 128 or 256 matrix).
Please tell me what significance this change in matrix size has.
I am worried that it may change the accuracy and results of my deep network.

With regards
Akshaykumar
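
To make the size of that step concrete, here is a minimal sketch of what going from 256 x 256 x 30 to 64 x 64 x 15 does to one volume. It uses plain block averaging in NumPy as a stand-in; the thread does not say which interpolation the Image Resizer applies, and `block_mean_downsample` is a hypothetical helper, not a KNIME or DL4J function. The input is random data, used only to get an array of the right shape.

```python
import numpy as np


def block_mean_downsample(volume, factors):
    """Shrink a 3D array by averaging non-overlapping blocks.

    factors is (fx, fy, fz); each must divide the matching axis length.
    """
    fx, fy, fz = factors
    x, y, z = volume.shape
    if x % fx or y % fy or z % fz:
        raise ValueError("each factor must divide the matching axis length")
    # Split every axis into (number of blocks, block size), then average
    # over the three block-size axes.
    blocks = volume.reshape(x // fx, fx, y // fy, fy, z // fz, fz)
    return blocks.mean(axis=(1, 3, 5))


rng = np.random.default_rng(0)
full = rng.random((256, 256, 30), dtype=np.float32)  # placeholder data
small = block_mean_downsample(full, (4, 4, 2))

print(small.shape)                  # (64, 64, 15)
print(full.nbytes // small.nbytes)  # 32
```

Each output voxel summarizes 4 x 4 x 2 = 32 input voxels, so memory drops by a factor of 32, and any structure smaller than one block is averaged away. Whether that matters depends on how small the relevant features in the images are.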

nemad · March 23, 2018, 12:00pm

Hello Akshaykumar,

that is a very tricky question.
This depends on the kind of network you are using (especially its receptive field, i.e. the volume of input it covers) and the kind of task you are trying to solve.
Anyway, you will have to make this trade-off unless you can move to a larger machine.
Regarding your cloud question: No, we don’t have cloud GPU nodes, but you can run KNIME on a cloud server that has a big GPU. Note that we currently don’t support multi-GPU training (unless you use the Python Learner node to orchestrate the training).

Cheers,

nemad
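
The receptive-field point above can be illustrated with a small calculation. The sketch below uses the standard recurrence for a stack of convolution and pooling layers along one axis; the layer list is a made-up example, not the network used in this thread, and `receptive_field` is a hypothetical helper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of a layer stack.

    layers is a list of (kernel_size, stride) pairs, ordered input to output.
    """
    size = 1  # one output unit initially sees a single input pixel
    jump = 1  # distance in input pixels between neighbouring units
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
    return size


# Hypothetical stack: conv 3, pool 2 (stride 2), conv 3, pool 2 (stride 2), conv 3
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
rf = receptive_field(layers)
print(rf)  # 18

# Fraction of the image width that one output unit covers at each matrix size
for width in (256, 128, 64):
    print(width, f"{rf / width:.0%}")
```

With this example stack, one output unit covers a much larger share of a 64-pixel-wide image than of a 256-pixel-wide one, so the same network sees coarser but more global context after downsampling. That is one reason the best matrix size depends on both the network and the task.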

Akshaykumar · March 23, 2018, 7:35pm

Dear Nemad
Thank you.
I will try to find the optimum matrix size.

With regards
Akshaykumar
