DL4J NPE: GPU Out of Memory?

Hiya, 

I spent Saturday morning playing with CNNs. I got the celebrity recognition workflow to run on my CPU, nifty! I then switched to GPU mode and ran into a problem [1]. The node seems to fail on a malloc while fitting the first row of data to the network.

I found a possibly related issue on GitHub that suggests a fix is available in master [2]. If I'm reading the repositories right, it looks like this is fixed in DL4J 0.7, and knime-dl4j/master on GitHub is using DL4J 0.8. Is there an update site I can peek at to see if this issue is resolved in a more recent build?

Thanks in advance for the consideration and also for the awesome extension.  It's pretty cool!

Aaron 

PS: I tried this on a GTX 770 and a GTX 860M, both with 2 GB of VRAM. On both systems a simple MLP would run but AlexNet would not.

PPS: Is there a preferred way to see % utilization of a GPU, the way Task Manager shows it for the CPU on Windows?

[1] java.lang.NullPointerException
	at org.nd4j.jita.memory.impl.CudaDirectProvider.malloc(CudaDirectProvider.java:89)
	at org.nd4j.jita.memory.impl.CudaCachingZeroProvider.malloc(CudaCachingZeroProvider.java:116)
	at org.nd4j.jita.memory.impl.CudaFullCachingProvider.malloc(CudaFullCachingProvider.java:74)
	at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:253)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:381)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:338)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:144)
	at org.nd4j.linalg.jcublas.buffer.CudaFloatDataBuffer.<init>(CudaFloatDataBuffer.java:59)
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createFloat(CudaDataBufferFactory.java:251)
	at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1277)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:262)
	at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:114)
	at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.createUninitialized(JCublasNDArrayFactory.java:229)
	at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4391)
	at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopyHelper(Shape.java:152)
	at org.nd4j.linalg.api.shape.Shape.toOffsetZeroCopy(Shape.java:108)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.dup(BaseNDArray.java:1498)
	at org.nd4j.linalg.jcublas.JCublasNDArray.dup(JCublasNDArray.java:407)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.mul(BaseNDArray.java:2554)
	at org.deeplearning4j.nn.layers.normalization.LocalResponseNormalization.activate(LocalResponseNormalization.java:197)
	at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:385)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.activationFromPrevLayer(MultiLayerNetwork.java:552)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.feedForwardToLayer(MultiLayerNetwork.java:675)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:1820)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:152)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:54)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:51)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1453)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1490)
	at org.knime.ext.dl4j.base.nodes.learn.AbstractDLLearnerNodeModel.backpropOneEpoch(AbstractDLLearnerNodeModel.java:307)
	at org.knime.ext.dl4j.base.nodes.learn.feedforward.classification.FeedforwardClassificationLearnerNodeModel.trainNetwork(FeedforwardClassificationLearnerNodeModel.java:367)
	at org.knime.ext.dl4j.base.nodes.learn.feedforward.classification.FeedforwardClassificationLearnerNodeModel.execute(FeedforwardClassificationLearnerNodeModel.java:186)

[2]  https://github.com/deeplearning4j/nd4j/issues/1335

Hi Aaron,

Thanks for reporting this issue. Could you provide me with the full KNIME log file containing the error? We recently upgraded to 0.8.0 internally, but it's not released yet. Unfortunately, there is no public update site to try it out, but I will have a look.

In order to look at GPU utilization I'm using the tool GPU-Z on Windows. For Linux there is nvidia-smi.
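If you would rather log the numbers than watch a GUI, here is a rough sketch that polls nvidia-smi from Java (it ships with the NVIDIA driver, so it should work on Windows too if it is on your PATH). The class name `GpuUsage` and its fields are made up for this example. It reads only the first GPU, and older or mobile cards may report `[Not Supported]` for some fields, in which case the parse step throws.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one utilization/memory reading for the first GPU.
final class GpuUsage {
    final int utilizationPercent;
    final int memoryUsedMiB;
    final int memoryTotalMiB;

    GpuUsage(int utilizationPercent, int memoryUsedMiB, int memoryTotalMiB) {
        this.utilizationPercent = utilizationPercent;
        this.memoryUsedMiB = memoryUsedMiB;
        this.memoryTotalMiB = memoryTotalMiB;
    }

    // Parses one CSV line as printed with --format=csv,noheader,nounits,
    // for example "37, 1024, 2048".
    static GpuUsage parse(String csvLine) {
        String[] parts = csvLine.trim().split("\\s*,\\s*");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + csvLine);
        }
        return new GpuUsage(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    // Runs nvidia-smi once and returns the reading for the first GPU listed.
    static GpuUsage query() throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "nvidia-smi",
                "--query-gpu=utilization.gpu,memory.used,memory.total",
                "--format=csv,noheader,nounits")
                .redirectErrorStream(true)
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String firstLine = reader.readLine();
            process.waitFor();
            if (firstLine == null) {
                throw new IOException("nvidia-smi produced no output");
            }
            return parse(firstLine);
        }
    }

    public static void main(String[] args) throws Exception {
        // Print one reading per second for ten seconds.
        for (int i = 0; i < 10; i++) {
            GpuUsage usage = query();
            System.out.printf("util %d%%, memory %d / %d MiB%n",
                    usage.utilizationPercent, usage.memoryUsedMiB, usage.memoryTotalMiB);
            Thread.sleep(1000);
        }
    }
}
```

Running this in a second terminal while the learner node executes should show whether memory use climbs towards the card's limit.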

Cheers

David

 

Thanks for looking into it, here is the log file. 

Best,
Aaron

Thanks for the log. Unfortunately, it contains fewer DL4J error messages than I was hoping for. I tried the mentioned workflow on my GTX 1080 and it worked fine.

A wild guess, but have you tried lowering the batch size? Maybe that helps.
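To put a rough number on why batch size matters: the trace fails in a `dup()` inside `LocalResponseNormalization.activate`, i.e. while allocating a copy of a layer's activations, and that copy scales linearly with batch size. Here is a back-of-the-envelope sketch. The 96 x 55 x 55 shape is the first convolution output of the original AlexNet and is only an assumption here; the network in the KNIME workflow may use different shapes. The class and method names are made up for the example.

```java
// Hypothetical helper: size of one float32 activation tensor.
final class ActivationMemory {

    // Bytes for a tensor of shape [batch, channels, height, width] at 4 bytes per float.
    static long activationBytes(int batch, int channels, int height, int width) {
        return 4L * batch * channels * height * width;
    }

    public static void main(String[] args) {
        // Assumed shape: 96 channels of 55 x 55 (original AlexNet, first conv layer).
        for (int batch : new int[] {16, 64, 128, 256}) {
            long bytes = activationBytes(batch, 96, 55, 55);
            System.out.printf("batch %d: %.1f MiB%n", batch, bytes / (1024.0 * 1024.0));
        }
    }
}
```

Under that assumption a single copy of this one tensor is about 142 MiB at batch size 128. Several such temporaries plus weights and gradients add up quickly on a 2 GB card, so halving the batch size is a cheap thing to try.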

I did some further investigation. It appears that there is an issue in the GPU caching of the mini-batch data.

If you monitor the memory usage on your card while repeatedly executing the learner with different (large) batch sizes, you will eventually fill up the VRAM. Once this happens, you get the NPE listed above. It is possible to recover by restarting KNIME. I believe this may be related to the issue below.

Cheers,

Aaron

https://github.com/deeplearning4j/deeplearning4j/issues/2477
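
For anyone who wants to check the same thing on their own card, here is a small sketch of the test I mean: note the used VRAM (for example from nvidia-smi or GPU-Z) after each execution of the learner and see whether it only ever goes up. The class name is made up, and the readings in `main` are placeholder values for illustration, not measurements from my machine.

```java
// Hypothetical helper: checks whether used VRAM rose after every single run.
final class VramTrend {

    // Returns true if there are at least two readings and each one is
    // strictly higher than the one before it.
    static boolean growsEveryRun(int[] usedMiBAfterEachRun) {
        for (int i = 1; i < usedMiBAfterEachRun.length; i++) {
            if (usedMiBAfterEachRun[i] <= usedMiBAfterEachRun[i - 1]) {
                return false;
            }
        }
        return usedMiBAfterEachRun.length > 1;
    }

    public static void main(String[] args) {
        // Placeholder readings in MiB, one per learner execution.
        int[] readings = {400, 800, 1200, 1600, 2000};
        System.out.println("memory grew after every run: " + growsEveryRun(readings));
    }
}
```

If memory never drops between runs and only a KNIME restart frees it, that points at cached buffers not being released rather than at one batch being too large on its own.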