I followed the webinar Deep Learning for Image Analysis and tried to run the Image Captioning workflows in my VirtualBox environment with Ubuntu 20.04, using KNIME 4.1.3 with the full deep learning setup for Python.
Workflows 1 to 3 went okay, but the fourth one ran into a disk-space issue after a number of hours. At this point I asked one of the presenters of the webinar, Benjamin Wilhelm (@bwilhelm), for assistance.
What follows is a summary of what has happened since.
The workflow should normally be able to execute with 10GB of disk space, but with over 60GB available at the start of execution it still ran out of space on my Ubuntu VBox, as shown in the console:
*** Welcome to KNIME Analytics Platform v4.1.3.v202005121100 ***
*** Copyright by KNIME AG, Zurich, Switzerland ***
Log file is located at: /home/jan/knime-workspace/.metadata/knime/knime.log
WARN FontStore Using the system default font for annotations: Font {139676975991072}
WARN Python Script (1⇒1) 0:11 :38: UserWarning: The following words could not be found in the GLOVE dictionary.
WARN Python Script (1⇒1) 0:11 :39: UserWarning: ['selfie', 'endseq', 'frizbee', 'frisbe', 'startseq', 'sandwhich']
WARN GroupBy 2:90:21 No grouping column included. Aggregate complete table.
WARN GroupBy 2:90:21 No grouping column included. Aggregate complete table.
WARN GroupBy 2:90:21 No grouping column included. Aggregate complete table.
WARN Keras Network Learner 2:65 The number of rows of the input training data table (293312) is not a multiple of the selected training batch size (100). Thus, the last batch of each epoch will continue at the beginning of the training data table after reaching its end. You can avoid that by adjusting the number of rows of the table or the batch size if desired.
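The arithmetic behind that batch-size warning can be checked directly; this small sketch just reproduces the two numbers from the log (293312 rows, batch size 100):

```python
rows = 293312      # rows in the input training table, from the warning above
batch_size = 100   # batch size selected in the Keras Network Learner

# The last batch of each epoch wraps around by `leftover` rows.
full_batches, leftover = divmod(rows, batch_size)
print(full_batches)  # 2933
print(leftover)      # 12
```

So each epoch has 2933 full batches plus 12 rows that spill into a wrap-around batch, which is exactly what the warning describes.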
WARN Keras Network Learner 2:65 /home/jan/anaconda3/envs/py3_knime_dl/lib/python3.6/site-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
ERROR Buffer Writing of table to file encountered error: IOException: No space left on device
ERROR Buffer Table will be held in memory until node is cleared.
ERROR Buffer Workflow can't be saved in this state.
ERROR Buffer Writing of table to file encountered error: IOException: The partition of the temp file '/tmp/knime_container_20200531_6478936927834789571.bin.snappy' is too low on disc space (0MB available but at least 104857600MB are required). You can tweak the limit by changing the 'org.knime.container.minspace.temp' java property.
As suggested, I added the line "-Dorg.knime.container.minspace.temp=X" (with X being the size in MB) to my knime.ini, but this did not change anything.
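For reference, this is roughly how that addition would look in knime.ini. The value 100 is only an example placeholder for X, and I am assuming the unit is MB as suggested; JVM -D properties generally have to go below the -vmargs line:

```ini
; excerpt of knime.ini (other entries omitted)
-vmargs
; assumed example value: minimum free temp space in MB
-Dorg.knime.container.minspace.temp=100
```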
What happened during the execution of the Keras Network Learner node: during the first shuffling of the data (that's what is shown as text hovering above the progress bar) it creates a 3.6GB file called knime_container_(a large number).bin.snappy in the temp directory of the workflow.
At the end of each epoch the data is shuffled again, which creates a new container file with a different large number, this time in the parent directory of the workflow's temp dir (so directly in /tmp).
During the creation of each container file one can see a Knime_DuplicateChecker… file being created and deleted, but the .bin.snappy file itself is never deleted.
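To watch these files pile up, I found it handy to list them by size. This is just a small helper sketch; the /tmp default and the file-name pattern are taken from the log messages above:

```python
from pathlib import Path

def leftover_container_files(tmp_dir="/tmp"):
    """Return KNIME container files still on disk, largest first."""
    files = Path(tmp_dir).glob("knime_container_*.bin.snappy")
    return sorted(files, key=lambda f: f.stat().st_size, reverse=True)

# Print each leftover file with its size in MB.
for f in leftover_container_files():
    print(f.name, f.stat().st_size // 1024**2, "MB")
```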
So completing the full 30 epochs would require at least 108GB of disk space.
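The 108GB figure is just the observed file size multiplied out, as a quick back-of-the-envelope check (one 3.6GB shuffle file per epoch, none of them deleted):

```python
epochs = 30
file_size_gb = 3.6  # observed size of one knime_container_*.bin.snappy file

# Total temp space consumed if no shuffle file is ever cleaned up.
total_gb = epochs * file_size_gb
print(round(total_gb))  # 108
```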
Benjamin is looking into this. He can reproduce this on Ubuntu 18.04 as well.
He mentioned to me that it might be related to the automatic memory management system for tables.
At his request I posted the issue to the forum, so others can follow the discussion too. If you know a solution for this, feel free to join this thread.