I/O problem

Hello,

I need some advice on how to debug an I/O problem.
I get the following error messages and exceptions from the console and the knime.log file:

ERROR Buffer CODING PROBLEM Writing cells to temporary buffer must not throw NullPointerException
ERROR KNIME-TableIO-1 Buffer CODING PROBLEM Writing cells to temporary buffer must not throw NullPointerException

java.lang.NullPointerException
at org.knime.core.data.container.Buffer.addRow(Buffer.java:565)
at org.knime.core.data.container.DataContainer.addRowToTableWrite(DataContainer.java:502)
at org.knime.core.data.container.DataContainer.access$4(DataContainer.java:453)
at org.knime.core.data.container.DataContainer$ASyncWriteCallable.call(DataContainer.java:1230)
at org.knime.core.data.container.DataContainer$ASyncWriteCallable.call(DataContainer.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

I cannot see any problem on our hard drives, i.e. there is still enough space. It might be, though, that Eclipse has already cleaned up the directories…

How could I test for this?

I am running more than one KNIME_Batch job at the same time (some in the same directory). How can I change the tmp directory for the batch jobs, and how can I change the debug/warning/error log level for these batch executions?

Thanks for your kind help…
Best,
Bernd

Is it possible to change the way the tmp directory is handled?
I would prefer to have a directory for each workflow and then one for each node within that temporary directory. It seems one problem I am running into is our file system's limit of about 30,000 files per directory. A large number of files also makes the FS slow…

Could you please comment:
a) how likely is it that you will implement this?
b) when could you do this? (it is of vital importance to me)
c) in case you won’t be able to do this in a timely fashion, I would have to do it myself, which means creating my own knime.core branch.
cA) could you support me in this endeavour?
cB) what do you suggest to do?

Thanks a lot,

Bernd

One more thing to clarify: this happens, e.g., when I try to sort hundreds of millions of rows in parallel workflows, even when executing those workflows in batch mode on different datasets…

B

Hi Bernd,

Can you reproduce this, and is it deterministic? Reading the code, I wouldn’t say it’s a problem with the file system. The problematic line in class Buffer is “m_list.clear();”, so obviously m_list must be null. However, just two lines above it this list is iterated, and that does not cause an NPE. There is a clear() method, called when the node is reset, that sets the member to null.
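
To illustrate what I suspect is happening, here is a deliberately simplified sketch (not the actual Buffer code; the class and method names are made up for illustration): if a reset lands between the iteration and the clear() call, you get exactly this kind of trace:

    import java.util.ArrayList;
    import java.util.List;

    // Simplified sketch of the suspected race; not the actual KNIME Buffer code.
    public class BufferRaceSketch {
        private List<String> m_list = new ArrayList<String>();

        void writeRow() {
            for (String cell : m_list) {   // the iteration succeeds: m_list is still set here
                System.out.println(cell);
            }
            reset();                       // stands in for a concurrent reset landing right here
            m_list.clear();                // throws NullPointerException, as in the stack trace
        }

        void reset() {                     // like Buffer's clear(): nulls the member on node reset
            m_list = null;
        }

        public static void main(String[] args) {
            new BufferRaceSketch().writeRow();
        }
    }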

Question: are custom nodes involved in this? Are you using loops? (Nodes in loops are often clear()ed.) Can you add a line “-Dknime.synchronous.io=false” to your knime.ini, reproduce the problem, and send us the stack trace once again?
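
For reference, system properties like this go below the -vmargs line in knime.ini, one option per line; the -Xmx line here merely stands in for whatever VM arguments are already present, and note the correction further down in this thread: the value should be true.

    -vmargs
    -Xmx1024m
    -Dknime.synchronous.io=true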

Changing the log level in the batch executor can be done by editing the <batch_workspace>/.metadata/knime/log4j.xml file.
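
The appender names in the generated file may differ, so treat this as a sketch; in a log4j 1.x configuration, making the root logger more verbose typically means changing the level element of the root logger, e.g.:

    <root>
      <level value="DEBUG"/>
      <appender-ref ref="stdout"/>
      <appender-ref ref="logfile"/>
    </root>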

Regards,
Bernd

Ok, I guess I have too many comments/thoughts. So one after the other:

  1. The file <batch_workspace>/.metadata/knime/log4j.xml is only created AFTER I start the batch job.
    Should I create it in advance? Where is it copied from, or generated from?

  2. I strongly believe that the number of files per directory is the limiting factor here. It takes a few hours to produce that many files, and then they are removed on the fly if there is an error… I currently have about 11,000 files in the tmp directory, and the sort nodes are only at 33% and 23% done.
    (Please see my second and third comments.)

To answer your question: I don’t think so, but I cannot be certain. I am having serious problems opening a saved workflow (from a batch execution).

I will try the synchronous.io setting, but as said earlier it takes a lot of time, and the problem basically mostly/only occurs in the production/batch version, making it even more difficult to debug…

Thx
B

Hi,

  1. The newly created folder should be re-used if the job is started a second time. Otherwise you can point it to an existing location using the -data option.

  2. Do you suspect that the sorter node causes the trouble? It produces many temporary files (which will change in the next minor release, fixing the problem reported here).

Btw, I had a typo in my previous post: make it -Dknime.synchronous.io=true (not false!).

Thanks,
Bernd

Thanks for the comments. Right now it looks like it could also be a lack of space in the tmp directory…

Could you please elaborate on the -data option? I couldn’t find any reference to its use.

Also,

How can I set the tmp directory for a batch execution?

Thanks,

Bernd

The “-data” option is an Eclipse built-in switch. It is the command-line substitute for the workspace prompt that usually comes up when you launch the Eclipse GUI (which you don’t have when you run the batch executor). The argument to “-data” is the workspace location, which contains some configuration files (such as the log4j.xml and the preferences that you set in the Eclipse preference editor) and, usually, also the projects (but not for the batch executor).

The temp location of a batch job can be changed using the -Dknime.tmpdir=new_valid_path property (again, this would need to be added to the knime.ini file).

And in case you are going to ask where these magic properties are defined: there is a list of Java properties defined in the class KNIMEConstants, some of which have the prefix PROPERTY_; these are the ones that change the environment, including the temp dir location, synchronous I/O, etc.
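
Putting it together, a batch invocation could look like the following sketch (all paths are placeholders; the application ID is the standard KNIME batch executor, and everything after -vmargs is passed to the JVM, so it must come last):

    ./knime -nosplash \
      -application org.knime.product.KNIME_BATCH_APPLICATION \
      -data /path/to/batch_workspace \
      -workflowDir=/path/to/workflow \
      -vmargs -Dknime.tmpdir=/scratch/knime_tmp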

THANKS A LOT!!!
I believe I am getting closer to a working installation on the different computers…

Bernd