file import problem

I have been importing GB-sized text files delimited by commas, | or other custom delimiters. The CSV Reader and File Reader nodes are fickle and I usually have problems with them, but the Line Reader has always worked. So I am a bit mystified why even the Line Reader can no longer read the same files.

All 3 nodes throw:

ERROR     Line Reader     Execute failed: IOException while checking for duplicate row IDs

Most tables exported from a 2.6 version of KNIME for these files can't be used either, as they throw:

ERROR     Table Reader     Execute failed: Unable to parse xml: line=-1: Premature end of file.
xml: URI=java.io.BufferedInputStream@72771b54
dtd: URI=nul.

These table files are not empty. /tmp did run out of space at one point, but I have since changed the temp folder to one in my home dir.

For the curious, this data is from stitch.embl.de

Hello,

I'd like to take a look at this but haven't used STITCH before. Is there a quick way I can generate one of these large files?

Regards,

Aaron

Unfortunately no. I haven't got the bandwidth to upload these large files. Let me elaborate on what I did.

I used a proftp server/client to FTP all files to a Windows machine. This included the whole knime folder and its subfolders, along with many other folders on my Linux desktop. I then replaced the earlier Ubuntu install, which was a bit messed up, with Fedora 18, and installed Sabayon 13.04 on another laptop.

I used FileZilla to transfer all the previous files from the Windows 7 laptop to both these machines, and ran chmod a+x ./ inside the knime folder.

KNIME ran without issues for small files. But both Linux machines give problems when reading the large files, which could be read earlier with the write-to-disk option. I'm not too worried about the .table files, but the "duplicate row IDs" error is bothersome.

I will try to attach the log file later.

Can you send us the knime.log with the error messages and stacktraces in it?

Hi Thor,

Here's the relevant section of the knime log file.

2013-06-11 09:39:41,163 DEBUG main NodeContainerEditPart : File Reader 0:1 (EXECUTING)
2013-06-11 09:51:47,703 DEBUG KNIME-Worker-0 File Reader : reset
2013-06-11 09:51:48,084 DEBUG KNIME-Worker-0 File Reader : clean output ports.
2013-06-11 09:51:48,085 ERROR KNIME-Worker-0 File Reader : Execute failed: IOException while checking for duplicate row IDs
2013-06-11 09:51:48,085 DEBUG KNIME-Worker-0 File Reader : Execute failed: IOException while checking for duplicate row IDs
org.knime.core.data.container.DataContainerException: IOException while checking for duplicate row IDs
    at org.knime.core.data.container.DataContainer.checkAsyncWriteThrowable(DataContainer.java:579)
    at org.knime.core.data.container.DataContainer.offerToAsynchronousQueue(DataContainer.java:745)
    at org.knime.core.data.container.DataContainer.addRowToTable(DataContainer.java:841)
    at org.knime.base.node.io.filereader.FileReaderNodeModel.execute(FileReaderNodeModel.java:210)
    at org.knime.core.node.NodeModel.execute(NodeModel.java:680)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:536)
    at org.knime.core.node.Node.invokeNodeModelExecute(Node.java:1000)
    at org.knime.core.node.Node.execute(Node.java:894)
    at org.knime.core.node.workflow.SingleNodeContainer.performExecuteNode(SingleNodeContainer.java:895)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:100)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:124)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:239)
Caused by: org.knime.core.data.container.DataContainerException: IOException while checking for duplicate row IDs
    at org.knime.core.data.container.DataContainer.addRowKeyForDuplicateCheck(DataContainer.java:886)
    at org.knime.core.data.container.DataContainer.addRowToTableWrite(DataContainer.java:560)
    at org.knime.core.data.container.DataContainer.access$4(DataContainer.java:512)
    at org.knime.core.data.container.DataContainer$ASyncWriteCallable.call(DataContainer.java:1330)
    at org.knime.core.data.container.DataContainer$ASyncWriteCallable.call(DataContainer.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:318)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
    at org.knime.core.util.DuplicateChecker$Chunk.addKeys(DuplicateChecker.java:106)
    at org.knime.core.util.DuplicateChecker.writeChunk(DuplicateChecker.java:388)
    at org.knime.core.util.DuplicateChecker.addKey(DuplicateChecker.java:271)
    at org.knime.core.data.container.DataContainer.addRowKeyForDuplicateCheck(DataContainer.java:884)
    ... 9 more
2013-06-11 09:51:50,742 DEBUG KNIME-Worker-0 WorkflowManager : File Reader 0:1 doBeforePostExecution
2013-06-11 09:51:50,742 DEBUG KNIME-Worker-0 NodeContainer : File Reader 0:1 has new state: POSTEXECUTE
2013-06-11 09:51:50,742 DEBUG KNIME-Worker-0 NodeContainer : tab2net 0 has new state: EXECUTING
2013-06-11 09:51:50,742 DEBUG KNIME-Worker-0 WorkflowManager : File Reader 0:1 doAfterExecute - failure
2013-06-11 09:51:50,783 DEBUG KNIME-Worker-0 File Reader : reset
2013-06-11 09:51:50,783 DEBUG KNIME-Worker-0 File Reader : clean output ports.
2013-06-11 09:51:50,783 DEBUG KNIME-Worker-0 WorkflowFileStoreHandlerRepository : Removing handler 0bb5cb95-9a5e-4d02-b33b-f2dc533e6313 (File Reader 0:1: <no directory>) - 0 remaining
2013-06-11 09:51:50,783 DEBUG KNIME-Worker-0 NodeContainer : File Reader 0:1 has new state: IDLE
2013-06-11 09:51:50,827 DEBUG KNIME-Worker-0 File Reader : Configure succeeded. (File Reader)
2013-06-11 09:51:50,827 DEBUG KNIME-Worker-0 NodeContainer : File Reader 0:1 has new state: CONFIGURED
2013-06-11 09:51:50,827 WARN  KNIME-Worker-0 Nominal Value Row Filter : No nominal columns with possible values found! Execute predecessor or check input table.
2013-06-11 09:51:50,827 DEBUG KNIME-Worker-0 NodeContainer : Nominal Value Row Filter 0:5 has new state: IDLE
2013-06-11 09:51:50,827 DEBUG KNIME-Worker-0 NodeContainer : tab2net 0 has new state: IDLE
2013-06-11 09:51:50,828 DEBUG KNIME-Worker-0 NodeContainer : tab2net 0 has new state: IDLE
2013-06-11 09:51:50,828 DEBUG KNIME-WFM-Parent-Notifier NodeContainer : Workflow Manager  has new state: IDLE
2013-06-11 09:52:02,934 DEBUG main NodeContainerEditPart : File Reader 0:1 (CONFIGURED)

"Caused by: java.io.IOException: No space left on device"

Your hard disk is full (the temporary directory).
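
To see which directory a JVM uses for temporary files and how much room is left there, a small check along these lines may help. This is only a sketch: the class name TmpDirCheck is made up for illustration, and it reports what a plain JVM sees when started from the command line, which is not necessarily what KNIME itself is configured to use.

```java
import java.io.File;

class TmpDirCheck {
    // Directory the JVM uses for temporary files (system property java.io.tmpdir).
    static File tmpDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    // Usable space in bytes on the partition that holds the given directory.
    static long usableBytes(File dir) {
        return dir.getUsableSpace();
    }

    public static void main(String[] args) {
        File dir = tmpDir();
        System.out.println("java.io.tmpdir = " + dir.getAbsolutePath());
        System.out.println("usable MB      = " + usableBytes(dir) / (1024L * 1024L));
    }
}
```

If the reported directory sits on a small tmpfs mount, reading a GB-sized file can fill it long before the real disk is anywhere near full.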

This time I made sure the temp folder was in my home dir. I'm still getting the duplicate row ID error.

Filesystem              1K-blocks      Used Available Use% Mounted on
devtmpfs                  1998020         0   1998020   0% /dev
tmpfs                     2016124       172   2015952   1% /dev/shm
tmpfs                     2016124      4344   2011780   1% /run
tmpfs                     2016124         0   2016124   0% /sys/fs/cgroup
/dev/mapper/korora-root  51606140   7215080  41769620  15% /
tmpfs                     2016124     16504   1999620   1% /tmp
/dev/sda1                  495844    113742    356502  25% /boot
/dev/mapper/korora-home 424624520 192298544 210756236  48% /home

After changing the temp dir to one in my home dir (/home/rajeev/knime/tmp) and restarting KNIME (see screenshot):


[13:16] rajeev@u5-dt-linux-13 knime $ df -k
Filesystem              1K-blocks      Used Available Use% Mounted on
devtmpfs                  1998020         0   1998020   0% /dev
tmpfs                     2016124       172   2015952   1% /dev/shm
tmpfs                     2016124      4360   2011764   1% /run
tmpfs                     2016124         0   2016124   0% /sys/fs/cgroup
/dev/mapper/korora-root  51606140   7215104  41769596  15% /
tmpfs                     2016124   2016124         0 100% /tmp
/dev/sda1                  495844    113742    356502  25% /boot
/dev/mapper/korora-home 424624520 193808100 209246680  49% /home
 

I wonder whether KNIME is still writing to the tmpfs /tmp even though the new temp directory is specified in the KNIME options (screenshot)? I used normal partitions (/root, /home) earlier but am using LVM in the current setup. Could that be the reason?

Just to double-check, you did restart KNIME after changing the temp-directory in the preferences? Also, do you have an option "-Djava.io.tmpdir" in your knime.ini?

Yes, I did restart KNIME.

"-Djava.io.tmpdir" is absent from knime.ini. Should I manually set it to "-Djava.io.tmpdir=mytempdir"?

Hm, I have a hunch about what may be going wrong here. Please try setting java.io.tmpdir in the knime.ini and tell us if this helps.

Hi Thor,

I can confirm that manually setting java.io.tmpdir in knime.ini solves all the file read problems. Thanks for your help.
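
For anyone landing here with the same problem: the fix amounts to one extra line in knime.ini. In Eclipse-style launcher .ini files, JVM options are only picked up when they come after the -vmargs line. The sketch below checks a list of ini lines for that. The class name IniCheck and the sample -Xmx value are made up for illustration; the path is the temp folder mentioned earlier in the thread.

```java
import java.util.Arrays;
import java.util.List;

class IniCheck {
    // True if a -Djava.io.tmpdir=... line appears after the -vmargs line.
    static boolean setsTmpDir(List<String> iniLines) {
        boolean inVmArgs = false;
        for (String line : iniLines) {
            String t = line.trim();
            if (t.equals("-vmargs")) {
                inVmArgs = true;
            } else if (inVmArgs && t.startsWith("-Djava.io.tmpdir=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical knime.ini tail; only the tmpdir path comes from this thread.
        List<String> sample = Arrays.asList(
            "-vmargs",
            "-Xmx2048m",
            "-Djava.io.tmpdir=/home/rajeev/knime/tmp");
        System.out.println("tmpdir set after -vmargs: " + setsTmpDir(sample));
    }
}
```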

This issue has been resolved in KNIME Desktop v2.8