Beta CSV Array Reader NullPointerException

It seems the CSV Array Reader node is not very deterministic. I have a CSV file (without labels) with 12,578 rows and 249 columns. It occasionally fails with "too short rows" (although all rows have the same number of columns) or, when "allow short lines" is checked, with "extra values" (on varying lines for the same file). In one case it threw a NullPointerException:

ERROR CSV Array Reader 2:126:118 Execute failed: ("NullPointerException"): null
DEBUG CSV Array Reader 2:126:118 Execute failed: ("NullPointerException"): null
java.lang.NullPointerException
at org.knime.base.node.util.BufferedFileReader.readNextChar(BufferedFileReader.java:263)
at org.knime.base.node.util.BufferedFileReader.read(BufferedFileReader.java:318)
at org.knime.core.util.tokenizer.Tokenizer.getNextChar(Tokenizer.java:487)
at org.knime.core.util.tokenizer.Tokenizer.nextToken(Tokenizer.java:349)
at org.knime.base.node.io.filereader.FileRowIterator.next(FileRowIterator.java:347)
at org.knime.base.node.io.csvreader.CSVArrayReaderNodeModel2.execute(CSVArrayReaderNodeModel2.java:104)
at org.knime.core.node.NodeModel.execute(NodeModel.java:733)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1177)
at org.knime.core.node.Node.execute(Node.java:964)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:561)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)

You could try the R package readr. Several people have reported good results with it when importing "messy" CSV files.

Thanks, but this file was not messy; it was exported by KNIME's CSV Writer (I checked it with LibreOffice too, and it was fine). In this case it seems the CSV Array Reader node itself is the buggy part.

(The reason I wanted to use this node is that I would consider this table wide, so I hoped the wide-table extension would handle it efficiently.)

It would help if you could upload an example where this fails. Also, I am not sure what the Array Reader is. Is it the CSV Reader (https://nodepit.com/node/org.knime.base.node.io.csvreader.CSVReaderNodeFactory) or the File Reader (https://nodepit.com/node/org.knime.base.node.io.filereader.FileReaderNodeFactory)?

What I meant by messy is that CSV/TXT files sometimes have imbalanced quotes, or the separator character also occurs within a string value, and the KNIME CSV nodes seem ill-equipped to handle that. Therefore I use the R package, or a different file format altogether, to avoid problems that can stem from CSVs.
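To illustrate the kind of "messy" input meant here (using Python's standard csv module rather than readr, purely as a sketch): a separator embedded inside a quoted field breaks naive splitting, while a quote-aware parser handles it correctly.

```python
import csv
import io

# A row whose second field contains the separator inside quotes.
line = 'id,"Smith, John",42\n'

# Naive splitting on the comma miscounts the fields:
naive = line.strip().split(",")
print(len(naive))  # 4 "fields" instead of 3

# A quote-aware CSV parser recovers the intended 3 fields:
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['id', 'Smith, John', '42']
```

An imbalanced quote is worse still: the parser may swallow subsequent lines into one giant field, which can surface as "too short rows" or "extra values" errors elsewhere in the file.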

The CSV Array Reader. I have since deleted the CSV file (it was only for a temporary experiment, which failed), but I assume it is not hard to reproduce some of these problems.
Thanks for the suggestions, but I was specifically curious about the wide-data nodes, not the alternatives. (The file, as mentioned in the original message, was practically just a matrix of double-precision numbers, nothing special.)
