String to Document crashes

I’m seeing the same problem. In the log I see this:

2019-05-07 19:25:34,032 : DEBUG : KNIME-Worker-7 : DocumentBufferedFileStoreDataCellFactory : Strings To Document : 0:28 : Could not store document in cell: ESB-2017.1878 - [RedHat] X.org X11: Multiple vulnerabilities
java.io.UTFDataFormatException: encoded string too long: 71703 bytes
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at org.knime.ext.textprocessing.util.TermDocumentDeSerializationUtil.fastSerializeDocument(TermDocumentDeSerializationUtil.java:606)
at org.knime.ext.textprocessing.data.filestore.AbstractDocumentFileStoreCell.serializeDocument(AbstractDocumentFileStoreCell.java:293)
at org.knime.ext.textprocessing.data.filestore.BufferedFileStoreWriter.write(BufferedFileStoreWriter.java:96)
at org.knime.ext.textprocessing.data.filestore.DocumentBufferedFileStoreCell.<init>(DocumentBufferedFileStoreCell.java:129)
at org.knime.ext.textprocessing.data.filestore.DocumentBufferedFileStoreDataCellFactory.createDataCell(DocumentBufferedFileStoreDataCellFactory.java:149)
at org.knime.ext.textprocessing.util.LRUDataCellCache.getInstance(LRUDataCellCache.java:120)
at org.knime.ext.textprocessing.nodes.transformation.stringstodocument.StringsToDocumentCellFactory2.getCells(StringsToDocumentCellFactory2.java:236)
at org.knime.core.data.container.RearrangeColumnsTable.calcNewCellsForRow(RearrangeColumnsTable.java:494)
at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:722)
at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:1)
at org.knime.core.util.MultiThreadWorker$ComputationTask$1.call(MultiThreadWorker.java:442)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
2019-05-07 19:25:34,034 : DEBUG : KNIME-Worker-3 : LRUDataCellCache : Strings To Document : 0:28 : Closing lru data cell cache.
2019-05-07 19:25:34,034 : DEBUG : KNIME-Worker-3 : Node : Strings To Document : 0:28 : reset
2019-05-07 19:25:34,034 : ERROR : KNIME-Worker-3 : Node : Strings To Document : 0:28 : Execute failed: Cell at index 0 is null!
2019-05-07 19:25:34,035 : DEBUG : KNIME-Worker-3 : Node : Strings To Document : 0:28 : Execute failed: Cell at index 0 is null!

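The document it crashes on is roughly the size of the 71703 bytes mentioned in the log. That is above the 65535-byte limit of java.io.DataOutputStream.writeUTF, the call that throws the exception at the top of the stack trace.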
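For anyone who wants to see the limit outside KNIME, here is a minimal standalone sketch. It is not KNIME code: the class name `WriteUtfLimitDemo` and the helpers `repeat`, `fitsInWriteUTF` and `writeLongString` are made up for illustration. It only shows that `DataOutputStream.writeUTF` rejects any string whose encoded form is longer than 65535 bytes, and what a length-prefixed alternative could look like in general.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriteUtfLimitDemo {

    /** Builds a string of n copies of c (hypothetical helper, works on Java 8). */
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    /** Returns true if writeUTF accepts the string, false if it rejects it as too long. */
    static boolean fitsInWriteUTF(String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            // writeUTF stores the encoded length in two bytes, hence the 65535-byte cap.
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        }
    }

    /** Illustrative alternative: a 4-byte length prefix followed by raw UTF-8 bytes. */
    static byte[] writeLongString(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.writeInt(bytes.length);
            out.write(bytes);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fitsInWriteUTF(repeat('a', 65535)));    // true
        System.out.println(fitsInWriteUTF(repeat('a', 71703)));    // false
        System.out.println(writeLongString(repeat('a', 71703)).length); // 71707
    }
}
```

The second call uses the same byte count as the log above, so it fails in the same way as the node does. Whether KNIME should use something like the length-prefixed variant is a question for the developers; the sketch only illustrates the general idea.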

I also found that the problem ‘went away’ after I changed some settings in the configuration dialog. I unchecked some of the default options and changed the parser selection, so any one of those changes could have been responsible.

Any ideas?

Edit: The problem seems to be the ‘Use authors from column’ setting. It is enabled by default and, in my case, was set to the ‘Full text’ column. Toggling the checkbox toggles the problem.
