ERROR with Buffer Node and "big" dataset

Hello,

I am working in KNIME with a dataset of around 37,000 rows. It consists of addresses whose latitude and longitude coordinates are cleanly split into separate columns. When I process this data with the Buffer Node, a specific error occurs.

When I attach the Buffer Node directly to the dataset of 37,000 rows, I receive the following error:

"Execute failed: Tried writing a batch after a batch with a different size than the first batch. Only the last batch of a table can have a different size than the first batch."

To work around this issue, I split the dataset into two smaller tables of about 15,000 rows each. When I attach the Buffer Node to these smaller tables, it runs successfully without triggering the error. However, when I merge the two tables back together and feed the result into the Spatial Join Node, a new error appears:

(see the full stack trace below; it's long)

Interestingly, when I reduce my original dataset using the Row Sampling Node and run it directly through the Buffer Node, everything works fine, including the subsequent operations in the Spatial Join Node.

Question:
What could be happening here? Why does the error occur only when processing the full dataset with the Buffer Node, but smaller datasets are processed without issue?

Thank you!

(Screenshot attached: SCR-20241007-ncpn)

"Execute failed: An error occurred while calling o18.getDataSource. : java.lang.IllegalStateException: Cannot read DataCell with empty type information at org.knime.core.data.v2.value.cell.DictEncodedDataCellDataInputDelegator.readDataCellImpl(DictEncodedDataCellDataInputDelegator.java:102) at org.knime.core.data.v2.value.cell.AbstractDataInputDelegator.readDataCell(AbstractDataInputDelegator.java:83) at org.knime.core.data.v2.value.cell.DictEncodedDataCellValueFactory$DictEncodedDataCellInvocationHandler.lambda$0(DictEncodedDataCellValueFactory.java:202) at org.knime.core.columnar.arrow.data.ArrowBufIO.deserialize(ArrowBufIO.java:101) at org.knime.core.columnar.arrow.data.ArrowVarBinaryData$ArrowVarBinaryReadData.getObject(ArrowVarBinaryData.java:147) at org.knime.core.columnar.arrow.data.ArrowDictEncodedVarBinaryData$ArrowDictEncodedVarBinaryReadData.lambda$0(ArrowDictEncodedVarBinaryData.java:162) at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(Unknown Source) at org.knime.core.columnar.arrow.data.ArrowDictEncodedVarBinaryData$ArrowDictEncodedVarBinaryReadData.getObject(ArrowDictEncodedVarBinaryData.java:160) at org.knime.core.columnar.data.dictencoding.DictDecodedVarBinaryData$DictDecodedVarBinaryReadData.getObject(DictDecodedVarBinaryData.java:150) at org.knime.core.columnar.cache.object.CachedVarBinaryData$CachedVarBinaryLoadingReadData.getObject(CachedVarBinaryData.java:253) at org.knime.core.columnar.access.ColumnarVarBinaryAccessFactory$ColumnarVarBinaryReadAccess.getObject(ColumnarVarBinaryAccessFactory.java:97) at org.knime.core.data.v2.value.cell.DictEncodedDataCellValueFactory$DictEncodedDataCellInvocationHandler.invoke(DictEncodedDataCellValueFactory.java:206) at jdk.proxy18/jdk.proxy18.$Proxy90.getDataCell(Unknown Source) at org.knime.core.data.columnar.table.ColumnarRowIterator.next(ColumnarRowIterator.java:171) at org.knime.core.data.append.AppendedRowsIterator.initNextRow(AppendedRowsIterator.java:208) at org.knime.core.data.append.AppendedRowsIterator.next(AppendedRowsIterator.java:180) at org.knime.core.data.container.FallbackRowCursor.forward(FallbackRowCursor.java:120) at org.knime.python3.arrow.PythonArrowDataSourceFactory.copyTable(PythonArrowDataSourceFactory.java:196) at org.knime.python3.arrow.PythonArrowDataSourceFactory.copyTableToArrowStore(PythonArrowDataSourceFactory.java:180) at org.knime.python3.arrow.PythonArrowDataSourceFactory.extractStoreCopyTableIfNecessary(PythonArrowDataSourceFactory.java:171) at org.knime.python3.arrow.PythonArrowDataSourceFactory.createSource(PythonArrowDataSourceFactory.java:121) at org.knime.python3.arrow.PythonArrowTableConverter.createSource(PythonArrowTableConverter.java:107) at org.knime.python3.nodes.ports.PythonPortObjects$PythonTablePortObject.getDataSource(PythonPortObjects.java:266) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.ClientServerConnection.sendCommand(ClientServerConnection.java:244) at 
py4j.CallbackClient.sendCommand(CallbackClient.java:384) at py4j.CallbackClient.sendCommand(CallbackClient.java:356) at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:106) at jdk.proxy6/jdk.proxy6.$Proxy32.execute(Unknown Source) at org.knime.python3.nodes.CloseablePythonNodeProxy.execute(CloseablePythonNodeProxy.java:560) at org.knime.python3.nodes.DelegatingNodeModel.lambda$4(DelegatingNodeModel.java:180) at org.knime.python3.nodes.DelegatingNodeModel.runWithProxy(DelegatingNodeModel.java:237) at org.knime.python3.nodes.DelegatingNodeModel.execute(DelegatingNodeModel.java:178) at org.knime.core.node.NodeModel.executeModel(NodeModel.java:588) at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1286) at org.knime.core.node.Node.execute(Node.java:1049) at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:594) at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:98) at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:198) at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117) at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:367) at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:221) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.base/java.util.concurrent.FutureTask.run(Unknown Source) at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123) at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
SCR-20241007-ncpn
"

Hello @Walter_works,

I would suspect it has to do with memory, which is why it works when you split the data up versus passing all the rows to the node at once. It could possibly be an issue with KAP (KNIME Analytics Platform) itself, so you could try increasing the memory allocated to KNIME, but it can also be the way the node handles memory.
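If you want to try the memory route first: the heap limit for KNIME Analytics Platform is controlled by the -Xmx line in the knime.ini file in your KNIME installation folder. For example, changing

    -Xmx2048m

to something like

    -Xmx8g

lets KNIME use up to 8 GB of heap. The 2048m shown is a common default; pick a value that fits your machine's RAM.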

As a suggestion to make your workflow more concise, you could try the 'Chunk Loop Start' node to do automatically what you have shown in the image: it passes consecutive chunks of rows to the Buffer Node so the node won't error out, and you can specify the number of rows per chunk in the node's configuration. A rough sketch of the idea is below.
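To illustrate what that loop does, here is a minimal Python sketch of chunked buffering, assuming geopandas is installed and the coordinates sit in columns named "lat" and "lon" (hypothetical names). This is only an illustration of the chunking idea, not the actual code behind the Buffer Node:

    import geopandas as gpd
    import pandas as pd

    def buffer_in_chunks(df: pd.DataFrame, distance_m: float,
                         chunk_size: int = 5000) -> gpd.GeoDataFrame:
        """Buffer points chunk by chunk, mirroring Chunk Loop Start -> Buffer -> Loop End."""
        results = []
        for start in range(0, len(df), chunk_size):
            chunk = df.iloc[start:start + chunk_size]
            # Build point geometries from the lat/lon columns (WGS84).
            gdf = gpd.GeoDataFrame(
                chunk.copy(),
                geometry=gpd.points_from_xy(chunk["lon"], chunk["lat"]),
                crs="EPSG:4326",
            )
            # Reproject to a metric CRS so the buffer distance is in meters.
            gdf = gdf.to_crs(epsg=3857)
            gdf["geometry"] = gdf.geometry.buffer(distance_m)
            results.append(gdf)
        # Concatenate the per-chunk results back into one table,
        # just like Loop End collects the chunks in KNIME.
        return gpd.GeoDataFrame(pd.concat(results, ignore_index=True))

In the workflow itself the equivalent chain is Chunk Loop Start → Buffer → Loop End, where Loop End concatenates the per-chunk tables for the downstream Spatial Join.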

Hope this helps,
TL
