Reading large data sets

Hi,

I am having trouble reading large data sets of several million rows. The node starts to import the data; then, after a while (usually several million rows in), the Database Reader seems to hang and then changes from executing to completed with the green light. However, when I do a row count I can see that the Database Reader node does not contain all the rows.

I have checked the log file and cannot see anything in there stating that KNIME has timed out or hung.

Has this happened to anyone else, or does anyone have any ideas?

I'm using KNIME 3.4.1 (64-bit) with ojdbc14.jar.

Cheers

You can add some variables to your SQL statement and use a loop node. The loop takes each set of variables and executes the SQL statement with them. For example, if you have transaction data, add a transaction-date range for each loop iteration:
chunk 1 = 01.09.2017 to 30.09.2017
chunk 2 = 01.10.2017 to 31.10.2017, and so on.
In the end you have all the data.

Robert
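The date-chunking idea above can be sketched outside KNIME as one parameterized query run once per date range. This is a minimal Python/sqlite3 illustration (the table and column names are hypothetical); in KNIME itself you would drive the range from flow variables inside a loop (e.g. a TableRow To Variable Loop Start feeding the Database Reader's SQL).

```python
import sqlite3

def read_in_chunks(conn, chunks):
    """Run the same query once per (start, end) date range and collect all rows."""
    rows = []
    for start, end in chunks:
        cur = conn.execute(
            "SELECT id, txn_date FROM transactions "
            "WHERE txn_date >= ? AND txn_date < ?",
            (start, end),
        )
        rows.extend(cur.fetchall())
    return rows

# Toy data standing in for the real transaction table (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, txn_date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, "2017-09-15"), (2, "2017-09-30"), (3, "2017-10-01"), (4, "2017-10-31")],
)

# Half-open ranges so each row falls into exactly one chunk.
chunks = [("2017-09-01", "2017-10-01"),   # chunk 1: September
          ("2017-10-01", "2017-11-01")]   # chunk 2: October
all_rows = read_in_chunks(conn, chunks)
print(len(all_rows))  # 4 — all rows recovered across the chunks
```

Using half-open ranges (>= start, < end) avoids rows being skipped or double-counted at chunk boundaries, which matters if the date column carries a time component.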

Hi,


I guess that is a workaround, but it is not ideal. I commonly work with large data sets, so I really need to know that when I read data in I am getting the whole data set.

I've had the problem again today. This time the node still looks like it's executing, but no more rows are being read in and I can't stop the node. I have to force-quit KNIME with Ctrl+Alt+Delete.

I have attached the KNIME log.

Could anyone please advise a fix for this?

Cheers

Once again the Database Reader node has hung and then reported that it completed when it has not.

Below is the log file:

2017-10-18 09:00:50,598 : DEBUG : main : NodeContainerEditPart :  :  : Database Reader 0:80 (CONFIGURED)
2017-10-18 09:00:53,261 : DEBUG : KNIME-Worker-10 : Buffer : Database Reader : 0:78 : Buffer file (C:\Users\operagn\AppData\Local\Temp\knime_OAI_1722_LT_SUR13767\knime_container_20171018_4764708699993395180.bin.gz) is 55.52MB in size
2017-10-18 09:00:53,486 : DEBUG : KNIME-Worker-10 : DatabaseConnectionSettings : Database Reader : 0:78 : class oracle.jdbc.driver.T4CConnection#setAutoCommit(true) error, reason:
java.sql.SQLException: Closed Connection
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
    at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
    at oracle.jdbc.driver.PhysicalConnection.setAutoCommit(PhysicalConnection.java:1057)
    at org.knime.core.node.port.database.DatabaseConnectionSettings.setAutoCommit(DatabaseConnectionSettings.java:633)
    at org.knime.core.node.port.database.reader.RowIteratorConnection.close(RowIteratorConnection.java:120)
    at org.knime.core.node.port.database.reader.DBReaderImpl.createTable(DBReaderImpl.java:262)
    at org.knime.core.node.port.database.reader.DBReader.createTable(DBReader.java:121)
    at org.knime.base.node.io.database.DBReaderNodeModel.getResultTable(DBReaderNodeModel.java:150)
    at org.knime.base.node.io.database.DBReaderNodeModel.execute(DBReaderNodeModel.java:127)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:566)
    at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1128)
    at org.knime.core.node.Node.execute(Node.java:915)
    at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:561)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
    at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
    at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
    at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
2017-10-18 09:00:53,487 : INFO  : KNIME-Worker-10 : LocalNodeExecutionJob : Database Reader : 0:78 : Database Reader 0:78 End execute (47 mins, 36 secs)
2017-10-18 09:00:53,487 : DEBUG : KNIME-Worker-10 : WorkflowManager : Database Reader : 0:78 : Database Reader 0:78 doBeforePostExecution
2017-10-18 09:00:53,487 : DEBUG : KNIME-Worker-10 : NodeContainer : Database Reader : 0:78 : Database Reader 0:78 has new state: POSTEXECUTE
2017-10-18 09:00:53,487 : DEBUG : KNIME-Worker-10 : WorkflowManager : Database Reader : 0:78 : Database Reader 0:78 doAfterExecute - success
2017-10-18 09:00:53,487 : DEBUG : KNIME-Worker-10 : NodeContainer : Database Reader : 0:78 : Database Reader 0:78 has new state: EXECUTED
2017-10-18 09:00:53,489 : DEBUG : KNIME-Worker-10 : Constant Value Column : Constant Value Column : 0:79 : Configure succeeded. (Constant Value Column)
2017-10-18 09:00:53,489 : DEBUG : KNIME-Worker-10 : NodeContainer : Database Reader : 0:78 : Constant Value Column 0:79 has new state: CONFIGURED
2017-10-18 09:00:53,489 : DEBUG : KNIME-Worker-10 : NodeContainer : Database Reader : 0:78 : OAI-1722_LT_SURCHARGE_DATA_MATCHING 0 has new state: IDLE
2017-10-18 09:00:53,490 : DEBUG : KNIME-WFM-Parent-Notifier : NodeContainer :  :  : ROOT  has new state: IDLE
2017-10-18 09:00:53,490 : DEBUG : AWT-EventQueue-0 : Buffer : Database Reader : 0:78 : Opening input stream on file "C:\Users\operagn\AppData\Local\Temp\knime_OAI_1722_LT_SUR13767\knime_container_20171018_4764708699993395180.bin.gz", 1 open streams
2017-10-18 09:00:53,491 : DEBUG : KNIME-Node-Usage-Writer : NodeTimer$GlobalNodeStats :  :  : Successfully wrote node usage stats to file: D:\Data\knime_workspace\.metadata\knime\nodeusage_3.0.json