Database Update Node filling up tempdb (SQL Server)

Hello,
I am currently trying to graphically evaluate a model. To do this I am recalculating my predictions on a time series that is stored in a SQL Server Express database. This table is read by an external visualization system.

My workflow reads the table from the database, then does all the preprocessing needed so I can make predictions on the data. The resulting table is then formatted so that I should be able to overwrite my old predictions in the database.

I am trying to do this with the Database Update node (I also tried the legacy version), which is configured as follows:

My problem is that this update (~15,000 rows) takes very, very long.
After about 30-40 minutes the update crashes at roughly 50% done because my disk runs out of space.
This is because, while the updates are being written to the database, tempdb (the SQL Server system database used as a store for internal objects, row versions, work tables, temporary tables, and indexes) keeps filling up. The last time I let the update run, it crashed after tempdb had grown to about 160 GB, filling up the remaining free space on my hard disk.
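
For reference, tempdb usage can be watched with a standard space-usage query while the node runs; the DMV below is built into SQL Server and nothing in it is specific to my workflow:

```sql
-- Space currently used inside tempdb, in MB (pages are 8 KB each).
SELECT
    SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
    SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
    SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
    SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_space_mb
FROM tempdb.sys.dm_db_file_space_usage;
```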

Things I have tried so far:

  • Shrinking both the database I want to write to and tempdb, then retrying (see the sketch after this list)
  • Trying both the legacy and the current versions of the Microsoft SQL Server Connector node and the Database Update node
  • Increasing the batch size from 1 to 1000 (Database Update)
  • Restarting KNIME
  • Resetting the complete workflow
  • Restarting my system
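
The shrinking step was roughly the following (MyPredictionsDb is a placeholder for my database name):

```sql
-- Roughly what I ran before retrying; MyPredictionsDb is a placeholder name.
DBCC SHRINKDATABASE (MyPredictionsDb);
DBCC SHRINKDATABASE (tempdb);
```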

Hi,
have you tried to perform the update manually, e.g. in another SQL editor, to see what happens? To see the exact statement that KNIME sends to the database, you can enable the JDBC logger in the Advanced tab of the DB Connector node.
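
The statement the Database Update node sends per input row looks roughly like the following; the table and column names here are only placeholders, the real ones will show up in the JDBC log:

```sql
-- Hypothetical shape of the per-row statement (parameters are filled in from the KNIME row).
UPDATE dbo.Predictions
SET    Prediction = ?   -- value column from the KNIME input table
WHERE  Id = ?;          -- identification (WHERE) column configured in the node
```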

How selective is your Id? How many rows are in the db table that gets updated? The execution of all update statements (one for each KNIME row) is performed in a single transaction. So if your db table is very large and you update all rows, this might result in your db running out of temp space.
If the updates can be done independently (i.e. it is ok if some fail while others are processed and committed), you could split up the input table, e.g. by using the Chunk Loop Start node, to update fewer rows in the DB table per transaction.
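
In plain T-SQL the same idea, committing smaller batches instead of one huge transaction, would look roughly like this (dbo.Predictions and dbo.NewPredictions are placeholder names, and the Prediction column is assumed to be not nullable; in KNIME the Chunk Loop Start node gives you the same effect without writing SQL):

```sql
-- Sketch: update 1000 rows at a time and commit each batch separately,
-- so tempdb only has to hold the row versions of one batch at a time.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    UPDATE TOP (1000) p
    SET    p.Prediction = n.Prediction
    FROM   dbo.Predictions AS p
    JOIN   dbo.NewPredictions AS n ON n.Id = p.Id
    WHERE  p.Prediction <> n.Prediction;   -- only rows that still differ, so each pass shrinks

    SET @rows = @@ROWCOUNT;   -- read before COMMIT, which resets @@ROWCOUNT
    COMMIT TRANSACTION;
END;
```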

Also, if an update takes very long, you might want to consider creating an index on the WHERE columns of your update statement to prevent a full table scan for each update.
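
For example, with the placeholder names from above, an index on the identification column would look like this:

```sql
-- Index on the column used in the update's WHERE clause,
-- so each UPDATE can seek the row instead of scanning the whole table.
CREATE INDEX IX_Predictions_Id ON dbo.Predictions (Id);
```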

Bye
Tobias

