I have been working on a Workflow which uses a SQLITE Database to handle some data. I’ve isolated the problem in a separate workflow and have made a simple table structure (just 3 Integer columns with an UNIQUE constraint which specifies that the combination of the 3 columns must be unique) so anyone may run it and hopefuly better understand the problem I’m about to describe.
When using the DB INSERT/WRITER node to insert new rows in a table with an UNIQUE constraint the node fails to properly show a message such as “…UNIQUE constraint failed…” when at least one row already existed in the table and the Batch Size of the node has been configured to be greater than 1. For example, if the table already comprises the set (5, 10, 5) at columns (int1, int2, int3) and you try to INSERT said (5, 10, 5) values along with other rows with values which are not yet in the table (such as, say, (5, 10, 6)) then you will get at the InsertError output column the value “Something happened while batch execution. Some of the rows could be executed.” for every row in the that Batch, instead of getting specific information for each row such as InsertStatus=0 and InsertError=“…UNIQUE constraint failed…” for rows which were already in the table and InsertStatus=1 and InsertError=NULL for rows which were yet to be added to the table.
The behaviour I just describred is undesirable, yet understandable to some degree (there may be an explanation I don’t know at the JDBC/SQLITE level). What is really worrying is that at the same time, when you try to INSERT rows which are not yet in the table (new rows) along with rows which are already in the table (old rows), which is the same problem described above, the new rows that happen to be in the same Batch that some old rows will also fail (let’s remember that the old rows are expected indeed to fail because of the UNIQUE constraint). For example, if you try to INSERT a set of 6 rows which happen to be in the same Batch…
- (10, 15, 10)
- (10, 15, 11)
- (10, 15, 12)
- (10, 15, 13)
- (10, 15, 14)
- (10, 15, 15)
… and rows (10, 15, 10), (10, 15, 11) and (10, 15, 12) were already in the table, you will get the error “Something happened while batch execution. Some of the rows could be executed.” and also rows (10, 15, 13), (10, 15, 14) and (10, 15, 15) will NOT be inserted although the error message claims that “some of the rows COULD be executed”, and you are NOT being warned about this data loss.
The data loss could be avoided by specifying Batch Size=1 in the node’s configuration, as I’ve tested in my example workflow, but this may cause the workflow to be a lot slower, specially with large tables.
I’m using KNIME 4.6.3 and for this case I’m just using the base KNIME DB nodes. As I said, I’ve isolated the problem in a Workflow which you can find here (the Workflow uses DB INSERT but the same happens if DB WRITER node is used):
Bug_DBINSERT.knwf (26.9 KB)
The Workflow uses the SQLite Connector node to create a temporary (in-memory) database, but in my original Workflow I was working with a SQLITE database file (where the same happens).
Please let me know if you know this to be a JDBC problem rather than a KNIME problem (I don’t have the time and knowledge right now to test in pure Java with JDBC), or if you know why this is happening and how to approach a solution.