I am observing a performance issue in the Database Writer Node (2.10). When appending data to a really large table in a loop, I see that the node always issues a "Select * from (Select * from TABLE) temptable4040404038383 limit 0" before starting the INSERT (where TABLE is the table name).
As the table has some 20 million rows, this takes 15 minutes in each loop run. Appending the 500,000 rows themselves, which is what the node actually does, takes less time than that!
Is this statement really necessary? If it is, why does it have to be nested? With the LIMIT clause on the outer query, MySQL materializes the whole TABLE into the temptable and only then applies the limit. Wouldn't putting the LIMIT on the inner query be much, much more efficient?
Thanks in advance for any pointer to how I can improve this situation.
We cannot put the LIMIT into the inner query, because it can be an arbitrary user-defined SQL query that may already contain a LIMIT clause. Any decent query optimizer should be able to push the LIMIT down into the inner query. It seems MySQL is quite limited in this respect.
We need this query to determine the structure of the target table, so we can check whether the KNIME input table is compatible with the database table. Looks like we have to look for a workaround for MySQL.
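To make the purpose of that wrapped query concrete, here is a minimal sketch of the LIMIT-0 metadata trick, using Python with SQLite as a stand-in for the actual MySQL/JDBC setup (the table and query names are made up for illustration): the wrapper returns zero rows, but the driver still reports the result set's column structure.

```python
import sqlite3

# In-memory database with an example table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, price REAL)")

# The user-defined query may itself contain a LIMIT, so it is wrapped
# as-is in a subselect rather than being modified.
user_query = "SELECT id, name FROM t LIMIT 5"
cur = conn.execute(f"SELECT * FROM ({user_query}) tmp LIMIT 0")

# No rows are fetched, but the cursor's description lists the columns,
# which is exactly the structure information the framework needs.
columns = [d[0] for d in cur.description]
print(columns)  # ['id', 'name']
```

The catch described in this thread is that older MySQL versions execute the inner query fully before applying the outer LIMIT 0, which is why the probe is so slow on a 20-million-row table.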
thanks for your quick reply. I do not quite understand what you mean by "arbitrary user-defined SQL query". As far as I can see, there is no way to specify a user-defined query in a Database Writer.
Is it because all database nodes share this implementation? Then there might be a quick fix: in the Database Reader node, I prevent this behaviour by checking the "Run SQL query only on execute, skip configure" box.
Would it be possible to offer a similar checkbox for the Database Writer, e.g. "Check table structure before inserting"? If the structure of the supplied table is correct, the INSERT should work anyway.
Cheers and best regards,
In principle, you are right: for the Database Writer we could indeed use the LIMIT directly in the select statement. However, we have an underlying framework that generically retrieves metadata for all queries in the same way, and it uses subqueries so that it works in all cases.
I just searched a bit in the MySQL documentation and they state that starting with version 5.6.3 the query optimizer got more intelligent about subqueries (http://dev.mysql.com/doc/refman/5.6/en/subquery-optimization.html#subquery-materialization).
Thanks for the pointer to 5.6. You are right, with MySQL 5.6.19 the execution of this query is almost instant even for large tables.
Did this behaviour of the Database Writer Node change with 2.10? We have a lot of workflows writing small batches into really large tables and I can't remember having these problems prior to the upgrade from 2.9.4.
If so, another solution would be using the "old" database writer node.
Thanks for your support and the hint to upgrade MySQL.
I'm using the Database Writer node (KNIME 3.4.1) and it's very slow. It takes minutes to write a few thousand records (with just 38 columns) to an (empty) MySQL table.
Is there a better alternative for transferring a KNIME data table to a MySQL table?
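One general factor behind slow bulk writes, regardless of the tool, is row-by-row inserts with a commit per row instead of batched inserts in a single transaction. A minimal sketch of the batched approach, again using Python with SQLite as a stand-in for MySQL (table and column names are made up):

```python
import sqlite3

# In-memory database with an example target table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")

rows = [(i, f"name{i}") for i in range(10_000)]

# executemany sends all rows through one prepared statement, and the
# "with" block wraps everything in a single transaction instead of
# committing once per row.
with conn:
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 10000
```

This is only a generic illustration of the batching idea, not KNIME's actual JDBC code path; whether it helps in a given workflow depends on the driver and the node's batch settings.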