KNIME performance, reading from a database

I just started to use KNIME and it suppose managed a huge mount of data, but isn't, it's slow and often not response. I'll manage more data than that I'm using now, What am I doing wrong?.


I set in my configuration file "knime.ini":

    -XX:MaxPermSize=1024m
    -Xmx2048m

I also read data from a database node (millions of rows) but I can't limit it by SQL (I don't really mind, I need this data).

    SELECT * FROM foo LIMIT 1000

error:

    WARN      Database Reader     com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT 0' at line 1

 

The latter problem (LIMIT) is solved in the next KNIME release.

What do you mean by slow? How much data are you trying to read? If you are reading from a database it's often the database (or the connection) that is slow and not KNIME itself.

you are right.

do you known when will come the next release, It's anoying do tests with 25millons of rows

6th of December.

Hi Xedo,

I guess the problem is that the MySQL JDBC driver fetches all results into memory when executing query before returning the result. In order to change this behaviour you can specify the default fetch size in the knime.ini using the -Dknime.database.fetchsize parameter e.g. -Dknime.database.fetchsize=1000. For more information on how to set the fetch size have a look at the FAQ.

Best regards,

Tobias

I'm also experiencing terrible performance with the database nodes. I'm reading from a table with 76m rows. The SQL querries on the machines run within miliseconds. When transfered into KNIME's database nodes the same querries take minutes--if they execute.

I've tested them on: 

Postgres

MySQL

Windows and Mac OS X

with various jdbc drivers, which leads me to the conclusion there is something about how KNIME sends the SQL to the connector that is not optimized. 

 

 

We are using the standard JDBC mechanism to access the databases. There isn't much you can do right or wong here (except the fetch size that Tobias alreay mentioned). Note also that there is huge difference between executing a query and fetching the results. Usually the network is the bottleneck (even if it's a fast network connection) because of the latencies between requests and responses.