Slow DB reader

Hi there!
I have to read from one database and from one data warehouse, but reading is too slow in both of them.

My knime.ini:

The database is IBM DB2. Doing nothing besides reading, i.e. loading a single table with close to 400000 rows, takes more than 6 minutes.

Regarding the data warehouse, it is implemented with MariaDB, and I connect to it with a MySQL connector. Reading a table with more than 800000 rows takes around 10 minutes.
I am guessing these times are not OK.

Any help would be nice.
Best regards!

Did you try specifying the fetch size directly in the connector node? You could also try streaming the DB Reader.
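As a rough illustration of why fetch size matters (all numbers below are invented for the sketch, not measured on this setup), each fetched batch costs at least one network round trip, so a small fetch size multiplies the latency:

```python
import math

def estimated_read_seconds(rows, fetch_size, rtt_ms):
    """Rough lower bound: one network round trip per fetched batch."""
    round_trips = math.ceil(rows / fetch_size)
    return round_trips * rtt_ms / 1000.0

# 800k rows, fetch size 10000, 40 ms ping: 80 round trips
print(estimated_read_seconds(800_000, 10_000, 40))  # 3.2 seconds of pure latency
# Same read with fetch size 100: 8000 round trips
print(estimated_read_seconds(800_000, 100, 40))     # 320.0 seconds of pure latency
```

This only counts latency, not transfer time, but it shows why a too-small fetch size alone can add minutes to a read.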

Yes, in both cases it is at the default of 10000.
Regarding streaming, I saw it as an option, but I think more than 5 minutes for tables under 1M rows and fewer than 50 “normal” columns is not acceptable, right?
I am just starting to work with DBs, so any guidance is welcome :ok_hand:

Hi @jorgemartcaam , these numbers would indeed be considered slow.

On top of memory, there are other factors to consider though, such as network and I/O.


I am working in a local network.
Regarding I/O, I am guessing nothing special: not bad, not good, average.
Could it be a problem with the driver or something?

What is the query looking like? How many tables within the db are you calling? How are they joined together? This could also impact the performance negatively.

Hi @jorgemartcaam , as per @ArjenEX , we would need to see what kind of queries you are running.

How is the performance if you run the query on the command line? Can you run an EXPLAIN on your query?
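For readers new to EXPLAIN: it shows whether the filter uses an index or forces a full table scan. Here is a runnable illustration using SQLite as a stand-in (MariaDB's EXPLAIN and DB2's EXPLAIN PLAN differ in syntax, but answer the same question; the table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts TEXT, payload TEXT)")

# Without an index, the plan reports a full table scan.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE ts >= '2022-01-01'"
).fetchone()
print(plan[-1])  # e.g. "SCAN events"

# With an index on the filter column, the plan switches to an index search.
con.execute("CREATE INDEX idx_events_ts ON events (ts)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE ts >= '2022-01-01'"
).fetchone()
print(plan[-1])  # e.g. "SEARCH events USING INDEX idx_events_ts (ts>?)"
```

If your timestamp filter shows a full scan on the real server, an index on that column is usually the first thing to try.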

I would really check the network, and by that I mean latency. You can just ping the DB server and see the latency. It is often the real issue and can create huge performance problems.

Even if you are on-site and the database is “on-site”, the connection to the database might happen over the internet to a data center. So it is possibly a VPN and not really a local connection. “On-site” doesn’t really mean the server is right in the same building.

As a reference, I have a local copy of a database I often use. Reading all 300k rows takes about 20 seconds when you are really in the local network (ping something like 0.01 ms). If I read directly from the database, even when on-site, it takes more like 20 minutes with a ping of 40 ms, which is not bad at all but shows the issue.

Having said that, of course it also matters how much data you are loading, in number of bytes and from which columns. CLOB/BLOB can be very, very slow and implies potentially large amounts of data. If each row has such a column with, say, 5 MB of data, then that would explain a lot as well, just like latency.
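To separate latency from raw transfer time, a quick back-of-envelope calculation helps. The bytes-per-row and bandwidth figures below are assumptions for the sketch, not measurements:

```python
def payload_mb(rows, bytes_per_row):
    """Total payload in megabytes (decimal MB)."""
    return rows * bytes_per_row / 1_000_000

def transfer_seconds(megabytes, mbit_per_s):
    """Time to push that payload over a link of the given bandwidth."""
    return megabytes * 8 / mbit_per_s

# ~50 "normal" columns at ~20 bytes each is roughly 1 KB per row.
mb = payload_mb(800_000, 1_000)
print(mb)                         # 800.0 MB total
print(transfer_seconds(mb, 100))  # 64.0 seconds on a 100 Mbit/s link
```

If the measured read time is far above what bandwidth alone would predict, latency and round trips are the more likely culprits.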


The query looks as follows. I want to read only one table. Then, if it works properly, I will try to join around 5 or 6 more to it.
SELECT * FROM #table# AS table WHERE TIMESTAMP >= '2022-01-01 00:00:00'
From my home, via VPN, this query returns 822062 rows and takes 165 seconds (2 minutes and 45 seconds), according to the Timer node.

One per query, 5 queries total.

Via DB Joiner, inner join matching from 3 to 5 columns.
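For reference, a DB Joiner of that shape translates into a multi-column INNER JOIN executed on the server, which is why indexes on the join key columns matter. A runnable sketch, using SQLite as a stand-in with invented table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (k1 INTEGER, k2 INTEGER, k3 INTEGER, val_a TEXT);
    CREATE TABLE b (k1 INTEGER, k2 INTEGER, k3 INTEGER, val_b TEXT);
    INSERT INTO a VALUES (1, 1, 1, 'left'), (2, 2, 2, 'only-a');
    INSERT INTO b VALUES (1, 1, 1, 'right');
""")

# Inner join matching on three key columns: only rows present in both survive.
rows = con.execute("""
    SELECT a.val_a, b.val_b
    FROM a INNER JOIN b
      ON a.k1 = b.k1 AND a.k2 = b.k2 AND a.k3 = b.k3
""").fetchall()
print(rows)  # [('left', 'right')]
```

Joining 5 or 6 tables this way on an unindexed multi-column key is a common source of slow reads, so it is worth running EXPLAIN on the joined query too, not just the single-table one.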

I did not try yet. I am just starting to use it from KNIME, which I think answers whether I can run an EXPLAIN: I cannot, as I do not know how.

Will try!

This is good to know. Since every data-related task was faster, I assumed the server was on-site too.

About the columns… I do not have CLOB/BLOB, but I do have BIGINT. Would it be possible to transform some columns to just INTEGER?
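For what it is worth, the potential saving from such a cast is easy to estimate: BIGINT is 8 bytes and INTEGER is 4, so the saving is 4 bytes per value. The row count below is taken from the query result above; the rest is plain arithmetic:

```python
# Estimate what narrowing BIGINT (8 bytes) to INTEGER (4 bytes) could save.
rows = 822_062
saved_bytes_per_value = 8 - 4
saved_mb = rows * saved_bytes_per_value / 1_000_000
print(round(saved_mb, 2))  # 3.29 MB saved per BIGINT column
```

A few megabytes per column is unlikely to be the bottleneck next to latency, so the cast is probably not worth the effort here.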

Thank you guys!

Hi @jorgemartcaam , essentially the questions I asked are meant to determine whether the slowness you are experiencing is because of Knime or simply because of your setup.

If it is because of your setup, then there is not much that Knime can do, or rather, there is nothing you can do in Knime to fix it (drivers, configuration, and so on will not make it faster).

So it is hard to determine where to focus the effort if we do not have the answers to these questions…

I am also experiencing extremely slow/poor performance querying DB2 data from z/OS using KNIME. Glad to hear that it is not just me. I am using the V. JDBC driver.

Matt S.