Fetch huge mysql table

Hi, I was wondering if someone can suggest a smart way to fetch a huge MySQL table via the Database Connection Table Reader node.

Maybe there is a good workaround to fetch a specific chunk of rows at a time?

Thanks in advance.

Please search the forum for “fetch size”

Thanks @quaeler for the reply,

you mean something like: -Dknime.database.fetchsize=XXXX?

Ya - if you add a larger fetch size specification to your knime.ini file and then restart KNIME, it will attempt to pull that many rows in each fetch transaction.
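For example, the relevant line in knime.ini would look like the following (the value 10000 is arbitrary here; tune it to your memory budget and row width):

```
-Dknime.database.fetchsize=10000
```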

It’d be helpful to know what you’re actually doing with the data. Do you really need all of it in a table at once? Otherwise: Could you just page/stream over portions?

– Philipp

Hi @qqilihq, yeah, paging/streaming would also be good. More or less the point is that I don’t want the query to fail by fetching all the data at once. Can you show me an example of a workflow that queries per page or chunk as a workaround?

Because the problem occurs when I try to process the data (let’s say 5 million rows) going from the Database Table Selector to the Database Connection Table Reader node.

Sorry, I don’t have any WF which I could publicly share, but I did something similar a while ago. The concept was to build the DB query dynamically within a flow variable and pass it to the corresponding Database Query node. Depending on your data, the query could be e.g. as simple as

SELECT * FROM table LIMIT 10000 OFFSET $$OFFSET$$

Use e.g. a loop and update $$OFFSET$$ within each iteration; break the loop when there’s no more data. Process all the data within the loop and do not return it (otherwise there’ll obviously be no benefit, as you’ll end up with a huge table nonetheless).
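The looping idea above can be sketched outside of KNIME as well. The following is a minimal, self-contained Python sketch using an in-memory SQLite database as a stand-in for MySQL (same LIMIT/OFFSET pattern; the table name, column names, and page size are made up for the demo, and in KNIME the offset would come from a flow variable instead):

```python
import sqlite3

# In-memory stand-in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO big_table (id, value) VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(25)],
)

PAGE_SIZE = 10  # the LIMIT; choose so one chunk fits comfortably in memory
offset = 0
total_processed = 0

while True:
    # Equivalent of the Database Query node receiving $$OFFSET$$
    # updated on each loop iteration. ORDER BY keeps pages stable.
    rows = conn.execute(
        "SELECT id, value FROM big_table ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()
    if not rows:  # no more data -> break the loop
        break
    # Process the chunk here instead of collecting everything.
    total_processed += len(rows)
    offset += PAGE_SIZE

print(total_processed)  # 25
```

Note the ORDER BY: OFFSET paging only yields consistent pages if the row order is deterministic between queries.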

However, you say …

don’t want to fail by querying all the data together in 1 time

I’m not sure I understand correctly, but if you have very strict atomicity constraints, this approach is probably not the right one, as you’ll obviously be doing multiple queries in sequence.

Still, I’m not sure whether this is going in the right direction; to give better advice, it would be necessary to understand what kind of operations you’re actually doing with that data.