Best practices for reading large datasets in KNIME


I would like to know the best practices for reading large data tables into KNIME. When I read a large DB table (20 GB / 7M rows), the KNIME nodes (DB Query Reader / DB Table Reader) either take hours or end with "Execute failed".

I'm not sure how to handle large datasets in KNIME.

Thanks in Advance.

Hi @kotlapullareddy,

This sounds like a huge table. You can use the Database nodes to aggregate the data in the Database, and only transfer a summary to KNIME.
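To illustrate the idea of pushing aggregation into the database, here is a minimal sketch. It uses Python's stdlib `sqlite3` as a stand-in for the real Postgres source, and the `sales` table with its `region`/`amount` columns is entirely hypothetical:

```python
import sqlite3

# Stand-in database; in the real workflow this would be Postgres.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 10.0), ("EU", 20.0), ("US", 5.0)])

# Instead of SELECT * (transferring every row into KNIME), let the
# database compute the summary and transfer only the aggregated result.
summary = conn.execute(
    "SELECT region, COUNT(*) AS n, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('EU', 2, 30.0), ('US', 1, 5.0)]
```

The same `GROUP BY` query can be placed in a DB Query node so the aggregation runs on the database server, and only the small result table crosses the wire.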


Hello @kotlapullareddy ,

I’m sorry to hear that you have problems reading large amounts of data from a database. In general this shouldn’t be a problem, even if the data is much larger than your memory, since KNIME caches data on disk if necessary. However, as Sascha mentioned, we recommend pushing the data preparation down into the database as far as possible to make use of the database's processing power.

To help you better, could you please be a bit more specific: which database are you connecting to, are you executing complex queries (e.g. with lots of joins) or only simple ones (e.g. SELECT * FROM table), and what error messages do you get in KNIME?



Hello @tobias.koetter

It's a very simple workflow: we read a table from a Postgres DB, then write it into another database in Snowflake. I don't have any issues with small tables (<300 MB), but we have a 20 GB table that we are struggling to process.

@kotlapullareddy you could think about doing this in chunks or with the help of streaming. Do you have any IDs that you could use to do this in steps?
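To show what streaming means here, below is a minimal sketch: the result set is consumed a small batch at a time via a cursor, so the whole table is never held in memory at once. It uses stdlib `sqlite3` as a stand-in for the real Postgres source, and the `events` table is hypothetical:

```python
import sqlite3

# Hypothetical stand-in for the real Postgres source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, float(i)) for i in range(1, 1001)])

# Stream the result set: fetch a few rows per step, like a streamed
# chunk, instead of materializing all rows at once.
cursor = conn.execute("SELECT id, value FROM events")
total = 0.0
while True:
    batch = cursor.fetchmany(100)
    if not batch:
        break
    for _, value in batch:
        total += value

print(total)  # 500500.0 (sum of 1..1000)
```

KNIME's streaming executor works on the same principle: downstream nodes start processing rows while upstream nodes are still producing them.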

For large files in general, and for resuming loops, I can offer this article:


Thanks for your quick support @mlauber71.

If I understand correctly, we need to read the data from the table and then pass it on in chunks. But the issue is with the DB Reader / DB Query Reader nodes: they don't pass data to the next node. They either take hours or the execution fails.

What I really need to understand is how we can fetch the data in chunks before the reader node.

@kotlapullareddy with streaming you would do the downloading within the streaming component and so only fetch a few rows per chunk. Maybe take a look at the examples provided.

The DB Connection is inside the component:

Do you have any IDs or RowIDs you could select first and use to reference a chunk?
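To make the ID-based chunking concrete, here is a minimal sketch of keyset pagination: each query asks only for rows with an `id` greater than the last one seen, so no single query returns the whole table. It uses stdlib `sqlite3` as a stand-in for Postgres, and the `big_table` name and `CHUNK_SIZE` are hypothetical:

```python
import sqlite3

# Hypothetical stand-in table; in the real workflow the source is Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO big_table (id, payload) VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 11)])

CHUNK_SIZE = 4

def read_in_chunks(conn, chunk_size):
    """Fetch rows in id ranges so each query returns at most one chunk."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM big_table "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            break
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen

chunks = list(read_in_chunks(conn, CHUNK_SIZE))
print([len(c) for c in chunks])  # [4, 4, 2]
```

In KNIME the same `WHERE id > ? ... LIMIT ?` pattern can be built with a loop that feeds the last seen ID into the next DB Query Reader iteration.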


This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.