Large volumes with KNIME

Good evening

I am planning to use KNIME mainly as an ETL tool to load data into a company data warehouse, either in the local version or in server mode.

What volume or performance limitations have you experienced when working with large volumes of structured data? In other words, what data volumes have your workflows handled, and on what server or local computer hardware?

Thanks in advance

@fmc00006 I can offer this article, which describes my experience.

Mostly the question is how large the data is, how powerful the system is that KNIME runs on, and how good the connection is to the server where the database lives.

Then, when working with large files in KNIME:

  • Process the data in chunks / loops (see the sketch after this list)
  • Use streaming execution when possible
  • Check the underlying data formats, e.g. Parquet (external) or the columnar backend (internal)
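
In KNIME itself the chunking is typically done with loop nodes (for example Chunk Loop Start / Loop End) rather than code, but the underlying idea is easy to sketch in plain Java. The file path, chunk size and the processChunk step below are hypothetical placeholders; the sketch only illustrates reading a large CSV chunk by chunk instead of loading it all at once:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChunkedCsvSketch {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("/data/big_export.csv"); // hypothetical input file
        int chunkSize = 100_000;                      // rows per chunk, tune to available memory

        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String header = reader.readLine();        // keep the header line out of the chunks
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            long chunkNo = 0;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    processChunk(chunkNo++, header, chunk);
                    chunk.clear();                    // free the rows before reading the next chunk
                }
            }
            if (!chunk.isEmpty()) {
                processChunk(chunkNo, header, chunk); // last, partial chunk
            }
        }
    }

    // Placeholder for the per-chunk ETL step (transform, load into the warehouse, ...).
    private static void processChunk(long chunkNo, String header, List<String> rows) {
        System.out.printf("chunk %d: %d rows%n", chunkNo, rows.size());
    }
}
```

The point is that memory use stays bounded by the chunk size rather than the file size, which is the same property the loop and streaming approaches give you inside a workflow.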

We are working with large structured data (more than 6,000,000 records) without any problems (reading large Excel files into CSV files and ETL).


Thank you very much for answering me. Your answers are very helpful.

Kind regards

Fernando

If you use the DuckDB JDBC driver with the DB nodes:

https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc

And if your hardware has plenty of RAM and a fast SSD, you can seamlessly process even a billion+ rows locally in seconds. I often work with 100+ million rows on a ThinkPad with only 16 GB RAM and a Ryzen 5 7535U CPU - all queries (including parametric queries, window functions, etc.) finish within seconds.
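
For what it's worth, the same driver can also be tried outside KNIME from plain Java via JDBC. This is only a minimal sketch, assuming the duckdb_jdbc jar from the link above is on the classpath; the database path, the Parquet glob and the query are made-up examples:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DuckDbJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Explicitly load the driver (newer drivers also self-register via the JDBC service loader).
        Class.forName("org.duckdb.DuckDBDriver");

        // "jdbc:duckdb:" alone opens an in-memory database; appending a path makes it persistent.
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/warehouse.duckdb");
             Statement stmt = conn.createStatement()) {
            // DuckDB can scan Parquet files directly; the glob below is a hypothetical example.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) AS total " +
                    "FROM read_parquet('/data/orders/*.parquet') " +
                    "GROUP BY customer_id ORDER BY total DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```

Within KNIME, the usual route is to register the jar as a custom JDBC driver in the database preferences and then point a DB Connector node at a jdbc:duckdb: connection URL.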


Wow!!! I'm very impressed. Thanks for your comments.

