Large volumes with KNIME

Good evening

I am planning to use KNIME mainly as an ETL tool to load data into a company data warehouse, either in the local version or in server mode.

What volume or performance limitations have you experienced when working with large volumes of structured data? In other words, what data volumes have your workflows handled, and on what server or local computer hardware?

Thanks in advance

@fmc00006 I can offer this article, which describes my experience.

Mostly the question is how large the data is, how powerful the system is that KNIME runs on, and how good the connection is to the server where the database lives.

Then, when working with large files in KNIME:

  • Process the data in chunks / loops (see the sketch after this list)
  • Use streaming execution when possible
  • Check the underlying data formats, e.g. Parquet (external) or the columnar backend (internal)
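
In KNIME itself the chunking is typically done with loop nodes (for example Chunk Loop Start / Loop End) rather than code, but the underlying idea is easy to sketch in plain Java. The file path, chunk size and the processChunk step below are hypothetical placeholders; the sketch only illustrates reading a large CSV chunk by chunk instead of loading it all at once:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ChunkedCsvSketch {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("/data/big_export.csv"); // hypothetical input file
        int chunkSize = 100_000;                      // rows per chunk, tune to available memory

        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String header = reader.readLine();        // keep the header line out of the chunks
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            long chunkNo = 0;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    processChunk(chunkNo++, header, chunk);
                    chunk.clear();                    // free the rows before reading the next chunk
                }
            }
            if (!chunk.isEmpty()) {
                processChunk(chunkNo, header, chunk); // last, partial chunk
            }
        }
    }

    // Placeholder for the per-chunk ETL step (transform, load into the warehouse, ...).
    private static void processChunk(long chunkNo, String header, List<String> rows) {
        System.out.printf("chunk %d: %d rows%n", chunkNo, rows.size());
    }
}
```

The point is that memory use stays bounded by the chunk size rather than the file size, which is the same property the loop and streaming approaches give you inside a workflow.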

We are working with large structured data (more than 6,000,000 records) without any problems (reading large Excel files into CSV files and ETL).


Thank you very much for answering me. Your answers are very helpful.

Kind regards

Fernando

If you use the DuckDB JDBC driver with the DB nodes:

https://mvnrepository.com/artifact/org.duckdb/duckdb_jdbc

And if your hardware has plenty of RAM and a fast SSD, you can seamlessly process even a billion+ rows locally in seconds. I often work with 100+ million rows on a ThinkPad with only 16 GB RAM and a Ryzen 5 7535U CPU - all queries (including parametric queries, window functions, etc.) finish within seconds.
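
For what it's worth, the same driver can also be tried outside KNIME from plain Java via JDBC. This is only a minimal sketch, assuming the duckdb_jdbc jar from the link above is on the classpath; the database path, the Parquet glob and the query are made-up examples:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DuckDbJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Explicitly load the driver (newer drivers also self-register via the JDBC service loader).
        Class.forName("org.duckdb.DuckDBDriver");

        // "jdbc:duckdb:" alone opens an in-memory database; appending a path makes it persistent.
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/warehouse.duckdb");
             Statement stmt = conn.createStatement()) {
            // DuckDB can scan Parquet files directly; the glob below is a hypothetical example.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT customer_id, SUM(amount) AS total " +
                    "FROM read_parquet('/data/orders/*.parquet') " +
                    "GROUP BY customer_id ORDER BY total DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```

Within KNIME, the usual route is to register the jar as a custom JDBC driver in the database preferences and then point a DB Connector node at a jdbc:duckdb: connection URL.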


Wow!!! I'm very impressed. Thanks for your comments.

