Does using the Columnar Backend improve performance for database reads when working in-database? Or can it help make in-database searches more efficient?
@lsandinoIQ I don’t think it can, because the database processing will happen inside the database itself. The columnar storage might only help if you are pulling data into KNIME and processing it there. Also, any kind of compression needs some additional resources for the compression itself - the benefit then comes when storing and handling the data.
For efficient database usage there are a ton of factors to consider, very much depending on the type of DB, the data, the use case, the code, the DB settings, how the tasks are planned, the power of the host server and so on …
I second @mlauber71’s comments + would really want to emphasize the “type of DB used” part. Assuming the same hardware, switching to an OLAP DB really gives the best “bang for the buck” & analytical processes can run orders of magnitude faster. While OLAP engines are becoming more popular, many still rely on traditional OLTP DBs (MySQL, PostgreSQL, SQLite etc.) for analytical queries, which they are not optimized to run at scale. I’d highly recommend moving the data to ClickHouse / DuckDB or using Polars dataframes within KNIME via Python nodes (see the sketch below).
A few benchmarks you may find interesting (e.g. see MySQL/SQLite vs DuckDB on the same hardware):
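To illustrate the Polars-in-KNIME idea, here is a minimal sketch of a KNIME Python Script node that pulls the input table as Arrow, runs an analytical aggregation with Polars, and hands the result back. It assumes the `knime.scripting.io` API of recent KNIME versions and that `polars` is installed in the node’s Python environment; the column names are purely hypothetical.

```python
# Minimal sketch, assuming the knime.scripting.io API of a recent KNIME
# Python Script node and a Python environment with polars installed.
import knime.scripting.io as knio
import polars as pl

# Pull the KNIME input table as an Arrow table and wrap it in a Polars DataFrame
arrow_table = knio.input_tables[0].to_pyarrow()
df = pl.from_arrow(arrow_table)

# Example analytical query: a group-by aggregation of the kind that tends to be
# slow on row-oriented OLTP engines ("customer_id" / "order_value" are made-up columns)
result = (
    df.group_by("customer_id")
      .agg(pl.col("order_value").sum().alias("total_value"))
      .sort("total_value", descending=True)
)

# Hand the aggregated result back to KNIME as the node's output table
knio.output_tables[0] = knio.Table.from_pyarrow(result.to_arrow())
```

The same pattern works with DuckDB instead of Polars if you prefer writing SQL: register the Arrow table with a DuckDB connection, run the query there, and fetch the result back as Arrow.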