DuckDB builtin?

as we didnt get a response here yet: any plans on integrating DuckDB?

1 Like

As it turns out, we do have a relatively new ticket for this in the system (AP-21915). I don’t have any information about timeline, but I’ll add a link to this thread as a +1 from you.

4 Likes

Great news @ScottF !

In the interim, I have successfully used DuckDB. These are (approximately) the steps to get up and running locally:

  1. Ensure duckdb is installed on your system (DuckDB Installation – DuckDB)
  2. Download the JAR file for the JDBD connection (See the note for where to place this on your system) Tableau – A Data Visualization Tool – DuckDB
  3. In KNIME, under Settings > Database Create a database driver. I created a couple screenshots below.


image

From there, in your workflow:

  • Use a DB Connector Node

image

And edit the connection to use the database setup created above and point to the duckdb file on your machine.

In short, it’s possible, but native support would be fantastic.

4 Likes

I mean consider this: Knime uses Apache Arrow in the backend (afaik) but doesnt ship the Apache Arrow Flight JDBC Driver either.

at least it seems that you can develop Arrow Flight RPC nodes quite seemlessly

2 Likes

More and more, we have data in S3 buckets/data lakes in parquet format (or in Iceberg tables/catalogs).
A nice thing would be to be able to have a duckdb connector that could leverage the “attach” functionnality of duckdb so that one could connect simultanously to

  • a local duckdb or remote motherduck db
  • one of the supported external dbs (postgresql sqlite, mysql,…)
  • some files (parquet, csv, excel) and querying the whole set using knime DB nodes
  • some openstreetmap source files…
  • rest api

It’s possible today but users must first create a duckdb file where they attach all needed connections…
Anybody thinking this may be a useful usecase in knime?

2 Likes

moved this topic to Feedback and Ideas category - suggest all of you give it a vote using the button at the top :slight_smile:

3 Likes

In my view it is already pretty seamless to use duckdb within knime & implement all the above - you would create an in memory connection and put a bunch of sql statements delimited by ‘;’ in an sql executor node (create table/view as …; attach …; etc etc). You can dynamically link all the file paths into the query using flow variables too. Another trick I recently found out would be to add “BASE TABLE” table type into the metadata browser config of the DB connector to make the tables show up - took me a while to figure that out … credit to Vernalis nodes, specifically the “db list table types” :slight_smile:

Another trick I recently found out would be to add “BASE TABLE” table type into the metadata browser config of the DB connector to make the tables show up - took me a while to figure that out … credits to Vernalis nodes, specifically the “db list table types”

Hi, @Add94 :beers: What does this mean? Could you explain it in more detail? Thank you

1 Like

Hey @HaveF ! So basically there are databases that create tables other than the standard “TABLE” or “VIEW” type. This is referenced e.g. in this thread : Reading HANA Tables

DuckDB for some reason creates each table as a “BASE TABLE” type (VIEW works out of the box though) . So this has to be added to the following setting of the DB connector node " List of table types to show in metadata browser. When you’re using a new database and you’re not sure why you’re not seeing the tables, the following Vernalis node is useful , so you know how to modify the setting https://nodepit.com/node/com.vernalis.knime.db.nodes.rsettobdt.DBListTableTypesNodeFactory

2 Likes