Writing DB data to Snowflake

Hi all,

I am new to KNIME and I need your help.
Currently I am using the process in the shared workflow to write a KNIME table to Snowflake using the nodes you see.
As the workflow takes too long to run and I need to improve it, I converted all the nodes to DB nodes. In the steps shared, I need a way to write my DB data to Snowflake without having to use the DB Reader node.
Any suggestions?
Note: the steps of the extraction are (roughly sketched in SQL below):

  1. Create the table if it does not exist
  2. Check if data for this specific entity and month (two flow variables) exist and, if they do, delete the rows
  3. Write the data in the table.
    Extract DB data to Snowflake.knwf (38.4 KB)
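
In plain SQL, the logic I am trying to achieve for each run is roughly the following (the table and column names here are just placeholders, not the real ones from my workflow):

-- 1. create the table if it does not exist
CREATE TABLE IF NOT EXISTS "MY_TABLE" ("Entity" VARCHAR, "Month" INTEGER, "Value" VARCHAR);

-- 2. remove any existing rows for this entity and month (the two flow variables)
DELETE FROM "MY_TABLE" WHERE "Entity" = '<entity>' AND "Month" = <month>;

-- 3. write the new data into the table
INSERT INTO "MY_TABLE" ("Entity", "Month", "Value")
SELECT "Entity", "Month", "Value" FROM <source of the db data>;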

Hi @ChrMavrom , when I first read your post, I thought you were trying to write from a data source outside of Snowflake into Snowflake, in which case you would have to use a DB Reader to bring the data into KNIME, from where it could then be passed to Snowflake.

However, in looking at your workflow, I believe that all of your data resides in a single Snowflake database, and you are using KNIME to perform the required transformations. Is that correct?

If that is the case, then you can remove the DB Reader and perform the bulk of the data operations within Snowflake, without the data having to be returned to KNIME.

I cannot view the config of your workflow, because I cannot execute it (it relies on your Snowflake connection). If you had been able to upload a working example using an alternative database such as the built-in H2 database, it would have been simpler to assist, so instead I’ve put together a workflow to demonstrate the principle.

In this workflow, a sample database table “ENTUPDATES” is accessed.


From this table, a query returns the distinct set of Entity and Month, and this is then iterated over with the loop.
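
The exact node settings aside, the query driving that step is along these lines (written out here as plain SQL just for illustration):

-- distinct entity/month combinations to loop over
SELECT DISTINCT "Entity", "Month"
FROM "PUBLIC"."ENTUPDATES";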

The DB Query node is used to supply the query for the database:
SELECT "Entity", "Month", "Dummy" FROM #table#

But this is not directly used in a DB Reader. Instead, it is used to provide the required table structure to the DB to Empty Table node, and the query itself is extracted by the DB Query Extractor node.

In my workflow, I am creating a table (if it doesn’t exist) with the name of the entity, so at the end I’ll have additional tables A, B and C.

The table structure from the DB to Empty Table node is used as the basis for the DB Table Creator, in the same way that the DB Reader in your workflow operated.

There is a word of caution here. By default the DB Table Creator defines string columns as varchar(255), so somebody using this technique could hit problems unless they take steps to change the Type defaults. However, for Snowflake this should be irrelevant, since Snowflake creates every varchar column as 16MB in size regardless of what you tell it :slight_smile:
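
To make that concrete, the table created for, say, entity A would end up looking something like this (the column types are assumptions for illustration; varchar(255) is the node's default for strings):

-- roughly the DDL issued for entity "A" (created only if it does not already exist)
CREATE TABLE "A" (
    "Entity" VARCHAR(255),
    "Month"  INTEGER,
    "Dummy"  VARCHAR(255)
);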

Meanwhile the list of columns is extracted from the DB to Empty Table node by the Extract Table Spec node. This is then turned into a delimited list, so in this case the columns Entity, Month and Dummy become the list:

"Entity", "Month", "Dummy"

On a given iteration, the DB Query Extractor extracts the SQL for the query:

SELECT "Entity", "Month","Dummy"FROM (SELECT * FROM "PUBLIC"."ENTUPDATES"
where "Entity" = 'C'
and "Month" = 2
)

This is then combined with the delimited string of columns into an insert statement in the DB SQL Executor:

delete from $${SEntity}$$ 
where "Entity" = '$${SEntity}$$' and "Month" = $${IMonth}$$;

insert into $${SEntity}$$ ($${SColumn Name}$$)
$${Ssql}$$

Here I’ve demonstrated including the delete statement inside the Executor, but you could put it outside in the DB Delete (Filter) node as you had in your workflow.

The important thing here is the insert statement. This will get transformed at run time into the following:

insert into "A" ("Entity", "Month", "Dummy")
SELECT "Entity", "Month","Dummy"FROM (SELECT * FROM "PUBLIC"."ENTUPDATES"
where "Entity" = 'C'
and "Month" = 2
)

so the whole insert will take place in-database, rather than returning the data to KNIME and then writing back out.
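
Putting it all together, the full batch that the DB SQL Executor sends for that iteration would look something like this (using the same example values as above):

-- remove any rows already present for this entity and month
delete from "C"
where "Entity" = 'C' and "Month" = 2;

-- re-insert them directly from the extracted query, entirely within the database
insert into "C" ("Entity", "Month", "Dummy")
SELECT "Entity", "Month", "Dummy" FROM (SELECT * FROM "PUBLIC"."ENTUPDATES"
where "Entity" = 'C'
and "Month" = 2
);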

DB demo 20240120.knwf (51.6 KB)


To start with, yes, I am reading from Snowflake, doing some data steps in KNIME and then writing the data back to Snowflake.
As I am new to these things, I am not able to provide you with a sample using a different connection.
My first question is: what do you mean when you say “perform the bulk of the data operations within Snowflake”?

Hi @ChrMavrom,

No probs re the connection. I wasn’t expecting you to be able to supply a Snowflake connection. I was just explaining that using H2 for demonstration purposes can be a possibility, which is why I’ve used it in the workflow.


What I meant by performing the “operations within Snowflake” was…

If you have a task which is to insert data into one table in Snowflake, where that data is queried from another table in the same Snowflake database, there are a couple of ways you can perform that.

  1. You can read the data from Snowflake into KNIME and then Insert into Snowflake from KNIME

or

  2. You can use SQL directly in a DB SQL Executor, using the syntax:
INSERT INTO tableB (colX, colY, colZ)
SELECT colA, colB, colC FROM tableA
etc

(1) In the first scenario, data is fetched into KNIME so that it can be transformed, and it is then uploaded back to Snowflake. In this scenario there is no option other than to have the data come down the line: KNIME can’t “see” the data remotely; it has to request and fetch it.

(2) In the second scenario, Snowflake is simply given the SQL to run (the insert statement). The data never passes down the line to KNIME and back up. The processing is done purely within Snowflake.

Whether scenario 2 is an option for you is down to the specifics of the use case, but it is what I am demonstrating in the workflow I uploaded, albeit using H2 instead of Snowflake for demo purposes. If you need KNIME to perform transformations on the data, though, this scenario won’t work. It requires that SQL is used for the transformations, and that isn’t necessarily easy, whereas it can be comparatively easy in KNIME. Complex queries in SQL can start to become difficult to debug, and of course require a good knowledge of SQL. The trade-off is performance if you are handling large datasets.

All of the above is trying to answer your question of whether you need the DB Reader node. Basically, if you are going to perform manipulations/transformations on the data in KNIME, you will need a Reader node, because until that point the KNIME transformation nodes will not have access to the data. I hope that helps.

