I am building a simple workflow (see screenshot) that connects to a PostgreSQL DB, pulls a ‘custom query’ (rather than a table) using the DB Reader node, and then publishes the source to Tableau via a Hyper file.
What I noticed is that performance isn’t optimal: the workflow takes 15-20 minutes to run. My assumption is that I am not using the ‘in-db’ nodes, which utilise the DB engine’s processing power rather than my local machine’s. I couldn’t figure out which nodes I should be using instead of the ones in the workflow.
What I’ve done:
- Tried running the ‘DB Reader’ node with both memory policies: ‘Cache tables in memory’ and ‘Write tables to disc’
- Increased the knime.ini memory limit to 85% of the machine’s memory capacity
Could you refer me to what I could do to improve performance here?
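For reference, the heap limit I changed lives in knime.ini as the JVM `-Xmx` argument under `-vmargs`; on a 16 GB machine (hypothetical figure, not my actual spec), 85% would look roughly like:

```
-vmargs
-Xmx13g
```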
You can recognize nodes that use “in-db” processing by their input and output ports, which are both DB Data ports (dark brown colour). Here are some node examples:
But if the DB Reader node is the bottleneck, I am not sure in-DB processing nodes can help here, as eventually you will have to read the same amount of data into KNIME, which should take the same execution time. What are the dimensions of the table being read into KNIME?
I believe DB Reader is the bottleneck: observing the workflow, this node accounts for 90%+ of the runtime.
Is there a way to bypass the node (i.e. not to have it at all)?
The first node is the connector to the DB, the second uses a heavy custom SQL query to manipulate the data and create a data model, DB Reader (I assume) passes that data through KNIME, and then the last node pushes the data to Tableau.
I would assume that if I built this workflow entirely from KNIME nodes, the runtime might decrease significantly, but I want to avoid that for now.
Any ideas on how to decrease the runtime?
Well, the DB Reader is copying the result of your custom query into KNIME, and then you are pushing it to Tableau Server without any further processing.
In this scenario, I suggest writing the result of your query to a new table in your DB and refreshing the extract directly from Tableau Server by reading that new table. You’ll see a great reduction in runtime.
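A minimal sketch of that materialization step, assuming a hypothetical schema and source table (none of these names come from the original workflow):

```sql
-- Materialize the result of the heavy custom query into a table
-- that Tableau Server can read directly (all names are placeholders).
DROP TABLE IF EXISTS reporting.data_model;

CREATE TABLE reporting.data_model AS
SELECT account_id, event_date, amount
FROM events;
```

The Tableau extract is then refreshed from `reporting.data_model`, so the data never has to pass through KNIME at all.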
That is a definite approach I will be implementing a bit later.
In the meantime, I was looking to investigate whether there are ways to optimise the workflow in its current format/approach.
I would be happy to elaborate on the current implementation but not sure what could help.
The SQL runs window functions: a dynamic rolling sum/count on a very heavy table.
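For context, that kind of dynamic rolling sum/count is the usual PostgreSQL window-function pattern, something like the following (table and column names are illustrative, not the actual query):

```sql
-- Illustrative sketch only: a 30-day rolling sum and count per account
-- over a hypothetical "events" table.
SELECT
    account_id,
    event_date,
    SUM(amount) OVER w AS rolling_sum_30d,
    COUNT(*)    OVER w AS rolling_count_30d
FROM events
WINDOW w AS (
    PARTITION BY account_id
    ORDER BY event_date
    RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW
);
```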
What information can I add for you to have a better idea of steps forward?
As long as you don’t do any kind of data prep in KNIME, I agree with Luca that it’s far better to let Tableau Server connect directly to the DB (either via a published data source or via a normal connection).
Yes, for sure. I completely agree.
Data prep is done within the SQL custom query that KNIME executes.
A direct connection from Tableau to the DB will hang, IMO.
So essentially, if I break the workflow into KNIME in-DB nodes, will it run quicker than the current workflow (via custom query)?
Nonono… hold on a minute, you are a bit confused.
Let me explain what is going on under the hood:
Your heavy SQL query is running on your DB, so you are leveraging the calculation power of that DB.
Using the KNIME in-DB nodes would do the same job, so there is really no benefit here.
The output of your query must be materialized into a table so that Tableau Server can import the data directly, without passing through KNIME, which is unnecessary and also quite slow.
How big is the table of your query? How many rows and columns?
Nothing is preventing you from connecting Tableau in Live mode and testing the performance… if you are not satisfied with it, you can tell Tableau Server to import the data (make an extract), so performance will no longer be an issue.
Good post, Luca, apart from the suggestion to check whether the Tableau live connection is fast. It is always slow ^^