We are starting to use the KNIME DB nodes to do fairly simple database queries against "virtual"/"calculated" views in a database (SAP HANA). The flow looks something like this:
The views are quite sensitive to how the SQL statements are written, and we find that chaining a couple of DB Column Filter + DB Row Filter nodes like in the picture above produces SQL statements of this form:
SELECT * FROM (SELECT "tempTable_3320555003387273690"."MATERIAL"
  FROM (SELECT * FROM "schema"."calc_view") AS "tempTable_3320555003387273690"
) AS "tempTable_5861427265251268515"
WHERE "MATERIAL" = 'A'
Instead of something like this:
SELECT MATERIAL FROM "schema"."calc_view" WHERE "MATERIAL" = 'A'
What we end up with is very poor performance: the nested statement seems to make HANA run a SELECT * on the calculation view, assemble that very large and complex result in memory, and only then apply the filter on top of it.
We get much better performance out of the short SQL statement, but not everyone is comfortable with writing their own SQL.
Is there any way to "optimize" how KNIME builds the SQL statements, given that the database apparently cannot optimize the chained temp views that KNIME is assembling?
We are running KNIME 4.3.1 on Windows with a SAP HANA JDBC driver.
How do you compare the performance of the query KNIME generates with your query?
Are you running your query in KNIME as well, e.g. via the DB Query node? Or are you using another tool?
Just want to check where the issue could be.
Thanks for the reply Julian!
We're basically just executing the workflow above and timing it. Then we remove the DB column and row filters and enter the query directly as a custom query in the DB Table Selector hooked up to the DB Reader.
So everything should be the same; the output of the DB Reader is identical, and the only difference is whether the query sent to the database is the auto-generated one with temp tables or the straightforward one we typed in ourselves.
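If it helps to pin down where the time goes, one way to compare what HANA actually does with the two statements (just a sketch, assuming direct SQL access to HANA; the statement names are ours and the EXPLAIN_PLAN_TABLE columns are as we understand them from the SAP documentation) would be:

-- Capture the plan for the KNIME-generated, nested statement
EXPLAIN PLAN SET STATEMENT_NAME = 'knime_nested' FOR
SELECT * FROM (SELECT "tempTable_3320555003387273690"."MATERIAL"
  FROM (SELECT * FROM "schema"."calc_view") AS "tempTable_3320555003387273690"
) AS "tempTable_5861427265251268515"
WHERE "MATERIAL" = 'A';

-- Capture the plan for the hand-written statement
EXPLAIN PLAN SET STATEMENT_NAME = 'hand_written' FOR
SELECT MATERIAL FROM "schema"."calc_view" WHERE "MATERIAL" = 'A';

-- Compare the operators of the two plans
SELECT statement_name, operator_name, operator_details
FROM explain_plan_table
WHERE statement_name IN ('knime_nested', 'hand_written');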
Hello Daniel,
Usually the database's query optimizer flattens the nested subqueries before execution, which ends up giving the same performance as the manually written query. It seems that this is not the case for the HANA query optimizer.
I have created a feature request to allow users to enable query flattening, i.e. reducing the number of subqueries.
Bye
Tobias
We just released version 4.7 of the KNIME Analytics Platform, which now supports query flattening (i.e. removing unnecessary subqueries). Usually this is not required, since most databases do a great job of optimizing the query prior to execution. However, for some databases (especially SAP HANA) enabling this option can result in improved execution speed.
You can enable the new Flatten sub-queries where possible option in the Advanced tab of all DB connector nodes.
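To illustrate with the example from the beginning of this thread (a sketch of the expected effect, not the exact statement the nodes will generate), with the option enabled the chained column and row filters should be sent as something close to the hand-written query instead of the nested temp-table statement:

SELECT "MATERIAL" FROM "schema"."calc_view" WHERE "MATERIAL" = 'A'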
For an overview of all the new features have a look at the What’s new page. For all the details go to the changelog.