In database nodes and sql-statement against virtual tables

Hi Knime community!

We are starting out to use the KNIME DB nodes to do fairly simple database queries against “virtual”/“calculated” views in a database (SAP HANA). The flows looks something like this:

The views are fairly sensitive for good sql-statements and we find that chaining a couple of column-filters + row filters like the picture above gives SQL-statements of this form:
SELECT * FROM ( SELECT “tempTable_3320555003387273690”.“MATERIAL” FROM (SELECT *
FROM “schema”.“calc_view”) AS
“tempTable_3320555003387273690” ) AS “tempTable_5861427265251268515”
WHERE “MATERIAL” = ‘A’

Instead of something like this:
SELECT MATERIAL FROM “schema”.“calc_view” WHERE “MATERIAL” = ‘A’

What we end up with is very poor performance, since it seems like the virtual table more or less makes a SELECT * on the calc view making it assemble in memory to something very large and complex and then filter on that view afterwards.

We get much better performance out the short sql-statement, but not everyone is comfortable with writing their own SQL statements.
Is there anyway to “optimze” how KNIME makes the sql-statements now when it seems like the database cannot figure out to optimize the chained temp views that KNIME is assembling.

We are running KNIME 4.3.1 on Windows with a SAP HANA JDBC driver.

1 Like

Hi @DanSah,

how do you compare the performance of the query KNIME generates and your query?
Are you running your query in KNIME as well by using e.g. the DB Query node? Or are you running another tool then?
Just want to check where the issue could be.

Best,
Julian

Thanks for the reply Julian!
We’re basically just executing the workflow above and time it. Then we remove the DB column and row filters and just enter the query directly as a custom query in the DT Table selector hooked up to the DB reader.

So everything should be the same, the output is identical from the DB Reader, the only difference is the query sent to the database being either the autogenerated with tempTables or the straight forward one we’ve typed in our selves.

Thanks,
Daniel

Hello Daniel,
usually the DB query parser optimize the nested subqueries before execution which will end up in the same performance as the manually written query. It seems that this is not the case for the HANA query parser.
I have created a feature request to allow users to enable query flattening e.g. reducing the number of subqueries.
Bye
Tobias

6 Likes

Thanks Tobias - that would for sure make the in DB nodes more useful to us.

/Daniel

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.