PostgreSQL DB Table Selector Stalls for large database

Hi KNIME,

We have a relatively large database: ~5.000 schemas and ~100.000 tables.
When using Select a table in DB Table Selector node, the Database Metadata Browser stalls on the message “Fetching metadata…”

Metadata browsing for such a large database is slow. For other tools like DBeaver Database Manager,
it’s possible to set Schema filtering in connection settings, so database navigation is easier.

For this issue, a temporary work-around has been created.
A Component node that discover the Schemas available for the user and selection. A subsequent Component node that discover tables and views in the schema.

Example of schema discovery

PostgreSQL: Documentation: 17: 35.46. schemata

-- List user schemas
SELECT schema_name as "schema"
FROM information_schema.schemata
order by schema_name

And for tables+views

select * from (
select pt.schemaname, pt.tablename as "tableview", pt.tableowner as "owner", 'table' as "type"
from pg_tables pt
where schemaname='$${Sschema}$$'
union 
select pv.schemaname, pv.viewname as "tableview", pv.viewowner as "owner", 'view' as "type"
from pg_views pv
where schemaname='$${Sschema}$$'
) as tv
order by tv.type, tv.tableview

Having this information, the user can then use “DB Table Selector” without the “Select a table” button.

image

This “necessecity” is not easy to explain in “On-boarding” sessions with our new KNIME users.

Would it be possible to make a configuration option in PostgreSQL connector or DB Table Selector, that only show the Schemas/Tables/Views available to the user?

Have you tried disabling the metadata retrieval? Or remove the type of item being retrieved?

1 Like

Seems like, we might carry around an older node version

After 15 min of waiting, I gave up

select version() give PostgreSQL 13.7

Increasing to log level DEBUG

A few seconds in Schema Browser before cancel

DEBUG NodeContainerEditPart            PostgreSQL Connector 3:1266 (EXECUTED)
DEBUG NodeContainerEditPart            DB Table Selector 3:6 (CONFIGURED)
DEBUG DB Table Selector    3:6        Acquiring connection.
DEBUG DB Table Selector    3:6        The connection has been relinquished.
DEBUG DB Table Selector    3:6        The managed connection has been closed.
DEBUG DB Table Selector    3:6        The transaction managing connection has been closed.
DEBUG DB Table Selector    3:6        Acquiring connection.
WARN  DB Table Selector    3:6        27 statements have not been closed before the closure of the managed connection.
DEBUG DB Table Selector    3:6        The connection has been relinquished.
DEBUG DB Table Selector    3:6        The managed connection has been closed.
DEBUG DB Table Selector    3:6        The transaction managing connection has been closed.
DEBUG DB Table Selector    3:6        Metadata retrieval: ProgressMonitor has been canceled.

Around 1 min waiting

DEBUG DB Table Selector    3:6        Acquiring connection.
DEBUG DB Table Selector    3:6        The connection has been relinquished.
DEBUG DB Table Selector    3:6        The managed connection has been closed.
DEBUG DB Table Selector    3:6        The transaction managing connection has been closed.
DEBUG DB Table Selector    3:6        Acquiring connection.
WARN  DB Table Selector    3:6        231 statements have not been closed before the closure of the managed connection.
DEBUG DB Table Selector    3:6        The connection has been relinquished.
DEBUG DB Table Selector    3:6        The managed connection has been closed.
DEBUG DB Table Selector    3:6        The transaction managing connection has been closed.
DEBUG DB Table Selector    3:6        Metadata retrieval: ProgressMonitor has been canceled.

That’s pretty many statements?

The idea is not to retrieve the meta data. I know this problem from Big Data environments. Maybe remove the TABLE and VIEW from the settings or try to use a newer KNIME version.

How does KNIME collect the meta data?
By looping over all schemas and call metadata query for table, view and materialized view?

Then, that’s easily 2000 to 5000 queries, depending if schema list is for all schemas or only schemas available to the user.

If it’s looping, that might explain it.

Would it be possible to get the raw sql statements?
Then it would try to see the execution plans or if any table locks kick in.

@tescnovonesis if you know the schema and the table you should be able to connect without calling the meta data. You could also use the custom SQL code in the connector.

And again: in your screenshot “Retrieve in configure” is active - you might try to switch it off.

Also from the log there seems to be a timeout setting. Either in the connector or in the database itself.

Hi @mlauber71

In the post, I explain that a circumvention for this issue, is to have two custom made Component nodes. One that explores schemas schemas available to the user and then a following that explores tables+views for the selected schema.
Having this information, it’s possible to use the DB Table Selector node, without opening the schema browser.

It did not change anything, changing “Retrieve in configure”, the schema browser is broken for large combinations of schemas and table availability.

Explaining this to new users: “Normally you can use the Schema browser to find the schema and table of your interest. But KNIME has a bug, where it cannot handle large sets of schemas and tables available to the user, so you have to use a workaround”, it’s just not very … trustworthy…

So, can we try to locate the issue - thank you?
And not fiddle around with workarounds.

Not sure if this is a bug but just as a general remark: I have done more workarounds with all sorts of data and environments than I can count over the years.

It might be a lack of understanding English on my side, but a sense of entitlement directed at a user trying to help might not always be the path to get more support.

Leaving that aside. @tobias.koetter is there a chance of improving the meta data browser or is this the way to go? It might involve some caching. I also experienced this with Cloudera - but there it might be just how this system works.