Catch Errors does not catch incoming DB queries in DB Table Selector

JaeHwanChoi · November 4, 2023, 7:31am

Hello KNIME Support Team.

I have a question about a WF I created.

First of all, the picture is reading data from the DB using the Try & Catch Errors node, which reads the query from the DB into the Flow variable.

It disconnects from the DB if the query works or if an invalid query comes in.

The problem with implementing this WF is that the

when there is no DB SQL Executor node, when incorrect query information comes in, an error occurs in the DB Table Selector and Catch Errors should end successfully, but only a warning appears, and Catch Errors is not executed, so the DB cannot be closed.

Therefore, if I use the DB SQL Executor node, Catch Errors works normally even if an incorrect query comes in, so can I use DB SQL Executor?

If a large amount of data is loaded as a query, will the above DB to Spark be implemented if Catch Errors is completed and the DB is closed first?

I would like to test it, but I don’t have big data in DB and it’s a customer environment, so I can’t run it locally. I would like to ask the supporters’ opinions because it is impossible to implement it on the server because the data does not exist.

I would be grateful if you could answer.

takbb · November 4, 2023, 9:11am

Hi @JaeHwanChoi ,

I haven’t used DB to Spark, so I cannot comment on that side of things, but even so I have some questions, and I suspect others will too, so maybe you can expand on a few things.

What is the purpose of the DB SQL Executor? i.e. what SQL is it executing?
What do you mean by “when incorrect query information comes in” and “incorrect query comes in”.
a) How does a query get to be “incorrect”?
b) How do you identify “incorrect query information”?
What is the nature of the “warning” you mention in (1)

In response to your question (2) “will the above DB to Spark be implemented if Catch Errors is completed and the DB is closed first?”, I’m pretty sure that if the DB connection gets closed, then any node that tries to receive data from a brown port will fail, because the query only executes in the database at the end of a chain of brown in-database ports.

I would think you would therefore want to have your DB Connection Closer occurring after the DB to Spark, and not just when a failure occurs. Do that by adding a flow port from DB to Spark to your DB Connection Closer instead of attaching it to the Catch Errors node.

By the way, whether an error is caught or not, BOTH ports on the Catch Errors node will still be active. The flow variable port indicates the nature of the error (if any), but it is still active even if there is no error so it is up to you to add additional logic to determine if an error actually occurred.

takbb · November 4, 2023, 10:02am

@JaeHwanChoi , I agree with your general observation about the ability to catch an actual SQL error though.

The behaviour of KNIME in this regard does appear inconsistent, as demonstrated by the following workflow:

Here, the equivalent process is attempted via in-database DB Table Selector, and via an (out of database) DB Query Reader. In both cases the SQL is invalid and attempts to read from a table that doesn’t exist. But in one case, this is treated as a catchable error, and in the other it isn’t.

In light of this information, some of my earlier comments probably require elaboration. Where I talk of an error being “caught” or “not”, I’m referring to a situation where the Catch Errors node is not blocked regardless of the error state, but instead proceeds normally and handles any errors. This is the situation that occurs with the DB Query Reader example, and I had assumed would also be the case with the DB Table Selector. However I see that this is not the case.

With the In-database (brown port) nodes, it appears that invalid SQL is treated as a configuration error, rather than an execution error and so is not trapped (i.e. the workflow is blocked), which is the situation I believe you find yourself in.

I don’t currently have an answer for that other than to question how you come to have a sql query that is invalid, and is there something you can do upstream to prevent that?

Inconsistent DB Error trapping.knwf (40.2 KB)

(KNIME AP 4.7 and KNIME AP 5.1.2)

JaeHwanChoi · November 5, 2023, 5:25am

Thank you very much for your answer. @takbb

The current situation is that when the user creates a query on the web, we receive the query created through the API as a flow variable. Therefore, we set the flow variable in the Custom query of the DB Table Selector.

As you said, when the user creates a query on the web, if the user enters a typo or enters a table that does not exist in the actual DB, it is recognized as incorrect information and an error occurs.

The reason why I used DB SQL Executor is because I used this node so that Catch Errors(Generic Ports) can catch the error and close the DB successfully.

In addition, the reason why I cannot use Catch Errors(Data Ports) in your example is that currently I have to use Spark node for data analysis, and I need to convert the incoming data to DB to Spark immediately. Since it is big data, I can’t use Table to Spark.

As a result, Catch Errors are not written because it is a configuration error, not an execution error, and to solve this, I need to receive only queries that have been checked first on the web or use a DB SQL Executor node in front of it.

Also, is the configuration error a defect of the node, and can I use the DB SQL Executor node in this situation?

Your answers will be greatly appreciated.

JaeHwanChoi · November 5, 2023, 5:40am

We’ve answered those questions in the comments below, but we have additional questions.

“Do that by adding a flow port from DB to Spark to your DB Connection Closer instead of attaching it to the Catch Errors node.” So, if I do DB to Spark and then close the DB, the data is in the Spark repository and the subsequent operations are done by fetching the data from the Spark repository, right?

takbb · November 5, 2023, 9:24am

Hi @JaeHwanChoi , thanks for the additional information and background. I can now more clearly understand the problem you face, and your reason for using DB SQL Executor.

Well, I cannot say if it is a defect in the node but certainly to me its behaviour is both unexpected and unhelpful, but I don’t know if it is the intended behaviour. There may be an underlying technical reason why for this node an invalid query had to be considered a configuration error, but it is inconsistent with the behaviour of DB Query Reader, and DB SQL Executor, and such inconsistency to me puts it in “defect” territory. If that question doesn’t get picked up by a member of the KNIME team here, I’ll post my above simplified example as a separate question.

That’s fine. My example was merely to show the difference in behaviours between the two DB nodes. It shouldn’t matter which “catch” nodes you are using.

OK to summarise (this is actually me thinking out loud to try to formulate a plan )

You have a custom query that may or may not be valid syntax
DB SQL Executor is being used to test the query as a workaround to a sql error in DB Query being “untrappable”
If the query is invalid SQL, the error should caught, and:
3a. There is nothing to write to Spark (except send an empty default table?)
3b. The db connection should be closed
If the query is valid SQL:
4a. The data should be sent to Spark
4b. The db connection should be closed
Spark should still be accessible afterwards regardless of (3) or (4)

So if the above summary is correct, the connection should be closed (3b) and (4b) regardless of the error state.

I think you should be able to join the DB Connection Closer as follows:

The lower flow port on the Catch Errors node can be ignored (those flow variables are available via the generic port too, so you can still access them downstream if you want). In the above, the DB Connection Closer will occur after DB to Spark, and yes I would expect that once DB to Spark has converted your database data into a Spark Dataframe, the DB Connection would no longer be required.

btw, I’ve used the “NoOp” node as a convenient point for connecting the DB Connection Closer flow variable back into the flow to make sure it is executed. It might not be necessary to link it back, but I tend to do so just to be sure. The NoOp nodes are a useful set of nodes that do nothing (NoOperation) other than provide a means of connecting, and are available from NodePit. You don’t need to use NoOp for this, if you can just connect the flow variable from DB Connection Closer to something else downstream of DB to Spark.

I stress though that I haven’t personally used DB to Spark (I have just downloaded it so I could attach a node in the screenshot ), so it is possible I am wrong about the dataframes and db disconnection, but I’m reasonably confident based on how KNIME handles other situations.

JaeHwanChoi · November 6, 2023, 9:18am

Thank you very much for your kind reply. @takbb

I am in a situation where it is mandatory to use DB SQL Executor,

If a large amount of data comes in as an error-free query, I’m worried that DB SQL Executor and DB Table Selector will be loaded with the large amount of data and it will take twice the time.

Also, I’m curious about the difference between a configuration error and an execution error.

Your answers would be appreciated.

danielesser · November 6, 2023, 10:43am

Just a quick heads-up to avoid confusion. We re-branded the mentioned NoOp Nodes as NodePit Power Nodes. They contain the noop functionality (no changes here) as well as the new node and workflow annotator nodes and some additional utility nodes.

@takbb Update site and feature identifier hasn’t changed, so you don’t need to change anything to get the latest updates.

Best regards,
Daniel

takbb · November 6, 2023, 11:13am

Hi @JaeHwanChoi , yes I feel your pain having to do a two-stage process of DB SQL Executor followed by DB Table Selector to enable error checking.

In the DB SQL Executor, are you simply executing the query as supplied by the user? You should be able to cut it down to be a small overhead with either 1 row or even zero rows, rather than double time if you can “wrap” the query with an outer select, which should still then make the query testable.

e.g. if the user supplies:

select x,y,z from my table where something=c

you could wrap it with something like:

select * from (

      select x,y,z from my table where something=c

) fetch first 1 row only

or

select * from (

      select x,y,z from my table where something=c

) where 1 = 0

This should greatly reduce the execution time for the DB SQL Executor.

Typically in KNIME, a configuration error is a mis-configuration of a node that would prevent it from being able to execute. This can still occur at execution-time if the configuration is being dynamically set by flow variables. For example a String Manipulation node containing invalid syntax in a statement passed into it via a flow variable would still be a configuration error. Such errors block the execution of the node and cannot therefore be caught be error-handling. You can think of this as being like a compilation error in a traditional programming language

An execution error on the other hand would be something that allows the execution of a node to occur, but results in a run-time error occurring which can then be handled by try-catch. In a traditional programming language this could be a “file not found” or “division by zero” error for example.

With KNIME it isn’t always clear without experimentation what will be considered an error and what won’t. For example in Math Formula, division by zero results in a “missing” value rather than an error that can be caught. In Java snippet, an error/exception that is thrown other than an “Abort” results in the happy continuation of the workflow as if nothing had happened. In Python, meanwhile, throwing a runtime exception halts the node (I haven’t experimented to see if that one can be caught by a catch-node).

It is difficult to decide whether an invalid SQL statement should be treated as a configuration error or an execution/runtime error. There are a wide variety of error types with SQL, and the validity of a statement such as “select x from mytable” when written as a custom-query cannot be determined at configuration time because “mytable” may or may not exist at that point, and so I feel it should be considered a runtime/execution error, just as it is with DB Query Reader and DB SQL Executor, and handled as such. That there is contradictory behaviour between what appear to be similar situations to me is a problem. I would prefer that an invalid query were always treated as a catchable error.

But that is only my take on it, and there may be other underlying technical reasons why with that specific node, and its potential interactions with other downstream in-database nodes, it has to be considered a non-catchable error.

takbb · November 6, 2023, 11:21am

Thanks for the info @danielesser

JaeHwanChoi · November 6, 2023, 11:42am

Thank you very much for your kind reply. @takbb

Unfortunately, we don’t get test queries like 1 or 0 rows, we get user-written query statements through the API, so there’s nothing I can adjust within DB SQL Executor.

For now, I guess the best I can do is to read the big data twice when it comes in.

Thanks for the explanation about misconfiguration and execution errors, I think I understand it well enough.

Thank you for your response.

takbb · November 6, 2023, 11:53am

Hi @JaeHwanChoi ,

I’m not quite sure I follow. Presumably the user written query through the api is an SQL select statement, as that is what you are passing to the DB SQL Executor?

So if you have a flow variable named “user-custom-sql” containing that SQL statement, then your DB SQL Executor can be configured with:

select * from (
   $${Suser-custom-sql}$$
) where 1=0

This would then test the query, but return 0 rows.

system · February 4, 2024, 11:54am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.