Returning auto generated IDs in DB insert/writer nodes

armingrudd · July 28, 2021, 10:05am

Hi,

When we insert new rows to the database and the IDs are auto generated in DB, how can we have the IDs of the rows we just added?

Actually, I’m requesting for a feature in DB Insert/writer nodes to return IDs of the inserted rows in the output data port.

ipazin · July 28, 2021, 12:37pm

Hello @armingrudd!

There is now a special Category for these kind of requests - Feedback & Ideas. One can tell you are not a frequent Forum visitor anymore

Regarding request. Not a DB expert but seems this would require DB reading operation in these nodes. Plus you can use DB Writer/Insert with subset of columns from DB table you are writing to so if you only want auto generated column(s) where to specify that

And what exactly is the use case behind it?

Br,
Ivan

bruno29a · July 28, 2021, 1:07pm

In all seriousness though, I think the challenge would be that:

If we rely on the DB system to get the ID of the Insert, different DB systems do it differently, + I believe this would apply only to the last record inserted - by the current session that is, so it this method would not necessarily work if we do bulk/batch inserts.
The node would have to determine what’s the auto generated ID field to start with. This is sometimes not obvious. I know in Postgres for example, we had to use a sequence table to generate the ID - the query would have a function to retrieve the next sequence number.

I am sure there are other challenges.

@armingrudd , as per @ipazin last question, what would be the use case behind it?

armingrudd · July 28, 2021, 2:21pm

Haha, yeah, and great to have this category in forum.

There are lots of them when table IDs are auto generated like auto increment IDs or DB side generated UUIDs. E.g. I have a workflow in which I’m generating a client invoice. I calculate all the values and insert them to the desired DB table which is for example client-invoice. This table generates a UUID automatically as table ID (primary key) which we use as invoice ID as well. Now, I want to generate a PDF file based on the invoice information I have and I need to add the invoice ID to the file. How can I be sure the last row in the table is the ID for the insert I did? specially when this workflow becomes a service and records can be inserted any time.

Another scenario would be something like what KNIME Server is doing. We create a job and then get the job ID and then execute the job using the job ID. Now imagine we have workflows and processes developed and executed by different tools (KNIME, Lambda Python , …). We want to have a central control system which records all the jobs (execution of the processes). We design a workflow which will be a web service in KNIME Server. In our front-end app, when a user selects a process to be run, the app first sends a request to our service with the given parameters, then receives a job ID which is then used in all workflows to update their status.
Here our web service (the workflow) takes the parameters from the front-end app, records them in a table e.g. named “jobs” and take the row ID (which is auto increment/generated) and sends it back as the job ID.
The same goes for creating employee payrolls and … where we have tables with auto increment/generated IDs which is very common.

Yes and no. If we are talking about primary keys, it shouldn’t be necessary to specify them. But if the auto generated field is not a primary key (which is not the case here for me), yes, user should specify them.

armingrudd · July 28, 2021, 2:57pm

Actually if the nodes return the primary key, we don’t need to return anything else since we can query the table and find our non-primary key auto generated value using the primary key we have.
So the only issue here is to have the primary key back when we insert records.

bruno29a · July 28, 2021, 3:29pm

Well, we have to look at the worst case scenario, as usually if the worst case scenario is taken care of, it usually automatically takes care of the best scenario at the same time, whereas the other way is more often NOT true, that is taking care of the best case scenario usually does not take care of the best case scenario.

And also, what if the primary key is a composite key (that is not just on 1 field, but composed of multiple fields)? What should Knime return? All the fields that form the composite key? (In this case, not all fields would be auto generated, but rather inserted by the INSERT query)

EDIT: Don’t get me wrong, this feature would be a great addition. I’m just asking these questions to see what kind of possibilities can happen and what’s the expected behaviour (and also, these questions/scenarios and answers could help the Knime team should they decide to implement this)

armingrudd · July 28, 2021, 3:59pm

As I said in the last reply, having primary key would be enough since we can use it to find any other auto generated value.

Yes, it should return all primary keys if it doesn’t exist already. The missing primary keys can be appended to the output data table of the nodes.

Of course, Thank you for taking the time.

ipazin · July 29, 2021, 10:17am

Hello there,

tnx for additional explanation @armingrudd and @bruno29a’s contribution as usual. Seems like a valid use case to me. However let’s wait from someone with more knowledge in this area to join and share it’s view on this matter.

Br,
Ivan

ipazin · September 1, 2021, 10:34am

Hello @armingrudd,

checked this a bit more and there is actually an existing ticket for this (Internal reference: AP-6693) where your request is now noted. In case of any news someone will update this topic.

Br,
Ivan

system · March 2, 2022, 10:35pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.