Insert data in several tables of database

dbolshev · February 5, 2020, 11:17am

Hi all.

I have a question.
Let’s imagine the situation. I need to put data from JSON to postgreSQL database. Database has 2 tables and one of them has reference to another:

Table 1
id
…
some data

Table 2
id
table1_id
some data

Ids of both tables are auto incremental. So i need to add data to Table 1, get id of this record and add data and id to Table 2. I can make it with script. But can I make it with KNIME nodes?

Jeany · February 14, 2020, 1:47pm

Hi,

Thank you for your question and welcome to the community! If you provide a small simple example of the JSON input and specify how the end result should look like for this example, I can create a simple workflow that hopefully does what you want.

Best,
Jeany

dbolshev · February 14, 2020, 2:39pm

Hi,
Thanks a lot for your answer.

Lets say we have JSON to describe of purchasing some goods:

“data”: {
____“status”: “delievered”,
____“date”: “2020-01-21”,
____“id”: 127819991,
____“items”: [
_____{
______“code”: 112,
______“name”: “some name 1”
_____},
_____{
______“code”: 113,
______“name”: “some name 2”
_____}
____],
____“customer”: {
_______“name”: “John Smith”,
_______“phone”: +1 111 111 111 1
____}
}

So we need to parse this JSON to database with 3 tables: items, customers, purchasings.
IDs in tables is auto incremental.

Jeany · February 19, 2020, 5:08pm

Hi,

I attached a workflow that parses the JSON, creates two tables (items and customers, I leave purchasing to you ;)), adds the incremental IDs to each table, adds the item ID to the respective customer and writes the two tables to a database.

Hope that is more or less what you had in mind,
JeanyDB_Insert.knwf (37.1 KB)

jan_lender · February 19, 2020, 7:51pm

Hi,

I’m afraid the solution @Jeany came with will produce problems if the table’s ID is declared as auto increment as @dbolshev suggested. If you make up a custom ID ignoring auto increment you could end up with duplicate ID either immediately or in the future. You usually have an option to get last inserted ID from DB session but I haven’t noticed yet KNIME is able to work with that.

I’d consider the best solution changing the way ID is generated from auto-increment to obtaining new ID from a sequence. Sadly, since I guess author’s database is MySQL that doesn’t support sequences, I wouldn’t put money this is possible.

If we stay with auto-increment we have to deal with it. What about adding a business key, GUID to the table the other one refers. Your JSON has an ID within the root “data” object. I wouldn’t consider it safe and better generated a really global unique identifier. After you save the root entity you select the rows again to obtain ID’s DB used for saving. Now you can save the other data with proper foreign keys.

Complex but bullet-proof.

jan_lender · February 19, 2020, 9:28pm

I’ve updated the workflow @Jeany had uploaded.
DB_Insert.knwf (153.4 KB)
I made changes as follows

Multiple purchase in JSON support added
Purchase entity handled
DB Schema containing FKs and auto increment PKs
UUID (GUID) generated and used for correlation as described previously
FK constraints & reference integrity satisfied

beginner · February 20, 2020, 7:40am

First a general comment that inserting relational data eg. one-to-many really is a problem in KNIME. However with the new transaction nodes, this has become much safer and more feasible but still getting hold of generated ids would be very helpful.

I would add transaction start and end nodes to Jans workflow or else you risk the chance of data inconsistencies.

Jans idea with generating uuids and then using them to look-up the generated id is probably the easiest solution. However it requires you can and are allowed to change the table structure.

In case you can^t change the table structure, it depends what type of auto-increment you use in your PostgreSQL database.

SERIAL or IDENTITY. SERIAL will make it easier as you can simply insert data into such a column.

A postgresql serial is backed by a sequence with name “tablename_columnname_seq”. Hence what I would do, inside a transaction, is count parent records (Extract table dimensions node) and then select as many number from the underlying SERIAL sequence:

SELECT  nextval('myschema.mysequence')
  FROM  generate_series(1, numRows)

numRows = actual integer value. This will return as many ids from the sequence and increment it. You can then simply append this column to your table and insert it.

If the auto-increment is an IDENTITY column with GENERATED ALWAYS then it gets tricky because you can’t just insert the data. I think you would then have to manually loop and insert data with OVERRIDING SYSTEM VALUE (see the first link for more info). Options for getting inserted IDs and according potential issues can be seen here but it’s not straight forward.

Ideally knime would support the “returning id” clause (for databases that support it) and append it to the outgoing table.

dbolshev · March 2, 2020, 11:49am

Hello, @Jeany.

Thanks a lot for your help. It is a good idea to generate ID manually. But we need to use auto increment (serial) in DB.

dbolshev · March 2, 2020, 11:54am

Hello, @jan_lender.

Thanks a lot for your help!

dbolshev · March 2, 2020, 11:59am

Hello, @beginner

Thanks for help. I used to get ID of added row using keyword “returning id” after insert script. But DB_writer node hasn’t output port so I couldn’t get and use this “id”. With your SQL script and workflow of @jan_lender my problem resolved.

system · March 9, 2020, 11:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.