I have a question.
Let’s imagine the situation: I need to load data from JSON into a PostgreSQL database. The database has 2 tables, and one of them references the other:
Table 1
- id
- …
- some data

Table 2
- id
- table1_id
- some data
The IDs of both tables are auto-incremented. So I need to add data to Table 1, get the id of that record, and then add the data together with that id to Table 2. I can do this with a script. But can I do it with KNIME nodes?
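Roughly, the tables could be created like this (table and column names here are just placeholders for illustration):

```sql
-- Hypothetical schema matching the description above
CREATE TABLE table1 (
    id   SERIAL PRIMARY KEY,  -- auto-incremented
    data TEXT
);

CREATE TABLE table2 (
    id        SERIAL PRIMARY KEY,  -- auto-incremented
    table1_id INTEGER NOT NULL REFERENCES table1 (id),
    data      TEXT
);
```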
Thank you for your question and welcome to the community! If you provide a small, simple example of the JSON input and specify what the end result should look like for this example, I can create a simple workflow that hopefully does what you want.
I attached a workflow that parses the JSON, creates two tables (items and customers; I leave purchasing to you ;)), adds the incremental IDs to each table, adds the item ID to the respective customer, and writes the two tables to a database.
Hope that is more or less what you had in mind,
JeanyDB_Insert.knwf (37.1 KB)
I’m afraid the solution @Jeany came up with will produce problems if the table’s ID is declared as auto-increment, as @dbolshev suggested. If you make up a custom ID while ignoring auto-increment, you could end up with duplicate IDs, either immediately or in the future. You usually have the option to get the last inserted ID from the DB session, but I haven’t seen that KNIME is able to work with that yet.
I’d consider the best solution to be changing the way the ID is generated: from auto-increment to obtaining a new ID from a sequence. Sadly, since I guess the author’s database is MySQL, which doesn’t support sequences, I wouldn’t bet money that this is possible.
If we stay with auto-increment, we have to deal with it. What about adding a business key, a GUID, to the table the other one refers to? Your JSON has an ID within the root “data” object; I wouldn’t consider it safe and would rather generate a truly globally unique identifier. After you save the root entity, you select the rows again to obtain the IDs the DB used when saving. Now you can save the other data with proper foreign keys.
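A sketch of that approach in SQL, assuming made-up table names and a `guid` column added to the parent table:

```sql
-- Add a business-key column to the parent table
-- (requires permission to change the table structure)
ALTER TABLE table1 ADD COLUMN guid UUID UNIQUE;

-- 1. Insert parent rows, supplying a client-generated GUID;
--    the auto-increment id is assigned by the database
INSERT INTO table1 (guid, data)
VALUES ('1b4e28ba-2fa1-11d2-883f-b9a761bde3fb', 'parent row');

-- 2. Read the generated ids back by matching on the GUID
SELECT id, guid
FROM table1
WHERE guid IN ('1b4e28ba-2fa1-11d2-883f-b9a761bde3fb');

-- 3. Use the ids from step 2 as foreign keys for the child rows
INSERT INTO table2 (table1_id, data)
VALUES (/* id obtained in step 2 */ 1, 'child row');
```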
First, a general comment: inserting relational data, e.g. one-to-many, really is a problem in KNIME. However, with the new transaction nodes this has become much safer and more feasible, although getting hold of the generated IDs would still be very helpful.
I would add transaction start and end nodes to Jan’s workflow, or else you risk data inconsistencies.
Jan’s idea of generating UUIDs and then using them to look up the generated IDs is probably the easiest solution. However, it requires that you can and are allowed to change the table structure.
In case you can’t change the table structure, it depends on what type of auto-increment you use in your PostgreSQL database: SERIAL or IDENTITY. SERIAL makes it easier, as you can simply insert data into such a column.
A PostgreSQL SERIAL column is backed by a sequence named “tablename_columnname_seq”. Hence what I would do, inside a transaction, is count the parent records (Extract Table Dimensions node) and then select as many numbers from the underlying SERIAL sequence:
```sql
SELECT nextval('myschema.mysequence')
FROM generate_series(1, numRows)
```

where numRows is the actual integer row count. This will return that many IDs from the sequence and increment it accordingly. You can then simply append this column to your table and insert it.
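Put together, reserving the IDs up front could look like this (the sequence name follows PostgreSQL’s default `<table>_<column>_seq` naming; the concrete values are only illustrative):

```sql
BEGIN;  -- run inside a transaction, as suggested above

-- Reserve 3 ids from the sequence backing table1.id
SELECT nextval('table1_id_seq') AS id
FROM generate_series(1, 3);

-- Insert the parent rows using the reserved ids explicitly
-- (here assuming the sequence handed back 101, 102, 103)
INSERT INTO table1 (id, data)
VALUES (101, 'a'), (102, 'b'), (103, 'c');

-- Child rows can now reference those ids directly
INSERT INTO table2 (table1_id, data)
VALUES (101, 'x'), (102, 'y');

COMMIT;
```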
If the auto-increment is an IDENTITY column with GENERATED ALWAYS, then it gets tricky because you can’t just insert the data. I think you would then have to manually loop and insert the data with OVERRIDING SYSTEM VALUE (see the first link for more info). Options for getting the inserted IDs, and the potential issues involved, can be seen here, but it’s not straightforward.
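For illustration (again with placeholder names), an explicit id on a GENERATED ALWAYS identity column is rejected unless you override the system value:

```sql
-- Without OVERRIDING SYSTEM VALUE, PostgreSQL raises an error
-- for an explicit id on a GENERATED ALWAYS identity column
INSERT INTO table1 (id, data)
OVERRIDING SYSTEM VALUE
VALUES (42, 'some data');
```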
Ideally, KNIME would support the RETURNING id clause (for databases that support it) and append the returned IDs to the outgoing table.
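In plain SQL, PostgreSQL can hand back the generated key in the same statement (placeholder names again):

```sql
-- The database assigns the id and returns it with the insert
INSERT INTO table1 (data)
VALUES ('some data')
RETURNING id;
```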
Thanks for the help. I used to get the ID of an added row using the keyword RETURNING id after the insert statement. But the DB Writer node has no output port, so I couldn’t get and use this id. With your SQL script and the workflow of @jan_lender, my problem is resolved.