ETL in KNIME + Relational Database

odin · December 16, 2012, 6:17pm

Hi,

Recently I have been trying to update my database using KNIME - problem is with default nodes for dba interaction - if DB has foreign keys etc we are not able to update database.

Is there any nodes/future implementation in plan to allow us to image logical structure of the database (with foreign keys etc) and then update it through KNIME?

Because right now we are only able to update tables that are not using any foreign keys. This would be extremely useful with any ETL related tasks.

Thanks in advance.

gabriel · December 17, 2012, 2:08pm

I guess you already found the new Database Update node (new in KNIME 2.7). Do you see an error at this node when running an update on tables with foreign keys or does the table(s) result in a inconsistent state)? Btw, the node also provides some more details (as an additional column) at the output which might be helpful when processing the data further. We have plans extending the database in terms of usability and SQL building, connection handling and more advanced options in all database nodes in order to modify the currently auto-generated statement. Another hidden feature is actually that you can enter arbitrary SQL code before the SELECT statement in all Reader and the Databse Query node, just in case you want to enter "free" SQL.

odin · December 17, 2012, 3:45pm

Thanks! I will test this solution but it already look very promising!

Ellert_van_Koperen · December 17, 2012, 5:28pm

Before Knime 2.7 we needed another way to get updates properly implemented on a relational database, though it requires a bit of tweaking: insert the changes into a specially crafted 'worker table' that has a simple on-create trigger, in the trigger the changes are forwarded to the 'real' target table.
It is up to you if you want to check if the change you make is on an existing record etc, though it seems to be good practice obviously :-)

The records in this working table can be thrown away immediately afterwards, or later if you are not in a hurry. (i truncate the table before every batch of inserts).

I use this aproach amongst others in a large table containing many (milions) of chemical compounds, in Knime i do some property calculations and put the results back in chunks of 10.000.

Here is the (mysql) trigger code, enjoy !

DELIMITER $$

USE `molecules_databases`$$

CREATE
TRIGGER `molecules_database`.`bi_cpp`
BEFORE INSERT ON `molecules_databases`.`calculated_properties_pump`
FOR EACH ROW
BEGIN
UPDATE ALL_COMPOUNDS t SET t.Core_MW = NEW.Core_MW, t.Core_HBD = NEW.Core_HBD, t.Core_HBA = NEW.Core_HBA, t.Core_TPSA = NEW.Core_TPSA, t.Core_RBC = NEW.Core_RBC, t.Core_RingC = NEW.Core_RingC, t.Core_LogP = NEW.Core_LogP, t.Core_ChiralCC = NEW.Core_CCC, t.Core_BABS = NEW.Core_BABS WHERE t.cd_id = NEW.cd_id;
END$$

odin · December 18, 2012, 9:15am

Nice idea btw. Thanks!