My aim is to upload data into an Oracle database using KNIME, as both incremental and full loads. Is this possible? If yes, please give me some ideas on how to do it.
Hi @ranjeetsingh0702 -
Welcome to the forum. So if I’m understanding your question correctly, you want to know about whether Change Data Capture features exist in KNIME, and whether you can load data into Oracle?
For the first question, I suppose it depends on what it is exactly that you’re trying to do. Could you provide a bit more information on your use case?
For the second question, the KNIME DB nodes support loading and processing with Oracle, you just need the proper JDBC driver installed and configured. That’s described in our KNIME Database Extension Guide, along with some examples of how you can process data in your DB using KNIME nodes.
My aim is to upload the data daily as an incremental load. Is it possible, and if so, how?
For example, I have 10 new rows in one table that are loaded into the TGT table. The next day I get 6 rows that I need to load into TGT; of these 6 rows, 4 are new and 2 are updated. So how can we load these 6 rows into the TGT table?
OK, thanks. Did you check the link to the Database Extension Guide? In particular, the part on DB Manipulation. I suspect much of what you’d like to do can be accomplished with these nodes.
I did not find this. Could you guide me on how to create a pipeline that inserts new data into TGT and also inserts updated data into TGT from the same source? Say the source table is A and the TGT table is B. Table A has 4 rows that are first loaded into table B. On the next run, Table A has 6 rows, 4 of which are updated and 2 of which are new. How can I load these into TGT?
KNIME can’t do “real” CDC, in the sense of what the professional tools do (reading database transaction logs). As ScottF said, I’m sure it’s possible to put something together IF you add appropriate columns to your database and manage them correctly. Usually for “fake” CDC you would use a date-created and a date-modified column (set/updated by a trigger). In your KNIME workflow you just store the last date you ran the workflow, and on the next run you load all new and updated rows into the workflow and do your processing.
Of course you will need to test this a lot, and I fear that in a database with lots of changes such an approach will have “race conditions” where certain changes get lost unless extreme care is taken (for example, which server’s clock is used). So depending on the load on the database and the importance of a consistent result, I would probably rather invest in a real CDC tool (yes, they are costly).
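To make the “store the last run date, then load everything newer” pattern concrete, here is a minimal sketch in Python using an in-memory SQLite database in place of Oracle. The table and column names (`src`, `tgt`, `date_modified`) are illustrative assumptions, not anything KNIME-specific; in a KNIME workflow the same steps would map to DB nodes (a filtered read plus an insert/update into the target), and Oracle itself would typically use a `MERGE` statement rather than SQLite's `ON CONFLICT` clause.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here; the pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT, date_modified TEXT)")
conn.execute("CREATE TABLE tgt (id INTEGER PRIMARY KEY, val TEXT, date_modified TEXT)")

# Seed the source: two rows changed after the last run, one before.
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "old",     "2024-01-01T00:00:00"),
     (2, "changed", "2024-01-03T00:00:00"),
     (3, "new",     "2024-01-03T12:00:00")],
)

# Watermark: the timestamp stored at the end of the previous run.
last_run = "2024-01-02T00:00:00"

# Pull only rows created/updated since the watermark ...
delta = conn.execute(
    "SELECT id, val, date_modified FROM src WHERE date_modified > ?",
    (last_run,),
).fetchall()

# ... and upsert them into the target (Oracle would use MERGE instead).
for row in delta:
    conn.execute(
        "INSERT INTO tgt VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET val = excluded.val, "
        "date_modified = excluded.date_modified",
        row,
    )
conn.commit()

# Persist the new watermark for the next run (here just a variable;
# in a workflow you would write it to a file or a control table).
last_run = max(r[2] for r in delta)
print(sorted(r[0] for r in conn.execute("SELECT id FROM tgt")))  # [2, 3]
```

Note that the watermark is taken from the loaded rows' own timestamps rather than from the client clock, which sidesteps some (not all) of the clock-skew issues mentioned above.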
Another alternative, depending on your data set size, is to simply rewrite the target completely at a set interval that is longer than the processing takes. Depending on the processing needs, this can easily work up to millions of rows.
But again, if this is a lot of data with frequent changes, and changes need to be visible quickly in the target, then I fear you won’t get around a real CDC tool (Oracle GoldenGate, Talend, …).
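A sketch of that full-rewrite alternative, again with SQLite standing in for Oracle (in Oracle this would typically be `TRUNCATE TABLE` plus `INSERT ... SELECT`, ideally done via a staging table swap so readers never see an empty target):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("CREATE TABLE tgt (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.execute("INSERT INTO tgt VALUES (1, 'stale')")  # leftover from a prior run

# Full reload: wipe the target and copy the source wholesale.
# No change tracking needed; correctness comes from total replacement.
conn.execute("DELETE FROM tgt")
conn.execute("INSERT INTO tgt SELECT id, val FROM src")
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM tgt").fetchone()[0])  # 3
```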
KNIME is not a CDC engine/application. That said, it can play a role in a CDC solution if certain criteria are in place: timestamps of change or creation, a column that flags a record as updated, or perhaps another table (a transaction or log table) that can be referenced to identify new or changed records.
Although the processing can be expensive and inefficient, scanning the target table for keys and/or timestamps and then comparing them against the source table can also work. It's not my first choice, but I have seen this method used in certain cases.
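The compare-and-classify step described above can be sketched in a few lines. This is a plain Python illustration with hypothetical key/timestamp data, not a KNIME workflow; in practice the two dictionaries would come from reading the source and target tables:

```python
# Map each row's key to its last-modified timestamp in source and target.
# Keys and timestamps here are illustrative assumptions.
src = {1: "2024-01-01", 2: "2024-01-03", 3: "2024-01-03"}
tgt = {1: "2024-01-01", 2: "2024-01-01"}

# Keys present in the source but not the target are new rows to INSERT;
# keys present in both but newer in the source are rows to UPDATE.
new_keys = sorted(k for k in src if k not in tgt)
changed_keys = sorted(k for k in src if k in tgt and src[k] > tgt[k])

print(new_keys, changed_keys)  # [3] [2]
```

The cost comes from having to read (at least the keys and timestamps of) the entire target table on every run, which is why this is usually a fallback when no change-tracking columns exist.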