How to update database with scraped data

RIchardC · December 29, 2021, 4:30am

I have a table with these columns: Person_Name, Email, Latest_Post_Title

Every day I scrape new records that include Email, Latest_Post_Title

If a scraped email matches an existing one in my table, I want to update the Latest_Post_Title for that row with the scraped Latest_Post_Title

If a scraped email doesn’t match an existing one in my table, I want to create a new row for it.

I’ve been experimenting with a Full Outer Join, but I question if that’s the best way to do this. (There are actually more columns than described in my simplified example.)

I would appreciate any ideas. Cheers, Richard

mlauber71 · December 29, 2021, 11:48am

@RIchardC I have an example here doing an update and merge in an H2 database. The functions should be mostly the same for all databases.

mlauber71 · December 29, 2021, 4:54pm

@RIchardC I put together a (hopefully) complete example using H2 and the Northwind database with customerid.

Here we first create a database table with customerid as the primary key (rows randomly selected from the central DB). Then every 10 seconds another random batch is drawn. Rows whose customerid already exists are updated. The workflow then determines which customerid values are new and inserts them. When a row is first inserted, a timestamp first_inserted is stored. With every update, another timestamp, last_updated, marks the time the update was performed.
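For readers who want to see the update-then-insert logic outside of a workflow, here is a minimal sketch in Python. It is not the workflow itself: it assumes SQLite (via the standard sqlite3 module) instead of H2, and the table name customers, the companyname column, and the function name update_then_insert are hypothetical. Only customerid, first_inserted and last_updated come from the description above.

```python
import sqlite3
from datetime import datetime, timezone


def update_then_insert(conn, batch):
    """Apply a batch of (customerid, companyname) tuples.

    Existing customerid values are updated and get a new last_updated;
    unknown ones are inserted with first_inserted set.
    """
    now = datetime.now(timezone.utc).isoformat()
    # Look up which keys are already in the table.
    existing = {row[0] for row in conn.execute("SELECT customerid FROM customers")}

    updates = [(name, now, cid) for cid, name in batch if cid in existing]
    inserts = [(cid, name, now) for cid, name in batch if cid not in existing]

    conn.executemany(
        "UPDATE customers SET companyname = ?, last_updated = ? WHERE customerid = ?",
        updates,
    )
    conn.executemany(
        "INSERT INTO customers (customerid, companyname, first_inserted) VALUES (?, ?, ?)",
        inserts,
    )
    conn.commit()


# Hypothetical demo table, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           customerid     TEXT PRIMARY KEY,
           companyname    TEXT,
           first_inserted TEXT,
           last_updated   TEXT
       )"""
)

# First batch: both rows are new.
update_then_insert(conn, [("ALFKI", "Alfreds"), ("ANATR", "Ana Trujillo")])
# Second batch: ALFKI already exists and is updated, BONAP is new.
update_then_insert(conn, [("ALFKI", "Alfreds Futterkiste"), ("BONAP", "Bon app")])
```

After the second batch the table should hold three rows, and only ALFKI has a last_updated value.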

Maybe you can take that example and work with that.

RIchardC · December 30, 2021, 4:13pm

This is absolutely the right way to go. I ended up using an SQLite db and a DB Merge node. Works great.
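For anyone doing the same thing in plain SQLite, the update-or-insert can also be written as a single upsert statement. This is only a sketch and not necessarily what the merge node does internally. It assumes SQLite 3.24 or newer (for ON CONFLICT ... DO UPDATE), a hypothetical table named people, and that Email is declared as the primary key so a duplicate email counts as a conflict. The column names are the ones from the original question; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Email must be unique (here: the primary key) for ON CONFLICT to trigger.
conn.execute(
    """CREATE TABLE people (
           Person_Name       TEXT,
           Email             TEXT PRIMARY KEY,
           Latest_Post_Title TEXT
       )"""
)
conn.execute("INSERT INTO people VALUES ('Ann', 'ann@example.com', 'Old post')")

# Today's scrape: one known email, one new email (made-up sample data).
scraped = [
    ("ann@example.com", "New post"),
    ("bob@example.com", "First post"),
]

# Insert each scraped row; if the email already exists, only overwrite the title.
conn.executemany(
    """INSERT INTO people (Email, Latest_Post_Title)
       VALUES (?, ?)
       ON CONFLICT(Email) DO UPDATE
       SET Latest_Post_Title = excluded.Latest_Post_Title""",
    scraped,
)
conn.commit()

rows = conn.execute(
    "SELECT Person_Name, Email, Latest_Post_Title FROM people ORDER BY Email"
).fetchall()
```

The existing row keeps its Person_Name and only gets the new title, while the unknown email becomes a new row with an empty Person_Name.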
