Exporting table to PostgreSQL database?

The above is my current workflow, but I am stuck on how to export the table to a PostgreSQL database. The table is clean and ready, and I have successfully connected to PostgreSQL.

Now, my questions are:

  1. How do I export the table to PostgreSQL without creating any schema or table directly in the DB (I would like to control everything from KNIME)? That means I need to create the schema and tables from KNIME. How do I create tables with column names extracted automatically from Node 5? I have extracted the column headers; now the DB Table Creator (Node 14) needs to use them as input.

  2. How can I scale this workflow up to handle dozens of sources (same input data schema; only the data itself will differ)? Can I put those sources before Node 1 and execute the workflow sequentially? In that case the final table name will change based on the input source, so is there a way to parameterize the table creation in #1 above?

Hi @tone_n_tune and welcome to the KNIME Community.

The DB Writer will create the table if it does not exist, and it will create it based on the input table you are giving it - so whatever table is coming from Node 5. You don’t have to run a DB Table Creator.

If the table already exists, it will just add the new records to it.
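
Roughly speaking, what it does behind the scenes is equivalent to something like this (a sketch in psql with placeholder database, table, and column names; the real names come from your input table):

```sh
# Placeholder names throughout; DB Writer derives the real ones from the input table.
# First run: the target table does not exist yet, so it gets created.
psql -d mydb -c "CREATE TABLE IF NOT EXISTS public.prices (ts TIMESTAMP, price NUMERIC)"
# Every run: the incoming KNIME rows are appended.
psql -d mydb -c "INSERT INTO public.prices (ts, price) VALUES ('2021-01-01 00:00:00', 1.0)"
```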

You can set the schema and table names dynamically via variables. As you said, “the final table name will change based on the input source”, so you can set the schema and table name per input source and then assign these variables to the schema and table by clicking on these buttons:

EDIT: One recommendation I usually give is to establish your connection only when you need it. To do this, you just need to link your Node 5 to your Node 11, like this:
[screenshot: Node 5 linked to Node 11 via the flow variable port]

That way, Node 11 will execute only when your Node 5 is done, meaning just before you are ready to write. The way you are currently set up, KNIME will establish the PostgreSQL connection as soon as the workflow starts running, and if your Node 1 → Node 3 → … → Node 10 → Node 5 takes time, the connection could expire as it would remain idle during all that time, so your DB Writer would not have an active connection.

@bruno29a thanks for the advice, I have placed a Table Creator in front of the GET Request where I have listed all the source URLs. But how do I execute the workflow “sequentially”, taking the source URL from one RowID at a time and obtaining separate tables for each source at the end? Currently, all the tables are getting appended into one table (the table with the first RowID).

Btw, Node 5 and Node 11 cannot be connected together; there is no data input port on the PostgreSQL Connector.

Hi @tone_n_tune , you connect the 2 nodes via the flow variable link (you can see the red line in my screenshot); that’s how you can link them. This makes sure that Node 11 will execute only after Node 5.

For executing “sequentially”, KNIME will do that for you automatically. KNIME will run the workflow for each of the rows in your Table (Node 14), and will append the results in new columns for each row.

Just make sure you’re not sending like thousands/millions of rows at a time.

If you are getting all results in the same row, this has to do with your JSON Path then.

@bruno29a, I am afraid I do not see any option here to create a flow variable in Node 5. Can you please share a screenshot of your config of Node 5?

No, the JSON path is putting results for different sources in different rows.

Yes, KNIME is doing this now, appending new source data to the same table. I am trying to get separate tables for the different sources. Any way to do that? I was thinking each workflow run would end by storing the table for that particular source to the database, and then the workflow would start from the beginning with a new source. Any way to do this? This way, I would have my data in separate tables in the DB.

You need to go to the Flow Variables tab (next to Options) to choose or create flow variables within a node.
I think in @bruno29a 's screenshot he just connected the 2 nodes via a flow variable link without creating any additional variables. (Just right-click on a node and click “Show Flow Variable Ports”.)

If you want to execute them sequentially with individual adjustments, you could also try out a loop.
br

This worked, thank you.

Can you please post an example?

Hello @tone_n_tune,

as said, you need a loop to write data into different tables in your database. Which loop to use depends on your data and where you decide to place it, but my guess would be that the Group Loop Start node should do the trick if you have a group identifier like the table name. To learn about loops in KNIME, there is a nice blog post about it:

And to find workflow examples (and more!) there is the KNIME Hub:

Also, please don’t post into multiple topics regarding the same issue, especially really old ones :wink:

And last but not least, welcome to KNIME Community!

Br,
Ivan

@ipazin thank you for the link, I will check it out.

This is a separate issue: the Postgres Connector cannot write to the DB (the same thing works for MS SQL Server). I am getting the same error messages as shown in that post. Please reopen the discussion that you closed! The developers need to be aware of the bug.

Hello @tone_n_tune,

I see. My mistake, as I thought it was related to one already discussed here. However, that topic is really old and uses the old database framework, so feel free to open a new one.

Br,
Ivan

@ipazin I have tried your suggestion of using Group Loop Start, but it is only saving the first group to a table. How can I export all the groups?

I guess you have to share your workflow. The group loop normally iterates over the groupings you defined, but after the end node it should collect all results.
br

Hello @tone_n_tune,

honestly, it’s a bit hard to help you out with the level of information you are sharing. I know you cannot share your workflow (and that wouldn’t make sense anyway, as it features a database connection), but some screenshots of your current workflow, together with screenshots of your data and a bit more context, sure would help. However, let’s try one more time.

If I got it right, you are trying to export data into multiple tables, and you fetch all the data using the GET Request node. If that is correct, you need a loop (proper and properly implemented) that will, in each iteration, take part of the data and write it into the table that this data belongs to. In order to do so you obviously need an identifier, or some way to identify which row goes to which table. And for all that to work, you need to parameterize the DB Writer node (using a flow variable holding the schema and/or table name) so that in each iteration you write to a different database table. The whole process is not that easy (meaning for a KNIME beginner it’s not half an hour of work), but it also shouldn’t take you a week if you are familiar with ETL, especially with the assistance you are getting and the materials you are reading :wink:

Br,
Ivan

@bruno29a @Daniel_Weikert @ipazin here is the workflow, please have a look - https://u.pcloud.link/publink/show?code=XZ9A9OXZpKRIMruKLM82yuCyad99g00cADqy

Yes, that is exactly what I am trying to do, to write multiple tables in the DB.

Yes, it is getting quite difficult for me, please help.

Hi @tone_n_tune ,

A few comments about your workflow:
You don’t have to link all your nodes with the flow variable port. Depending on why you did this, keep in mind:

  1. Execution will follow from one node to the next one that’s linked to it
  2. Flow variables will also follow the links

If you linked them because of my suggestion to link your connection node, please take a look at this example:
[screenshot: Node 1 → Node 2 → Node 3 connected in a row, with Node 4 and Node 5 unconnected]

If this workflow is executed, the nodes Node 1, Node 4 and Node 5 will execute first and at the same time, and then Node 2 will follow after Node 1 is done, and then Node 3 after Node 2 is done.

Since Node 4 (like Node 1) does not have any node connected to its left, it will just execute when the workflow is run. Similarly for Node 5. That is why Node 1, Node 4, and Node 5 will execute right away when the workflow is executed.

If I want to make Node 4 execute only after Node 3 is done, then I have to link Node 3 to Node 4. Since Node 4 is a Table Creator and does not have a Data input (black triangle), I can only link it via the Flow variable port (all nodes have an input and output flow variable port):
[screenshot: Node 3 linked to Node 4 via the flow variable port]

Now, when the workflow is executed, only Node 1 and Node 5 will start executing at first. Node 4 will only execute after Node 3 is done.

If we apply the same logic to your workflow, you want to establish your db connection only when you are ready to do db operations, or, put another way, after all your KNIME data operations are completed, that is, at the end of your Node 5. If you establish your connection as soon as your workflow starts, you risk the connection being expired if your KNIME data operations take long (downloading data via GET Request, manipulating the data afterwards with Ungroup, etc.).
Like the Table Creator, the DB Connector does not have a data input port, so we connect it using the flow variable port.

I modified your workflow, and it looks like this now:

I integrated the Group Loop. It’s grouping by your table_name.

You need to do an additional modification which I cannot do because I don’t have a db connection - meaning you will only be able to do this once you run Node 16 (or any other db connection). This is about dynamically passing a table name to your DB Writer so that it writes the data to a different table per group. You will only need to do this once.

Instructions:

  1. Execute Node 17
  2. Execute Node 16
  3. Open the DB Writer (Node 13)
    Note: The Group Loop will tell you the value of the current group, and this can be confirmed by looking at its flow variables:
    [screenshot: the Group Loop’s flow variables]
    In the DB Writer node, click on this button:

    This window will pop up:
    [screenshot: the flow variable settings dialog]
    Check the box as per the image, and then choose “table_name” from the dropdown. (That’s the part I cannot do on my side; I don’t see the variables because I don’t have a valid connection. You should see them if you executed Steps 1 and 2.)
  4. Click Ok, and you’re back to the DB Writer, and you should see this message at the bottom:
  5. Click Ok, and all is set
  6. You can execute the rest of the Loop by running the Loop End (Node 18)

The expected result: the first set of data will be written to table BTC and the second set to table ETH, as per what you defined in your table.
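
Once the loop has finished, you can sanity-check the result from the PostgreSQL side, for example with psql (placeholder database and table names):

```sh
# List the tables the loop created, then peek at one of them (placeholder names).
psql -d mydb -c "\dt"
psql -d mydb -c "SELECT * FROM btc LIMIT 5"
```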

Here’s the workflow: Export_to_DB_Bruno.knwf (36.6 KB)

@bruno29a this worked beautifully, thank you!

  • I am getting a warning message when I rerun the workflow: “WARN DB Writer 3:13 Configure node and enter a valid table name.” Any way to resolve this?

  • I have replaced the MS SQL Server Connector with the Postgres Connector, but nothing was written to the DB. I should point out that in the current workflow there is no schema defined (MS SQL Server writes the tables into the existing master DB). Is that the reason the Postgres DB is not working, and how can I define a fixed schema for the tables to be written?

  • Is there a way I can reset the First Node (#14) and execute the Last Node (#18) automatically every day at a certain time like 6 pm?

Hi @tone_n_tune , don’t mind the warning message, it’s just a warning. This happens because some information is not yet available for the node at that moment. Once you run the upstream nodes, it will get the proper information and will run as expected.

For the schema, if you are going to use a fixed one, you can just enter it directly in the Schema box of the DB Writer.
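
(Side note: if you leave the schema empty, PostgreSQL defaults to the public schema, the counterpart of dbo on MS SQL Server. And if you want a dedicated schema, as far as I know it has to already exist on the database side; a one-off sketch with psql, using a hypothetical schema name:)

```sh
# One-time setup; "crypto" is a placeholder schema name.
psql -h localhost -d mydb -c "CREATE SCHEMA IF NOT EXISTS crypto"
```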

If you want to run something automatically, there are 2 main ways:

  1. Via a KNIME Server
  2. Via the command line

There was a thread about this yesterday, you can follow it to get more information:

For the reset, you can save your workflow in a reset state and the job will run it, or you can pass the -reset option if you are going to use the command line.
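
As a rough sketch, a command-line run with reset would look something like this (the paths are placeholders; the flags are the standard KNIME batch executor options):

```sh
# Run the workflow headless; -reset resets it before execution (placeholder paths).
/path/to/knime/knime -nosplash -consoleLog -reset \
  -application org.knime.product.KNIME_BATCH_APPLICATION \
  -workflowDir="/path/to/knime-workspace/Export_to_DB"
```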

Could you please share more details, and also how I can save my workflow as reset so I can add it to the Windows scheduler?

Hi @aiham30 , just reset your workflow and then save it. That way it will be saved as reset.

You can create a batch file that has your command line + parameters, and then create a task in the Windows Task Scheduler to execute that batch file on a specific day/time.

Your computer has to be on at the time of the scheduled execution.
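
For example, a minimal sketch (placeholder paths; adjust to your install and workspace):

```bat
rem run_knime.bat - placeholder paths, same batch executor flags as above
"C:\Program Files\KNIME\knime.exe" -nosplash -consoleLog -reset ^
  -application org.knime.product.KNIME_BATCH_APPLICATION ^
  -workflowDir="C:\knime-workspace\Export_to_DB"
```

And a one-off command to register it as a daily 6 pm task:

```bat
schtasks /create /tn "KNIME daily export" /tr "C:\scripts\run_knime.bat" /sc daily /st 18:00
```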

@bruno29a I tried the Linux option; I am using Ubuntu. But KNIME on Ubuntu is not an installed app, so the commands are not recognized. Any solution for this?