Which Node to Use

Hello,

I have used several nodes to shape my data from a CSV data file.
The final node is a “Column Combiner” and my data looks perfect:
[screenshot: KnimeShot1]

I would like to insert that data into a table in a Postgres database, row by row.
I know SQL perhaps too well, so I would like to write my own SQL statement to insert into the Postgres table.

I have written this insert statement:
INSERT INTO Cities (
CSDdguid
, Name
, CSDtype
, Population2021
, GeoLocation)
VALUES (
$CSDdguid$
, $Name$
, $CSDtype$
, $Population2021$
, $GeoLocation$
);

It appears to me that there are two nodes that support an SQL Insert statement: “DB SQL Executor” and “DB Looping”.

It seems I cannot use the “DB SQL Executor” because the control can only be associated with a database. I can’t set the Input Port to the Column Combiner. The node supports flow variables, of which I know nothing.

Can I use the “DB Looping” node? It appears to let me iterate through each row from the previous node, executing an insert statement against the Postgres table. However, I cannot link my “Postgres SQL Connector” connection to the “DB Looping” node. When I drag and drop the line from the db node to the looping node, it does not connect.

Any tips/suggestions most appreciated.

Richard Rogers

@RRInToronto welcome to the KNIME forum. I think the DB Insert node would be good (or DB Loader for bulk operations).

If you want to read some more about KNIME and SQL databases


Thank you very much for your response. There are two problems:

  1. When trying to configure the “DB Insert” node, I receive this message:
    [screenshot: KnimeShot2]

  2. As far as I understand, the “DB Insert” node does not allow me to write my own custom SQL insert statement.

I am trying to execute this insert statement for each row of the previous “Column Combiner” node:
INSERT INTO Cities (
CSDdguid
, Name
, CSDtype
, Population2021
, GeoLocation)
VALUES (
current_time::varchar(16)
, current_time::varchar(16)
, 'TP'
, 90210
, '-92.723491,49.800732'
);

Thank you Markus.

Richard Rogers

@RRInToronto in this case you can use the loop

And then write the code in the SQL executor using the flow variables.

https://docs.knime.com/latest/analytics_platform_flow_control_guide/index.html#introduction


Thank you very much for your assistance.

I wish I could attach the workflow; it’s very simple, with only 8 nodes:

  1. PostgreSQL connector
  2. CSVReader
  3. RowFilter
  4. ColumnRenamer
  5. ColumnResorter
  6. ColumnCombiner
  7. TableRowToVariableLoopStart
  8. DBSQLExecutor

It is inserting one row, although I’m not using any of the flow variables.
My insert statement:
INSERT INTO Cities
(
CSDdguid
, Name
, CSDtype
, Population2021
, GeoLocation
)
VALUES
(
current_time::varchar(16)
, current_time::varchar(16)
, 'TP'
, 90210
, '-92.723491,49.800732'
);

I don’t know why it’s only inserting one row; I’ve got over ninety rows on port 0 out of the ColumnCombiner, and I can see them in the node monitor. The ColumnCombiner goes directly into the TableRowToVariableLoopStart.
I’ve tried using the flow variables, one at a time, I’m not sure what I’m doing wrong.

Any assistance/pointers most appreciated.

Richard

@RRInToronto your statement inserts exactly one line into the database.

I am not sure why you want to do this in iterations but you will have to fill the values of the insert statement with the Flow Variables of each iteration.

Another option could be to use

Maybe you start by planning out what you want to do.

I’ve done that:

The final node is a “Column Combiner” and my data looks perfect:
[screenshot: KnimeShot1]

I would like to insert that data into a table in a Postgres database, row by row.

I know SQL very well, so I’ve written this insert statement:

INSERT INTO Cities (
CSDdguid
, Name
, CSDtype
, Population2021
, GeoLocation)
VALUES (
$CSDdguid$
, $Name$
, $CSDtype$
, $Population2021$
, $GeoLocation$
);

Hi @RRInToronto , you mention that it is only inserting the first row…

In your list of nodes you haven’t included a loop end node (eg Variable Loop End).

If that’s been overlooked the loop is not complete and won’t iterate.

I agree with @mlauber71 re iterating the rows, though… Unless you have a small number of rows to insert, iterating through individual insert statements will be orders of magnitude slower than using DB Insert to insert in bulk. If “knowing sql” is the only reason for wanting to do it that way, I’d highly recommend giving the dedicated insert node a try. But for small datasets, if it works for you then that’s fine.

I think though in your insert statement you are likely to need to include single quotes around your string flow variables, since the Sql Executor will perform simple string replacements of the named flow variables. It doesn’t treat them like “sql parameters”, so syntactically you’ll have to take care to include any quotes that are required by sql.

eg

INSERT INTO Cities (
CSDdguid
, Name
, CSDtype
, Population2021
, GeoLocation)
VALUES (
'$CSDdguid$'
, '$Name$'
, '$CSDtype$'
, $Population2021$
, '$GeoLocation$'
)

(I’m assuming that just population2021 is numeric, and the others are strings)


Hi takbb,

Ok, I’ve implemented your suggestion and used the “DB Insert” node.
Now I’ve got a new issue: the insert statement generated fails because the column names are quoted.
This is the generated insert statement:
INSERT INTO "public"."cities" ("CSDdguid", "Name", "CSDtype", "Population2021", "GeoLocation") VALUES ('2021A00053502008','Hawkesbury','T',10194,'-74.611297,45.608871')

When I try to run it in psql, I get this error:
ERROR: column "CSDdguid" of relation "cities" does not exist - but it does!!!

When I remove all the double quotes from the insert column selection clause, it runs fine in psql (the postgres command line utility).

This inserts one row:
INSERT INTO "public"."cities" (CSDdguid, Name, CSDtype, Population2021, GeoLocation) VALUES ('2021A00053502008','Hawkesbury','T',10194,'-74.611297,45.608871')

How can I configure the “DB Insert” node to not place double quotes around the columns specification list in the insert statement it generates?

Thank you,

Richard

Hi @RRInToronto, the setting for “quoting identifiers” is usually in the advanced settings of the database connection node.

If you tick “Delimit only identifier with spaces”, it shouldn’t add the quotes in the subsequent nodes.


I appreciate your help.

I enabled that flag.
The generated insert statement:
INSERT INTO public.cities (CSDdguid, Name, CSDtype, Population2021, GeoLocation) VALUES ('2021A00053502008','Hawkesbury','T',10194,'-74.611297,45.608871')

When running with Knime, I get this error message (from the Knime Console):
ERROR: column "geolocation" is of type point but expression is of type character varying
Hint: You will need to rewrite or cast the expression.

Psql:
The above insert statement works when I run it verbatim!!!

What is going on here?

The Postgres table definition:
Create Table If Not Exists Cities (
Id Serial Not Null Primary Key
, CSDdguid Char(16) Not Null Unique
, CSDType Varchar(10) Not Null
, Name Varchar(50) Not Null Unique
, Population2021 Integer Not Null
, GeoLocation Point Not Null
);
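Going by the hint in the error message, one workaround would be to cast the string explicitly in the statement itself. This is just a sketch (the column names and sample values come from the posts above), but Postgres accepts both 'x,y' and '(x,y)' as input syntax for point, so an explicit cast should satisfy the type check:

```sql
-- Sketch: same statement as above, but with the varchar literal
-- explicitly cast to the point type, as the Postgres hint suggests.
INSERT INTO public.cities (CSDdguid, Name, CSDtype, Population2021, GeoLocation)
VALUES ('2021A00053502008', 'Hawkesbury', 'T', 10194,
        '-74.611297,45.608871'::point);
```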


This insert is failing in Knime:
INSERT INTO public.cities (CSDdguid, Name, CSDtype, Population2021, GeoLocation) VALUES ('2021A00053502008','Hawkesbury','T',10194,'-74.611297,45.608871')

Yet, it works when I execute it in psql.


Hi @RRInToronto , I’m not familiar with postgres’s point datatype, nor why it should be causing the error if, as you say, the statement works outside of KNIME. I wonder if there is a difference in the database driver. The error itself appears to be coming back directly from postgres, as I’ve found the same wording elsewhere in web searches, so it’s odd that it should be considered ok in one client call, but not in another.

I tried the sql on an online postgres database (db-fiddle.com) and it worked ok.

This is a long shot, but based on something I read elsewhere, could you try concatenating open and close parentheses onto the point data so that the insert would look like this:

'(-74.611297,45.608871)'

i.e.

INSERT INTO public.cities (CSDdguid, Name, CSDtype, Population2021, GeoLocation) VALUES ('2021A00053502008','Hawkesbury','T',10194,'(-74.611297,45.608871)')

I hadn’t previously appreciated that the geo info was anything other than a varchar column, so it might be that you will have to perform sql executor inserts after all :open_mouth: although I hope not!


@RRInToronto well, well. This is how it could be done, though I would not recommend it: you convert the values from each line of data into Flow Variables and then use the INSERT command through the SQL Executor. This does work, but I would not recommend it.

It would be better to just use KNIME DB nodes to load the data. Some databases also support bulk load which might be good if you have lots of data.

If you deal with special values, check that you have defined them in the DB Table Creator node; you might also have to CAST them or encapsulate them in a specific way (this might be true for your geo information).
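Putting the pieces from this thread together, the looped statement in the DB SQL Executor could look like the sketch below. This assumes flow variables named CSDdguid, Name, CSDtype, Population2021, and GeoLocation coming from the Table Row To Variable Loop Start, with GeoLocation holding "lon,lat" text; since the SQL Executor does plain string substitution, the quotes and the cast have to be written out explicitly:

```sql
-- Sketch, assuming the flow variable names above: string values are
-- wrapped in single quotes, the numeric value is left bare, and the
-- geo text is cast to the point type explicitly.
INSERT INTO public.cities (CSDdguid, Name, CSDtype, Population2021, GeoLocation)
VALUES ('$CSDdguid$', '$Name$', '$CSDtype$', $Population2021$,
        '$GeoLocation$'::point);
```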


I can associate a ‘DB Insert’ node with either the Postgres database node or the source data node, by selecting one or the other in KNIME, then double-clicking on the ‘DB Insert’ node to drop it onto the workflow.

Does it matter whether I have the db node selected or the source data node selected when I drop the insert node onto the workflow?

@RRInToronto OK. I have tried to simulate your situation with Lat/Long information in KNIME and then tried to bring that into a PostgreSQL database. And what can I say: it does work. Maybe not the most elegant of all solutions, but it will enter the data into the “GeoLocation Point” field.

It seems to me that the issue is either the driver and/or the “Point” datatype used in the Cities.GeoLocation column.
I am so close to getting it working using the “DB Insert” node.


@RRInToronto I am struggling with the Point data format and was not successful so far. But the loop with the flow variables seems to work …


I have dropped the GeoLocation (Point datatype) column from the table, and the “DB Insert” node is now adding all expected rows.

I need to get Knime’s “DB Insert” node to insert into a table with a Point datatype column.
How can I do that?