Issue with regex and end-of-string anchor in DB Looping node

Hi KNIME Community,

I am encountering an issue with a regex pattern involving an end-of-string anchor $ when using the DB Looping node. I think it may be a bug or at least an inconsistency.

The context is as follows: I am working with a PostgreSQL database, and my query uses the regex pattern ^(CL|CL_renal|CL/F)$. This query evaluates successfully when I press the “Evaluate” button within the DB Table Selector node (Custom Query). However, when I connect this to a DB Looping node, I receive the following error:

ERROR DB Looping 8:2106 Configure failed (IllegalArgumentException): Illegal group reference

This error occurs even before the node is configured and run. I’ve confirmed that the regex works perfectly in pgAdmin and in the DB Table Selector evaluation but not once connected to the DB Looping node. If I remove the end-of-string anchor $ from the regex, the DB Looping node works fine but the regex is not the intended one.

Has anyone experienced a similar issue, or does anyone have suggestions on how to resolve this? Your advice would be greatly appreciated.

Thank you for your assistance.

Best,
Gio

Hi,
the looping node does only put all values of the specified column into following SQL query:
SELECT * FROM table WHERE Col1 IN ($Col1_values$)
See Node Description

I’m not an expert in PostgreSQL but I think the “IN” construct does not support regex patterns.

3 Likes

Hi @ActionAndi,

Thank you for your reply and help. It seems my description wasn’t clear, but I’m not using the regex inside the “IN” construct of the DB Looping node. I’ll try to explain better.

I have a certain query in a DB Table Selector node that contains a regex:

...
WHERE act.standard_type ~ '${Sregex}$'
...

This query works fine, even when I press the “Evaluate” button of the DB Table Selector node. Afterwards, the DB Table Selector node is connected to a DB Looping node with the following simple query:

SELECT * FROM #table# AS "table"
WHERE "chembl_id" IN ($chembl_id$)

The condition in this node is unrelated to the regex. When I connect this node, I get the error specified in my first post.
I think there is something wrong with how the DB Looping node parses the end-of-string anchor $ coming from the DB Table Selector node.
In fact:

  • Removing the end-of-string anchor $ solves the problem (but it changes the meaning of the regex).
  • The full query (reported below) works well in PgAdmin.
…
WHERE act.standard_type ~ '$${Sregex}$$'
AND  "chembl_id" IN (<LIST_OF_CHEMBL_IDS>)
…

Any other suggestions?

Ah, sorry for misunderstanding your question.
It’s not a solution but as a workaround you can create your query manually.
I did similar quite often in the past.
Just use the “group by” node to create a list of the entries in your “chembl_id” column.

2 Likes

Thank you for the workaround idea.
In any case, I would leave the thread unresolved in case someone from KNIME wants to take a look and check if this is a bug of the DB Looping node. I repeat that it happens only when the “IN” construct in the DB Looping node is combined with a query on the DB Table Selector node that contains a ~ construct and a regex containing an end-of-string anchor $.

@gcincilla maybe you can provide a sample workflow demonstrating this problem. Maybe with dummy data that one can load to a postgres database.

1 Like

Hi @mlauber71, sure! An example workflow is attached below.
To make it work you just need to create an empty postgreSQL database name “items_test”. The DB will be first populated by the workflow and then the problem is exemplified using 2 queries:

  1. A working query query NOT containing end-of-string anchor $ in the regex
  2. A not-working query containing end-of-string anchor $ in the regex

As I said this is a rather specific problem involving a combination of elements but I think there’s something wrong with the DB Looping node. I’m using KNIME 5.2.6.

Can anybody confirm the problem?

DB Looping problem.knwf (63.1 KB)

1 Like

Hi @gcincilla

I’ve not looked at your example workflow, but from what I can see, I’d say it’s likely to be an unfortunate issue where the $ is being misinterpreted by the DB LOOPING node. (Because it uses a $ symbol to introduce flow-variables and columns)

I can create a similar failure in ORACLE (and I would guess any other database) with the following custom query in DB Table Selector:

SELECT * FROM dual
where dummy <> '$'

or

SELECT '$' as currency, 100 as amount FROM dual

Then if I try to connect this with a DB Looping node, e.g.

SELECT * FROM #table# "table"
where amount in ($column1$)

(or exclude the where clause altogether)

it gets confused by the presence of the $ sign. So not PostgreSQL or regex related. Purely a problem with the presence of a $ sign in the query.

I think a likely workaround is that whereever you need to include a $ sign in the original query, you would have to concatenate CHR(36) or CHAR(36) depending on database dialect

e.g. for my above examples:

SELECT * FROM dual
where dummy <> chr(36)

or

SELECT chr(36) as currency, 100 as amount FROM dual

Obviously replacing the $ symbol with the concatenation of chr(36) might be trivial or non-trivial depending on your use case.

So yes, I would agree that this is really a bug in the db looping node, since $ symbols are perfectly valid for inclusion in SQL (e.g. for currency, regex and even table/column names), and therefore it ought to be coded to handle them correctly without being tripped up by parsing for variable column names etc.

1 Like

Hi @takbb,
Yes, the problem with a $ is what I was suspecting (in my case as being part of a regex). Thanks for confirming the problem in a different way! That’s certainly useful.
I already have a workaround but I would like to inform KNIME developers about the problem. As other DB nodes can deal with the $ sign, I’m sure they can fix the behaviour of DB Looping node in one of the next versions.
I hope they can see this thread.
Cheers,
Gio

1 Like