Problems using AWS Redshift compute-node only function (listagg) with Database GroupBy (legacy)

#1

Hello, I’m noticing a strange Redshift error when undertaking database grouping using either the ‘Database GroupBy (legacy)’ node or the Database Query (legacy) node. This specifically relates to using a custom Listagg aggregation function. This has worked for me okay in the past but is now (as of some time in January 2020) giving the following error:

[Amazon](500310) Invalid operation: One or more of the used functions must be applied on at least one user created tables. Examples of user table only functions are LISTAGG, MEDIAN, PERCENTILE_CONT

To test this, I’ve created the following table on my Redshift cluster:

category item
a car
a ball
a bike
b red
b green

I’m seeking to produce a list aggregation grouping on this table to give an output like

a car|ball|bike
b red|green

I can achieve this without problems by running the following query from an SQL editor:

SELECT category,
listagg(distinct item, '|') 
within group (order by item)
over (partition by category) 
AS aggragadabra
FROM  schema_name.listagg_test;

However, if I run the same expression from a Database Query (legacy) node (of course using #table# instead of the table name) I get the above error. I also get the same error when using the Database GroupBy (legacy) node (with a custom aggregation).

It seems to me that something about the way KNIME is sending the query to Redshift is making Redshift want to run the query on a leader node, rather than only a compute node. It then issues the above error, as flagged in:

https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_functions_compute_node_only.html

Much appreciate any thoughts or suggestions.

Andrew.

1 Like

#2

Hey @andrew.dun,

This could be an issue with a surrounding WHERE clause that always returns false…
https://forums.aws.amazon.com/thread.jspa?threadID=242096
Could you post the generated SQL query from the Group-By/Database Query node?
You will find it at the output port view of the nodes.

Do I understand it correctly, that the same query used to work before? Which KNIME version are you using, did you upgrade KNIME in-between?

best Mareike

0 Likes