Passing Aggregation method into Database 'Group By' node via flow variable

I am using KNIME 3.5.3 and accessing Postgresql (I think v9). I want to pass the aggregation method into the Database GrouBy node using a variable. I assume that the terms included in the ‘description’ are the parameter values to use but whilst this works for ‘MAX’, ‘MIN’, ‘STDDEV_POP’ and STDDEV_SAMP’ I get warning at configuration stage (and thus no execution) for ‘AVG’, ‘COUNT’ and ‘SUM’. Warning message is (e.g.) ‘The function ‘AVG’ is not supported by Postgresql’. When I enter the mode in the node and look at the SQL then ‘AVG’ function is being applied though…

Also, any tips to do Median calculation in the database via nodes, I guess I can use some form of sort and sample…

Regards

Steve Thornton

Hi @Steve.Thornton -

This looks like a bug to me. Thanks to your detailed description, I was able to reproduce it in both the existing Database nodes as well as the new DB preview nodes. I’ll file a ticket for it.

Thanks very much for reporting this!

As for the median, I’m not a Postgres expert, but you might try something like this within a Database Query node. I tested it and it seems to work OK on the adult dataset:

SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY age) as median
FROM #table# as foo
GROUP BY sex

(I found this suggested here: https://dba.stackexchange.com/questions/106440/get-10th-and-90th-percentile-by-customer)

Thanks ! I’ll try the media thing and let you know.

For information, I actually got around the issue by constructing a query statement for the aggregation using the variable string manipulation node and then passing the resulting variable into the DB query node.

e.g.

$${Sstatistic}$$ = “avg” => “SELECT AVG(value) AS valnum,LOCALTIMESTAMP AS update_timestamp,product_id,aggregation_id FROM #table#c GROUP BY product_id,aggregation_id”

Might keep this anyway as it would work for the median solution I think.

2 Likes