Hi everyone, I have this data as attached. I drafted it in SQL first, and now I want to get it working in KNIME, but I don't know how to convert it into Column Expressions (or whichever node works).
Here is my SQL; please advise me how to translate it into KNIME:
WITH cte AS (
    SELECT *
    FROM test (1).test -- my data attached
    ORDER BY user_name, Date DESC)
SELECT
    c1.user_name, c1.Date, c1.amount, c1.Total,
    CASE
        WHEN MAX(c1.amount) - SUM(c2.Total) >= 0 THEN c1.Total
        WHEN MAX(c1.amount) - SUM(c2.Total) < 0 AND c1.Total + (MAX(c1.amount) - SUM(c2.Total)) >= 0 THEN c1.Total + (MAX(c1.amount) - SUM(c2.Total))
        ELSE 0
    END AS refund
FROM cte c1
JOIN cte c2 ON c1.user_name = c2.user_name AND c1.Date <= c2.Date
GROUP BY c1.user_name, c1.Date, c1.amount, c1.Total
ORDER BY c1.user_name, c1.Date DESC
Hi @huynhduc , Column Expressions would not be a good fit for a translation of SQL, because:
- a single column expression returns only a single column
- GROUP BY and aggregate functions across rows are better handled by other nodes, such as the "GroupBy" node
- in general, a variety of different nodes would be used to produce the different kinds of result for the various columns.
So the question I have when anybody asks how to do something using "Node X" is: do they specifically want to use that node for the purpose of learning it, or do they actually just want to get the job done, using whichever nodes are best suited?
For most people it is the latter; they just wish to get the job done, rather than effectively asking "please demonstrate how I can paint my car using a hammer".
I have reproduced your table here in text form so that it can be easily copied by other people wishing to help. NB: it was transcribed from the picture by ChatGPT, so it may not be entirely accurate.
user_name  | Code       | amount   | Ref number | Date       | Total
Argentina  | 55053875   | 10000000 | 2606       | 2024-05-02 | 10000000
Australia  | 266622764  | 10000000 | 2905       | 2024-05-02 | 10000000
Bangladesh | 1155821483 | 10000000 | 2809       | 2024-05-02 | 10000000
Brazil     | 466493113  | 10000000 | 2899       | 2024-05-02 | 10000000
Cambodia   | 600526694  | 10000000 | 3008       | 2024-05-02 | 10000000
Canada     | 897852173  | 10000000 | 3037       | 2024-05-02 | 10000000
Chile      | 151470     | 10000000 | 3722       | 2024-05-02 | 10000000
China      | 15276633   | 200000   | 3289       | 2024-05-02 | 200000
Colombia   | 158649922  | 300000   | 105296     | 2024-03-03 | 200000
Colombia   | 158649922  | 300000   | 6849       | 2024-02-04 | 200000
Egypt      | 598833157  | 26000000 | 2342       | 2024-05-02 | 20000000
Egypt      | 598833157  | 26000000 | 141676     | 2024-04-02 | 7200000
Egypt      | 598833157  | 26000000 | 105460     | 2024-03-03 | 200000
France     | 58239472   | 10000000 | 2908       | 2024-05-02 | 10000000
Germany    | 226630716  | 10000000 | 2421       | 2024-05-02 | 10000000
India      | 53831405   | 10000000 | 2836       | 2024-05-02 | 10000000
Indonesia  | 358075353  | 20000000 | 2993       | 2024-05-02 | 20000000
Now, looking at the SQL, can you confirm if it is actually doing what you want?
You are using max(c1.amount) in your calculations, but also grouping by c1.amount, for example, so I feel that may not be what you intended.
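To illustrate, here is a minimal sketch against your cte (hypothetical, just to show the point):

-- because c1.amount is in the GROUP BY, every row within a group has the same
-- amount, so MAX(c1.amount) can only ever return that same value:
SELECT c1.amount, MAX(c1.amount) AS max_amount -- identical in every result row
FROM cte c1
GROUP BY c1.amount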
Back to the question of whether you want to use specific nodes, or to see how it could be done with KNIME nodes, or whether you "just want to get the job done"… if a solution were presented where you could simply execute the SQL (or an adapted version of it) against your KNIME table, would that be potentially suitable?
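For instance, here is what I believe your query computes for the Egypt rows, worked by hand (do check my arithmetic):

-- Egypt: amount = 26,000,000 on every row; Totals, newest first: 20,000,000 / 7,200,000 / 200,000
-- 2024-05-02: SUM(c2.Total) over dates >= 05-02 is 20,000,000; 26,000,000 - 20,000,000 >= 0 -> refund = Total = 20,000,000
-- 2024-04-02: SUM is 27,200,000; 26,000,000 - 27,200,000 = -1,200,000 < 0, and 7,200,000 - 1,200,000 = 6,000,000 >= 0 -> refund = 6,000,000
-- 2024-03-03: SUM is 27,400,000; shortfall is -1,400,000, and 200,000 - 1,400,000 < 0 -> refund = 0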
Hi @takbb , thanks for your reply.
Yes, the transcribed data is correct.
I agree with you that I only want the job done. I preferred Column Expressions because, as far as I know, the Column Expressions node lets you write code more easily than the Java Snippet node does.
I welcome all your help because I am still stuck with this.
Yes, the SQL is correct. I group c1.amount only where c1.Date <= c2.Date, meaning that for any row whose date is not max(Date), all of the rows from that date onwards are grouped together.
Refund is the desired column. Feel free to adjust the SQL code or to recommend other nodes.
Hi @huynhduc , thanks for the update. If you are happy that the SQL is doing what you want, then that’s fine. I wanted to be sure before trying to “reimagine” it into KNIME nodes but I realise this post isn’t about being an SQL tutorial, lol, so here goes…
I’ve used the initial data set for this example, because I’d already started playing with this.
My question about whether you'd be happy with an SQL solution was because one of the simpler options is to turn your KNIME data into a database table. Typically you could do this by creating an H2 database connection, creating a table, uploading your KNIME table into it, and then executing a query against that data.
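Under the hood, that route amounts to something like this (an illustrative sketch only; the table name and the single INSERT are just examples based on your data):

-- create a table matching the KNIME columns, load the rows, then query at will
CREATE TABLE "test" ("user_name" VARCHAR, "Code" BIGINT, "amount" BIGINT, "Ref number" INT, "Date" DATE, "Total" BIGINT);
INSERT INTO "test" VALUES ('Egypt', 598833157, 26000000, 2342, DATE '2024-05-02', 20000000);
-- ... one INSERT per row, after which any SELECT can be run against "test"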
That, though, feels a bit "involved", so it was for that use case that I previously created some components that are freely available on the hub.
For a single one-off table, you can use this component:
and then write the SQL query directly against it.
I have uploaded a demonstration workflow to the hub here:
You simply direct your data table at the component
and configure it, specifying the columns to be included and any columns for which indexes should be created (optional, but they can improve performance on larger datasets). You also give the table a name, in this case "test". The query I then ran against it was:
WITH
cte AS
(
SELECT "user_name","Date","amount","Total"
FROM "test"
)
SELECT
c1."user_name",
c1."Date",
c1."amount",
c1."Total",
case
WHEN MAX(c1."amount")-SUM(c2."Total") >=0 THEN c1."Total"
WHEN max(c1."amount")-SUM(c2."Total") < 0
AND c1."Total" + (MAX(c1."amount") - SUM(c2."Total")) >=0
THEN c1."Total" + (MAX(c1."amount") - SUM(c2."Total"))
ELSE 0
END AS "refund"
FROM cte c1
JOIN cte c2 ON c1."user_name" = c2."user_name" AND c1."Date" <= c2."Date"
GROUP BY c1."user_name", c1."Date", c1."amount", c1."Total"
ORDER BY c1."user_name", c1."Date" DESC
As you can see, this SQL is very similar to your original; I have just had to deal with the case-sensitivity of the column and table names.
As an interesting exercise though, I have also built the equivalent of that same SQL using standard KNIME nodes.
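Roughly speaking, the SQL clauses map onto nodes like this (my summary of the idea, not an exact inventory of the workflow):

-- Joiner / Row Filter        : FROM cte c1 JOIN cte c2 ON c1.user_name = c2.user_name AND c1.Date <= c2.Date
-- GroupBy                    : GROUP BY user_name, Date, amount, Total, with MAX(amount) and SUM(Total) aggregates
-- Math Formula / Rule Engine : the CASE expression that derives "refund"
-- Sorter                     : ORDER BY user_name, Date DESC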
The two mechanisms give the same results, albeit with some differences in data types (long vs double/integer).
For a further demo of the KNIME Table Session Connector, and its counterpart the KNIME Table Connector (used if you want to join to different KNIME tables using the same database connection), see this workflow:
Your solution was really impressive! I tried it and it worked.
However, when I tried to download the KNIME Table Session Connector node, it showed me this error. Can you advise me on this please?
nodes contained nodes raised a warning
MISSING NoOp (Table) #13007: Node can’t be executed - Node “NoOp (Table)” not available from extension “NodePit Power Nodes” (provided by “Philipp Katz, Daniel Esser; nodepit.com”; plugin “com.nodepit.noop.plugin” is not installed)
Hi @huynhduc , ah OK, my apologies. It seems I had forgotten that the component uses the "NoOp" nodes that are part of the NodePit Power Nodes.
These are a set of free utility and convenience nodes, available from here:
The installation instructions can be found on the above link.
Let me know if you are unable to install these, as I can probably modify the components to not use those nodes if it is a barrier to using the components.
I’ve updated the components and sample workflows so they no longer require the NodePit Power Nodes. So if you try to drop them on a workflow now, they should no longer show the error.
Amazing work @takbb ! So now the end user has another toolkit and can work on his/her data using SQL as well, aside from Python and Java (from a low-code perspective).
I'm assuming this is similar to having DuckDB in a way.
Just a question: is it possible to type in the column names without the double quotes, i.e. | SELECT user_name, Date | instead of | SELECT "user_name","Date" |?
This depends on your data table. In line with (most) SQL databases, the underlying H2 database treats all column and table names as upper case unless enclosed in double-quotes. So provided that your table name and column names consist only of upper case letters, underscores and digits (no spaces, or lower case) then the underlying table and columns will be created using upper case, and you can then enter your query without using double-quotes:
WITH
cte AS
(
SELECT user_name,Date,amount,Total
FROM test
)
SELECT
c1.user_name,
c1.Date,
c1.amount,
c1.Total,
case
WHEN MAX(c1.amount)-SUM(c2.Total) >=0 THEN c1.Total
WHEN max(c1.amount)-SUM(c2.Total) < 0
AND c1.Total + (MAX(c1.amount) - SUM(c2.Total)) >=0
THEN c1.Total + (MAX(c1.amount) - SUM(c2.Total))
ELSE 0
END AS refund
FROM cte c1
JOIN cte c2 ON c1.user_name = c2.user_name AND c1.Date <= c2.Date
GROUP BY c1.user_name, c1.Date, c1.amount, c1.Total
ORDER BY c1.user_name, c1.Date DESC
Any identifiers not enclosed in double quotes are automatically converted to uppercase by the database.
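A quick way to see that folding in action (hypothetical one-liners, assuming the table and column were created in upper case as TEST and USER_NAME):

-- these all reference the same column, because unquoted identifiers fold to upper case:
SELECT USER_NAME FROM TEST;
SELECT user_name FROM test;
SELECT "USER_NAME" FROM "TEST";
-- this one fails, because the quoted lower-case name does not match USER_NAME:
SELECT "user_name" FROM "test";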
There is an option on the H2 Connector, "Delimit only identifier with spaces", which defaults to off. I could potentially add that option to the component, but I think it would be more confusing.
This is because it would cause column names to become upper case except where the column name contains a space. So a column called user_name would automatically be created as USER_NAME, but a column user name, because it contains a space, would be automatically quoted by the database driver when the underlying table is created, and so its name would remain in lower case as user name. I feel it is clearer if (where possible) the column names remain as they are in the input table.
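To make that concrete, here is a sketch of what the driver would effectively do with that option turned on (hypothetical column names):

-- user_name has no space, so it is created unquoted and folds to upper case;
-- "user name" contains a space, so it is quoted and stays lower case:
CREATE TABLE TEST (USER_NAME VARCHAR, "user name" VARCHAR);
-- USER_NAME can now be referenced unquoted (user_name works too),
-- but "user name" must always be written with double quotes.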
Let me know if you want to convert your column names to upper case prior to using the components, and need assistance with achieving that.
Thanks for your explanation, really appreciate it!
That's fine, I could use column name extractor / column renamer to capitalise the names before I push the data to the node, and reset them after it has passed through.