Statement Creation

I prefer creating e.g. SQL statements or REST statements ahead of time in full text within Knime. This enables me to copy the statements out of Knime and e.g. use them directly with a different SQL Tool.

But, i came to dislike String Manipulations Join function, be it in the String Manipulation node, Column Expression Node, or the worst in the new Expression node.
These are fine if you need to use 1-2 columns or flow vars and have shorter statements.
But longer statements are a pita.

Essentially, writing something like this in Knimes Expression Node:

join(
    "",
    "USE ", $(database_name)$, ";
    IF NOT EXIST (
        SELECT column_name
        FROM information_schema.columns
        WHERE
            schema_name = '", $(schema_name)$, "' AND
            tablen_name = '", $(table_name)$, "' AND
            column_name = 'HARDCODED'
    )
    ALTER TABLE '", $(schema_name)$, "'.'", $(table_name)$, "'
    ADD COLUMN 'HARDCODED' NVARCHAR(48)
    NOT NULL CONSTRAINT 'constr_", $(schema_name)$, "_", (table_name)$, "_HARDCODED' DEFAULT 'HARDCODED';"
)

feels just bad because not only do you have to pay attention to the quoting symbols of the dialect you are talking to ( [] vs "" vs '' ) but you have the additional quotes, commas and $$, $[ or $( from Knime in your view (depending on the node you use). Personally is very annoying.

I used to run something like this in the past where instead I wrote my statement in one go and added placeholders to replace

ALTER TABLE {} ADD COLUMN {} NVARCHAR(48) NOT NULL CONSTRAINT 'constr_{}_{}_HARDCODED' DEFAULT 'HARDCODED'

or

ALTER TABLE {1} ADD COLUMN {2} NVARCHAR(48) NOT NULL CONSTRAINT 'constr_{3}_{4}_HARDCODED' DEFAULT 'HARDCODED'

and then replaced it in another step to add all the values in the right places. Hence, you only have the " at the start and end once.
This isnt too far off from classical printf() and alike but it feels a lot smoother to produce for me starting from a table like

database_name schema_name table_name
db1 schema1 table1
db1 schema1 table2
db2 schema1 table1

Anyone else facing or tackling these issues in one way or another which could less annoying?

Hi @fe145f9fb2a1f6b , I agree that assigning dynamic values into strings through concatenation is a painful experience and as I have had to do something similar in KNIME many times, I created a component to assist.

This is for dynamic replacement using variables rather than columns, though, so to work with a table - which I think is what you are trying to do - it can easily be added inside a “table row to variable” loop.

How to use it:

Create a string flow variable using a node such as CSS Editor (now legacy!). It doesn’t matter that you’re not writing CSS; the node just allows easy creation of a multi-line string variable, so you just write your exclusion in that.

After that, configure the Dynamic Variable Replacer (Multi Variable) component. You will probably need to attempt to execute it prior to configuration so that it can see the available variables.

As you can see from this diagram, you choose one of the listed “Dynamic Variable Format” options, and write your variable parts within the CSS Editor in that format


e.g.

    USE  $${database_name}$$;
    IF NOT EXIST (
        SELECT column_name
        FROM information_schema.columns
        WHERE
            schema_name = '$${schema_name}$$' AND
            table_name = '$${table_name}$$' AND
            column_name = 'HARDCODED'
    )
    ALTER TABLE '$${schema_name}$$.$${table_name}$$'
    ADD COLUMN 'HARDCODED' NVARCHAR(48)
    NOT NULL CONSTRAINT 'constr_$${schema_name}$$_$${table_name}$$_HARDCODED' DEFAULT 'HARDCODED';

On execution, the component performs string replacement on all identified variables. The output of the loop in this example is as follows, with the values for each row being dynamically applied to the sql-statement:

Here is the above workflow example

Demo Dynamic String Variable.knwf (249.1 KB)


As an aside, this also includes a quick way of including date/time stamps (see the options in the component help for details). As an for example it can be used to quickly create a dynamic logfile name including datetime stamp like this:

5 Likes

the looping methods are well known but highly inefficient imho.
I scouted through many options today, but given that e.g. String Replacer cannot select a column but requires hardcoded values, there is no elegant solution.

workaround using java / python snipped work but seem unreasonable, too

I’m not sure what you mean by the “looping methods are well known”. From your question "Anyone else facing or tackling these issues in one way or another which could less annoying? ", I thought you were looking for a way of avoiding writing awkward and error-prone string concatenations, which is why I mentioned my component. But the component was designed for replacing flow variables ( as this was what I needed when I wrote it) hence the use of the loop.

How many rows are you dealing with? Looping on a handful of rows, as in your example is relatively trivial in terms of overhead. If your requirement is on a large number of rows then that would change the approach so probably worth mentioning in the question.

I would also like KNIME to have the native ability to handle dynamic string replacements, but it doesn’t so I wrote one into a component. Sorry that there is “no elegant solution”.

Hi @fe145f9fb2a1f6b, following up on this, I have created a new component to allow it to work with Columns directly, instead of flow variables. This then removes the table row to variable loop requirement.

Demo Dynamic String Column Replacement.knwf (52.5 KB)

Screenshot is using KNIME 4.7.8 (using an old version of KNIME for this component simply to provide compatibility across all newer versions)

Internally (as before) it uses java - this time it uses java snippet instead of java edit variable to allow it to work with columns directly, and hopefully would give reasonable performance, but I’ll leave that to you to assess for your use case.

If this still doesn’t achieve what you need, please give some constructive feedback on what is missing.

hi @takbb interesting that you look further into this eliminating the join() but it still reads really clunky and isnt really efficient imo.
in your approach, you still need to know the column names and write the “clunky” $(name) or $()$ or $$()$$ for flow vars.

My best case scenario is that I can just write down a statement like

SELECT * FROM {1} WHERE columnname = '{2}';

and then replace {n} quickly with column values and flow variables.

one more concrete example:
Having a table with table-name, column-name, new-column-name and some flow vars like the the SQL dialects escape chars or identifiers:

extreme example (cause the static text is short), SQL syntax not correct, just to show the issue

join("", "sp_rename(",
$$(SQL.identifier.open)$$, $(table-name)$, $$(SQL.identifier.close)$$, ".", $$(SQL.identifier.open)$$, $(column-name), $$(SQL.identifier.close)$$, ", ",
$$(SQL.quotation)$$, $(new-column-name)$, $$(SQL.quotation)$$,
", ", $$(SQL.quotation)$$, "COLUMN", $$(SQL.quotation)$$, ");"
)

Writing this is rough because auto-complete isnt present well enough, hence you lose focus on typing and have to to mouse-work so often.

I would prefer writing something like this, even if I know that others dont (again keep in mind this is an extreme case but if you see it in relation to my first post, the idea should be sufficiently clear:

sp_rename({1}{2}{3}.{1}{4}{3}, {5}{6}{5}, {5}COLUMN{5})

final note:
The Expression node offers auto-complete which is already quite a step up from the String Manipulation or Column Expression. But, having to wrangle around with the join function, having to separate every variable in the flow of the text is pure pain.

Hi @fe145f9fb2a1f6b , ok I guess I didn’t fully understand where you were actually trying to get to, and was working on the basis that we know that the facility doesn’t exist natively in KNIME, so attempting to find an alternative.

Personally I don’t like numbered positional parameters because I find it soon becomes unreadable and prone to error, but I get what you are saying.

So, ok we know that String Manipulation and Expressions node doesn’t have what you want built in.

One idea if you just want numbered params is go back to the CSS Editor node as a way of entering the multi-line sql script with positional parameters and enter it as this (you can of course write this outside of KNIME using a proper code editor and then paste in !)

    USE  {1};
    IF NOT EXIST (
        SELECT column_name
        FROM information_schema.columns
        WHERE
            schema_name = '{2}' AND
            table_name = '{3}' AND
            column_name = 'HARDCODED'
    )
    ALTER TABLE '{2}.{3}'
    ADD COLUMN 'HARDCODED' NVARCHAR(48)
    NOT NULL CONSTRAINT 'constr_{2}_{3}_HARDCODED' DEFAULT 'HARDCODED';

Then follow it with String Manipulation where you create a replace statement to replace each parameter with the required column or variable value:

e.g.

replace(replace(replace(
$${Ssql-statement}$$,
"{1}",$database_name$),
"{2}",$schema_name$),
"{3}",$table_name$
)

You’ll need as many replace statements nested as you have positional parameters to replace. You can do the equivalent in an Expression node if you prefer.

It’s still a little “clunky” sure, but it’s probably less arduous

For simple statements, you could of course do the whole thing in an Expression node, such as this

replace(replace(

"SELECT * FROM {1} WHERE columnname = '{2}';",

"{1}",$["table_name"]),
"{2}",$["column_value"])

but you would have to do additional work if your statement itself contained double quotes, or you want it formatted across multiple lines.

Other than that, I think I’m out of ideas :wink:

No worries, in the end I can still do this e.g. in Java Snippets.
The numbered {n} are also just an example. Inline $() with auto complete is also a step forward.

its just annoying that there is no solution in which I can write the statement in one go (no join() shenanigans) (like powershells inline variables)
or have proper auto-complete with tab (hence dont need the mouse)
or work with 5+ nested replace() statements (instead of a dictionary or the typical cs-list known from printf())

Hey,

I recently needed something very similar. So I took the opportunity to dig into our code and check how easily we could address it. What I came up with would cover my use case, and I suspect it would solve yours as well—but I’d love to validate that with you.

The idea is a very simple node with an optional input port. If an input table is provided, it can either replace an existing column or append a new one with text defined in a scripting editor. The nice part is that it supports templating, so you can reference columns and flow variables to substitute parts of the content dynamically. If no input is provided it will just produce one cell with the content you defined.

Let me know what you think and if that would cover your cases as well. The node is still in a very rough shape, naming and options are subject to change and the whole thing is currently only available in my dev setup, but if it could also solve your problems I have one more reason to finish it up :wink:

Greetings,

Daniel

6 Likes

@DanielBog from just looking at it, given the fact auto-complete is included, this would cover many boxes.

further enhancement could be the support of data-type specific functions for the referenced columns/flowvars as known from many programming languages (and is present in the Column Expression node but I think not documented or advertised) like

Jon Doe was born in $(date-column).Year().
His dob was $(date-column).DayOfWeekName()
The solution to the puzzle is $(string-column).Length() characters long.
The file $(path).Filename() is stored in $(path).getParent().toString().
Next your you will be $(int).plus(1) years old.

or the possibility to define some variables that are valid only within the node (which the Column Expression also supports and I think the Expression node cannot do)

Hey,

Love the ideas, but definitely something for the future.
Glad to hear that it would solve your case, I will try to get the node over the finish line.

Greetings,

Daniel

1 Like