Column Expressions Node (Formulas should calculate in series)

It would be a massive upgrade to the Column Expressions node if the expressions calculated in series based on their order. I am assuming that the current limitations of the Column Expressions node were derived from a desire to speed up processing, but I think this is a myopic viewpoint that leads to a flawed user experience.

The inability to process expressions in series (making a series of changes to the same columns) forces us to clutter a workflow with tons of Column Expressions nodes, making it extremely difficult to locate a specific expression later. Do 10 Column Expressions nodes in a row really process that much faster than 1 nicely organized Column Expressions node processing in series? Do some users really prioritize faster processing of a single node over user-friendly operation of the wider workflow? If so, couldn’t we just add a setting that lets them turn off processing expressions in series, or have them use multiple Column Expressions nodes with a single expression each?
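To make the difference concrete, here is a minimal Python sketch of the two evaluation styles. It is not KNIME code and not how the node is implemented; the function names, the `price` / `total` columns and the formulas are all made up for illustration, and it assumes the current behaviour is that every expression only sees the input columns.

```python
# Hypothetical sketch only: not KNIME's API or its actual implementation.
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Expression = Tuple[str, Callable[[Row], float]]  # (output column, formula)


def evaluate_independently(row: Row, expressions: List[Expression]) -> Row:
    """Every formula reads only the original input row."""
    result = dict(row)
    for column, formula in expressions:
        result[column] = formula(row)  # always the untouched input row
    return result


def evaluate_in_series(row: Row, expressions: List[Expression]) -> Row:
    """Each formula reads the row as left by the formulas above it."""
    result = dict(row)
    for column, formula in expressions:
        result[column] = formula(result)  # sees earlier outputs
    return result


# Two made-up expressions: the second depends on the first.
expressions: List[Expression] = [
    ("price", lambda r: r["price"] * 2),  # step 1: overwrite price
    ("total", lambda r: r["price"] + 1),  # step 2: uses price
]

row = {"price": 10.0}
independent = evaluate_independently(row, expressions)
series = evaluate_in_series(row, expressions)
```

With independent evaluation, `total` is built from the original `price`, so getting the updated value means adding a second node. In series, one node is enough.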

Even if we don’t completely change the current node, it would be great to at least get a version of this node that processes in series. Here are just some of the benefits we would gain.

Less clutter - Dozens of Column Expressions in a workflow could be consolidated into a single node. I know that a user can consolidate multiple nodes into a Metanode or Component, but that only makes things worse by adding another level of difficulty when you need to find a specific expression and make alterations.

Better Organization - The ability to clearly see the expressions and their order in the calculation series in a single node configuration would make it far easier to organize a workflow, quickly change the order of operations and locate expressions in a workflow. We would then be able to combine a series of expressions into a single node and label them clearly for tracing.

Better Portability - A single nicely organized Column Expressions node that performs a series of processes could be instantly copied and pasted.

(The Formula tools in Alteryx operate in a series like this, and it really does make a dramatic improvement in workflow organization and user friendly formula tracing / adjustments.)

On the subject of Column Expressions, whilst I don’t use the node all that much, I see the ability to refer to an earlier row to be a significant step forward. However, as was highlighted in a recent forum thread, it is still not possible for a column that is being created to be self-referencing (i.e. refer to the value that it now holds on the previous row that has just been created).

This unfortunately makes the Column Expressions node incapable (as far as I can see) of performing cumulative calculations; something that it might otherwise be ideally suited for.

If this is not possible with the column() function, then something like a “self” function, meaning the column being generated, combined with a relative row position would be good:
e.g

That addition would also be a nice feature in a future version, or in a new formula-style node.
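As a rough illustration of what such a self-reference would enable, here is a small Python sketch of a running total. Everything in it is hypothetical: `add_column_with_self`, `self_ref` and the `amount` / `running_total` columns are names invented for this example and are not part of the Column Expressions node.

```python
# Hypothetical sketch only: "self_ref" is an invented stand-in for the
# requested "self" function, not an existing KNIME function.
from typing import Callable, Dict, List, Optional

Row = Dict[str, float]
SelfRef = Callable[[int], Optional[float]]


def add_column_with_self(
    rows: List[Row],
    name: str,
    formula: Callable[[Row, SelfRef], float],
) -> List[Row]:
    """Build a new column whose formula may read its own earlier values."""
    out: List[Row] = []
    for index, row in enumerate(rows):

        def self_ref(offset: int, _index: int = index) -> Optional[float]:
            # Only rows that have already been generated can be read.
            target = _index + offset
            if target < 0 or target >= len(out):
                return None
            return out[target][name]

        new_row = dict(row)
        new_row[name] = formula(row, self_ref)
        out.append(new_row)
    return out


rows = [{"amount": 5.0}, {"amount": 3.0}, {"amount": 4.0}]
running = add_column_with_self(
    rows,
    "running_total",
    # Previous value of the column being created, or 0.0 on the first row.
    lambda row, self_ref: row["amount"] + (self_ref(-1) or 0.0),
)
```

The key point is that the column is filled row by row, so the value just written for the previous row is available when the next row is calculated.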

Hi @iCFO @takbb, on top of what you guys have said, there is another feature that I would like to see. When we have multiple expressions, you can order them by moving them up or down, and I would like to be able to use/access the results generated by “upper” expressions (those that are sorted above the current expression and are supposed to execute before it).

This makes even more sense to have in Variable Expressions. I should be able to access the variables that are created/modified in expressions ordered above the current expression.

One of the advantages of using Column/Variable Expressions is that you can write multiple expressions in the same node. However, when we can’t access the Columns/Variables that are created in the same node, we have to add additional Column/Variable Expressions nodes to be able to do so.

Back in the days when there were no Variable creator/editor nodes, I created my variables via the Variable Expressions node, and one common thing that I created was the full path of files. Since I wanted these to be configurable, I had a configurable variable for the file name, a configurable variable for the base path, and a third variable, controlled by the workflow, for the full path, which was the base path joined with the file name. But I could never create the full path in the same node, as the values of the file name and base path were not accessible from another expression within that node. I had to add another Variable Expressions node, since I could access the values only after the first Variable Expressions node had executed.
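The file path case could look something like this if expressions ran in order within one node. This is only a Python sketch of the idea, not Variable Expressions syntax; the variable names and the `/data/reports` and `sales.csv` values are placeholders I made up.

```python
# Hypothetical sketch only: names and values are invented placeholders.
from typing import Callable, Dict, List, Tuple

Variables = Dict[str, str]
VariableExpression = Tuple[str, Callable[[Variables], str]]


def run_variable_expressions(
    incoming: Variables, expressions: List[VariableExpression]
) -> Variables:
    """Run expressions top to bottom; each one sees the variables set above it."""
    variables = dict(incoming)
    for name, formula in expressions:
        variables[name] = formula(variables)
    return variables


expressions: List[VariableExpression] = [
    ("base_path", lambda v: "/data/reports"),  # configurable
    ("file_name", lambda v: "sales.csv"),      # configurable
    # Controlled by the workflow: built from the two variables above.
    ("full_path", lambda v: v["base_path"] + "/" + v["file_name"]),
]

variables = run_variable_expressions({}, expressions)
```

Here all three variables come out of a single pass, which is what currently needs a second Variable Expressions node.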

Hello there!

If memory serves me well there should be a ticket for this enhancement :wink:

Br,
Ivan

Good call @bruno29a

Variable Expressions absolutely needs the same upgrades!

The ability to see the example formula output where we select the formula ordering would be crucial as well. I would also like the ability to manually enter alternate row numbers to use for all of the example calculations instead of always just seeing the 1st row. This would massively help when tracing back the impact of formulas on certain target values.

Another layout change that I would like to see is to have the entry/viewing area for each expression in the same place as where you order the list, similar to the Alteryx layout. You could include a minimize / maximize toggle to help reduce the need for scrolling. The current layout, where the list ordering is separate from the entry area, would be less convenient for viewing and adjusting a series of formulas. I would also like an up and down arrow (with a keyboard shortcut) right there next to the entry area for easy ordering changes.

It would also be nice to access new columns / variables created previously in the series to be used in subsequent expressions.

Don’t forget the ability to copy and paste or duplicate existing expressions! It drives me crazy when this isn’t available in a UI. Duplicating (or copy / paste) of an entire existing expression and its settings saves so much time. I would much rather duplicate and then alter an existing working formula than always have to work from scratch and run the risk of wasting time searching for a missing parenthesis somewhere. This is especially helpful for longer if statements or nested statements. The ability to copy and paste the formulas themselves would be a minimum.

I would also love to have universal access to save and access our own custom user formulas across every “expressions” node. That would allow us to name / organize formulas and easily drop in example formulas that we could alter. This would be especially helpful for complex Regex formulas which are a slow trial and error process for many users.

In fact, it would allow us to build our own little solutions warehouse and save working examples from other shared solutions on the forum for easy access within KNIME.

The ability to reference “self” and “current column” in an expression would be an awesome way to make expressions more portable and universal.
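A quick Python sketch of why a “current column” reference helps with portability: the formula is written once against whatever column it is applied to, instead of hard-coding a column name. The `apply_to_columns` helper, the `tidy` formula and the sample columns are all hypothetical and only illustrate the idea.

```python
# Hypothetical sketch only: not an existing KNIME feature or API.
from typing import Callable, Dict, List

Row = Dict[str, str]


def apply_to_columns(
    rows: List[Row], columns: List[str], formula: Callable[[str], str]
) -> List[Row]:
    """Apply one formula to each selected column; other columns pass through."""
    return [
        {
            name: formula(value) if name in columns else value
            for name, value in row.items()
        }
        for row in rows
    ]


def tidy(current: str) -> str:
    """Formula written against the "current column" value, not a fixed name."""
    return current.strip().upper()


rows = [{"city": " berlin ", "country": "de", "note": " keep "}]
cleaned = apply_to_columns(rows, ["city", "country"], tidy)
```

The same `tidy` formula could be pasted into another workflow and pointed at different columns without editing it.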