Platform wide support for referencing columns by position number

It would add a ton of speed of work and flexibility to KNIME if we had the ability to reference columns by position as well as by column name. (Similar to referencing by “FieldNumber” in Alteryx, which is a somewhat lesser known capability of the platform) It would find its uses across the board in KNIME as well, but especially in nodes like the Formula / Rules / Expressions / Group / Pivot / Filter / Splitter / Table Row to Variable / Variable Row to Table, etc.

Most importantly it would allow for quick construction of dynamic 100% portable workflow segments that could simply be copied and pasted across workflows with little to no need for adjustments. I know that we can currently do this with components, but a few simple column number references could be done in seconds and save the time and hassle of using dozens of nodes / flow variables / configurations it takes to make a component dynamic and portable. It would essentially allow for more dynamic component like portability out in the open (not wrapped in a component) with less dependence on flow variables.

We could simply copy and paste a combination of nodes to duplicate functions with no need for adjustment. Obviously it would have to be used strategically like everything else in KNIME, you wouldn’t want to run your entire workflow based off column positions.

Hi @iCFO , currently this can be done via the Column Expressions. You can access by index or by column name, hardcoded or dynamically via variables using the column() function.

I built a quick demo for you:
image

Input table:
image

I defined these 2 variables as sample that I used in the demo:
image

Let’s look at the 4 different expressions (2 hardcoded, 2 dynamic):
Exp 1: Access by hardcoded column name (the most common one):
column("third column")

This will access the column with name “third column”. Expected value “c”

Exp 2: Access by hardcoded index:
column(4)

This will access the column at index 4 (meaning the 5th column). Expected value “e”

Exp 3: Access by column name defined by a variable:
column(variable("col_name"))

This will access the column with name defined in the variable “col_name”, which is “second column”, therefore this will access the column with name “second” column". Expected value “b”

Exp 4: Access by column index defined by a variable:
column(variable("col_index"))

This will access the column with index defined in the variable “col_index”, which is “1”, therefore this will access the column with index 1 (meaning the 2nd column). Expected value “b”

Here’s how the Column Expressions looks like with these 4 expressions defined:

Results:

Demo workflow is here:

1 Like

Great news about this capability in the Column Expressions node @bruno29a! I have searched this several times and never found anything like this.

Is this only applicable in the Column Expressions Node, or is there a way to use it with nodes that utilize “Rule” based syntax? Is there some way to use this for manual field selection across nodes like Splitter, GroupBy, Pivoting, Table to Variable, etc?

Hi @iCFO , not that I know of. I’ve only seen this capability via the Column Expressions unfortunately.

I know it’s not ideal, but as a workaround, you would have to add your rules in the Column expressions if you really need to. You can also identify your column via Column Expressions and “temporarily” rename it to a static name that you can then use as a reference (reference to the static name) in other nodes, and then rename back the static name to the original name. Again, just as a workaround if you really need to.

1 Like

Thanks @bruno29a. Those are the kind of workarounds that I am using currently and was hoping to avoid, but this will make it faster.

Learning about the ability to reference columns by position in the Column Expressions node adds a great multi-tool to my quiver! I tend to rely heavily on the Column Expressions node, so I am already getting plenty of use out of this tip. Thanks!

And another big thanks in general for being so active and helpful on this community forum overall!

(I am going to leave it as “unsolved” in the hopes that column position references may have a chance at platform wide adoption for all nodes.)

2 Likes

No problem @iCFO , happy to help.

I’m requesting a +1 for me for your request, since it is a useful request.

@iCFO maybe a not very elegant idea or even a stupid question. But could you just rename every column in your table like

My_id = Column1
Name = Column2

Then use that in all you nodes and keep the reference if you must. You would sometimes have to extract the “Column” part to get a number maybe.

I have done such things at the start of a loop where I did not want to code everything with flow variables and sometimes it was not feasible.

So I renamed each variable at start of its loop to xyz_mydummy_var do the thing with that name and rename it back.

Java Snippet nodes can access columns by name and index as well. Check out the node description in KNIME or the Hub:

The method you’re looking for is getCell(). It accepts name or index, but also requires the data type. Sneak peek:

3 Likes

@mlauber71 You mentioned renaming flow variables. Did you just manually create a duplicate flow variable of a different name and then reference the value of the previous flow variable by formula, or were you able to rename flow variables dynamically (like by index) somehow?

I suppose that I could create a pair of syncable components to effectively add this functionality to key areas. One to rename columns (by index #) to standardized dummy column names, and the second to return them to the original column names.

I don’t think that I would use it for a wholesale rename of all columns very often, but I would probably use it for side calculations on soil off single columns or column groupings for easier portability of common prep functions.

@Thyme I have only used Java Snippet on a few occasions as I am not strong enough in Java to work quickly. It takes a lot of research / trial and error at this point. I plan to dive into it once I have finished fully converting our systems over from Alteryx though.

Hi @iCFO @mlauber71 @Thyme , I don’t think it’s possible to rename a variable. It’s something that has been discussed and a feature that has been requested (like we can rename a column).

I think what @mlauber71 is suggesting is what I alluded to previously when I said you can “temporarily” rename your column to static names and to then reference to the static name. Using @mlauber71 's example of:
My_id = Column1
Name = Column2

You would then reference Column1 and Column2 (or the other way around, depending on what you are renaming) in the nodes downstream.

In terms of using Java Snippet, it goes down to using programming language. Python Script can do this too. Not only it can access column by index, it can also access rows by index. The Column Expressions is also a “programming” node too, Java Snippet and Python Script are just alternatives.

But all of these are workaround alternatives. It would be nice that Knime allows references to columns dynamically in the other nodes, like @iCFO is requesting.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.