Using Math Formula on columns with dynamic column names

kowisoft · December 7, 2018, 10:40am

Hi there,

I have a pivot table where the names of the columns are dynamic.

In the example below, these are the names “Prod 1” - “Prod 4” but it could be different every time I run the workflow.

So far I have hardcoded the column names in the math formula to determine the index of the column with the highest value (see my other post here)

Is there an option to determine the index with the highest sales values while at the same time not entering them hard into the formula?

PS: I am not focused on using the math formula node, it was just what I have used so far.

Thank you in advance!

moritz.heine · December 7, 2018, 1:18pm

Hi,

you can always rename the column to a specific name and rename it after the Math Formula node e.g. by storing the original name in a flow variable and using that in the Column Rename node after the math formula.

Another option, if the index of these columns remain the same and only the name changes, is the Column Expression Node, which is part of the extension ‘KNIME Expressions’.
This node also provides mathematical functions and you can access columns dynamically.

Unfortunately the syntax is slightly different as you have to provide the column index yourself:

colMax(column(“Prod1”), 0, column(“Prod2”), 1, column(“Prod3”), 2, column(“Prod4”), 3)

where 0, 1, 2, and 3 indicate the column index of the columns Prod1, Prod2, Prod3, and Prod4.
In this specific casethe following line is equivalent:

colMax(column(0), 0, column(1), 1, column(2), 2, column(3), 3)

If the column index does not remain the same you can do something like:

names = arrayFlat(columnNames())
colMax(column(“Prod1”), arrayIndexOf(names, “Prod1”), column(“Prod2”),
arrayIndexOf(names, “Prod2”), column(“Prod3”), arrayIndexOf(names, “Prod3”))

Now the dynamic part: you can extract the names you want to use into a flow variable and instead of writing:

column(“Prod1”) you could acces the column via column(variable(“firstCol”)).

Cheers,
Moritz

kowisoft · December 7, 2018, 1:58pm

Thank you @moritz.heine

Unfortunately the index changes every month so using the Column Expression node does not seem to be an option.

I do not get the array function you mentioned further down below. I know what an array is but I cannot understand the syntax. Is there a link or something that can help to dive deeper into this?

moritz.heine · December 7, 2018, 2:16pm

Hi,

sorry, I forgot to mentioned that the array support of the Column Expressions is a new feature of KNIME Analytics Platform 3.7.

The arrayFlat()-method is a workaround as the return type of columnNames() cannot be simply used by the arrayIndexOf()-function.

arrayFlat, and arrayIndexOf are functions under the category “Array”.

If upgrading the Analytics Platform is currently not an option, I’d recommend the first solution where you have to rename the columns to a specific name and then re-name it afterwards.

Cheers,
Moritz

MH · December 7, 2018, 2:57pm

I think you could also use the Column Aggregator Node. It allows you to select the columns you want to aggregate on either manually, via wildcard or regular expression patterns or select them by data type. I hope that would give you enough freedom to select and aggregate the columns that you want.

kowisoft · December 7, 2018, 3:00pm

Thank you @MH however, I don’t want to aggregate the columns.

I want to identify the column that holds the highest value and extract its heading to be used in another table.

kowisoft · December 7, 2018, 3:03pm

Hi Moritz,

this sounds like a good approach. However, I am currently behind a firewall so maybe I have to test it out at home and update there. After that I try to tackle Column Expressions with flow variables

I will report here, if this solved my problem.

MH · December 7, 2018, 3:04pm

Well than maybe use the Unpivot Node first (has the same selection options like the Column Aggregator Node), sort the ColumnValues and Filter the first row. Would that be an option?

kowisoft · December 7, 2018, 3:09pm

hmmmm… I don’t think so. The Pivoting does exactly what I expect it to do (grouping / pivoting the line items to a table “products sold per customer”) it should afterwards work with the resulting table (find highest value) from the pivot node.

The thing I want to find out is which product had the highest sales per customer (e. g. Prod3 for customer A, Prod 4 for customer B, see attached screenshot) and then proceed working with that information (in my case I extract the heading and concatenate this info to another table - top 10 customers).

2018-12-07%2016_07_16-Output%20table%20-%202_5%20-%20Missing%20Value

MH · December 7, 2018, 3:34pm

This is what i get with the Unpivot Node and the Rank Node.

Unpivot.knwf (8.9 KB)

Here is the workflow. Hope it helps

kowisoft · December 10, 2018, 3:18pm

now finally I got it working. Thank you very much @MH

The reason I was wrong (and not your idea) was my assumption that the unpivoting node is something like a reverse pivot that ONLY works with previously (through KNIME) pivoted data. But I learned that I can use it “independently” from the Pivot node (although I have one in my workflow)

Perfect, now I have completely made my workflow dynmic (that was the only part except for selecting a new source file every month where I had to hand-adjust the workflow).

system · December 17, 2018, 3:18pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.