INDEX MATCH in KNIME

chezhiyan · August 14, 2019, 9:48am

I am trying to do a MIN formula and an INDEX MATCH in a table. I was successful with the MIN formula using the Column Aggregator, but I am unable to find which column holds the min value. Below I have given an example table and the Excel formulas, which work well.

Material Number  302786  302590  302468  302287  302467  309618  309622  MIN         MIN entity
A                1       2       3       4       5       6       7       1           302786
B                12      13      14      8       9       10      11      8           302287
C                13      14      15      16      9       10      11      9           302467
D                14      15      8       9       10      11      12      8           302468
E                15      16      11      12      13      14      15      11          302468
F                16      17      18      12      13      14      15      12          302287
G                17      18      19      9       10      11      12      MIN(B8:H8)  INDEX($B$1:$H$1,1,MATCH(I8,B8:H8,0))
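For anyone translating the Excel formulas, the last row reads as three steps (row G is spreadsheet row 8, so B8:H8 are its values and B1:H1 the headers): MIN finds the smallest value, MATCH(...,0) returns the 1-based position of its first exact occurrence, and INDEX picks the header at that position. A minimal Python sketch of the same logic, assuming the headers and rows are already plain lists; `excel_match`, `excel_index` and `min_entity` are hypothetical helper names, not anything in KNIME:

```python
def excel_match(value, values):
    """Mimic MATCH(value, range, 0): 1-based position of the first exact match."""
    for pos, v in enumerate(values, start=1):
        if v == value:
            return pos
    raise ValueError("#N/A: value not found")


def excel_index(values, pos):
    """Mimic INDEX(range, 1, pos) on a single header row: 1-based lookup."""
    return values[pos - 1]


def min_entity(headers, row):
    """Return (MIN(row), header of the first column holding that minimum)."""
    smallest = min(row)
    return smallest, excel_index(headers, excel_match(smallest, row))


headers = ["302786", "302590", "302468", "302287", "302467", "309618", "309622"]
rows = {
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [12, 13, 14, 8, 9, 10, 11],
    "G": [17, 18, 19, 9, 10, 11, 12],
}

for material, values in rows.items():
    print(material, *min_entity(headers, values))
```

Because MATCH with match type 0 stops at the first hit, a tie between two columns resolves to the leftmost one, which the sketch reproduces.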

chezhiyan · August 14, 2019, 9:54am

Please help me do this with native nodes.

ipazin · August 14, 2019, 12:27pm

Hi there @chezhiyan ,

I don’t see a really easy way to do it. Here is how I got it:

https://kni.me/w/mJW2SjXOdP5nDuUT

Both approaches use a loop. Choose the one you like more.

Br,
Ivan

chezhiyan · August 16, 2019, 10:09am

Thanks @ipazin… my table has 1M rows. After I started the loop, it never stopped running. I was expecting some kind of Rule Engine, native node, or Python script that runs faster. KNIME should think about this: a simple INDEX MATCH that is easy in Excel taking so much time in KNIME is bad.

chezhiyan · August 16, 2019, 10:18am

Also, is it possible to find the index of the column with the min formula and then do a lookup from a column name extractor…? Please help.

Aswin · August 16, 2019, 10:54am

Dear @chezhiyan,

Would this work for you? No loops.

chezhiyan · August 16, 2019, 12:01pm

Looks perfect, can you share the workflow?

chezhiyan · August 16, 2019, 12:01pm

@Aswin please share the workflow.

Aswin · August 16, 2019, 12:07pm

…but if your dataset consists of a million rows, it is probably better to do it in chunks.
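Chunking keeps memory bounded: read a batch of rows, compute the minimum column for each row, emit the results, and move on to the next batch. The Python sketch below only illustrates that idea with `itertools.islice`; it is not a KNIME workflow, and the function names and default chunk size are hypothetical choices.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (material number, values per entity column)


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def min_entity_in_chunks(
    headers: Sequence[str], rows: Iterable[Row], size: int = 100_000
) -> Iterator[Tuple[str, float, str]]:
    """Yield (material, min value, header of the leftmost min column), chunk by chunk."""
    for batch in chunks(rows, size):
        for material, values in batch:
            # index() returns the first (leftmost) position of the minimum.
            pos = values.index(min(values))
            yield material, values[pos], headers[pos]


headers = ["302786", "302590", "302468"]
data = [("A", [1, 2, 3]), ("D", [14, 15, 8])]
results = list(min_entity_in_chunks(headers, data, size=1))
```

Since `rows` can be any iterable (for example a generator reading a file line by line), only one chunk needs to be in memory at a time.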

Here is the workflow:

KNIME_project2.knwf (22.9 KB)

Aswin · August 16, 2019, 12:25pm

Oops, sorry @chezhiyan, it seems I forgot to configure the loop correctly. It also turns out you don’t even need the Column Appender if you configure the Unpivot and GroupBy nodes a bit differently. The best workflow is a perfectly horizontal, branch-free workflow.
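The Unpivot-then-GroupBy idea can be sketched outside KNIME: turn the wide table into long (row id, column name, value) records, then keep, per row id, the record with the smallest value. The Python below only illustrates that reshaping; it is not the configuration of the attached workflow, and `unpivot` and `group_min` are hypothetical names.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (row id, column name, value)


def unpivot(headers: List[str], table: Dict[str, List[float]]) -> List[Record]:
    """Wide -> long: one (row id, column name, value) record per cell, left to right."""
    return [
        (row_id, col, value)
        for row_id, values in table.items()
        for col, value in zip(headers, values)
    ]


def group_min(records: List[Record]) -> Dict[str, Tuple[float, str]]:
    """Per row id, keep (min value, column name); ties keep the first record seen."""
    best: Dict[str, Tuple[float, str]] = {}
    for row_id, col, value in records:
        # Strict '<' so an equal value later in the row does not replace an earlier one.
        if row_id not in best or value < best[row_id][0]:
            best[row_id] = (value, col)
    return best


headers = ["302786", "302590", "302468", "302287"]
table = {"B": [12, 13, 14, 8], "E": [15, 16, 11, 12]}
print(group_min(unpivot(headers, table)))
```

Keeping a running minimum per group means the long table never has to be sorted.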

KNIME_project3.knwf (18.2 KB)

ipazin · August 19, 2019, 8:54am

Hi @chezhiyan,

The approach could be improved a bit, but not enough for 1 million rows, I’m afraid.

Br,
Ivan

chezhiyan · August 19, 2019, 10:02am

Something is wrong; the results are not correct.

Aswin · August 19, 2019, 10:51am

You say it doesn’t work; do you mean it doesn’t work for the example table in your original post or for your million-row table? On my PC it seems to work fine for the example table. Maybe your million-row table has a combination of different column types that cannot be easily sorted? For example, a mix of string and numeric columns? That can sometimes happen when importing Excel data.
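To illustrate why mixed column types can derail sorting and minimum-finding, here is a small Python example. It shows Python's own comparison rules, not KNIME's sorter: numbers stored as text compare character by character, and text and numbers may not be comparable at all.

```python
values_as_text = ["9", "10", "8"]
values_as_numbers = [9, 10, 8]

# Lexicographic order: "10" < "8" < "9" because '1' < '8' < '9'.
print(sorted(values_as_text))     # ['10', '8', '9']
print(sorted(values_as_numbers))  # [8, 9, 10]

# So the "minimum" of the text column is the wrong entry.
print(min(values_as_text))        # prints 10, not 8

# Python refuses to order str and int against each other.
try:
    sorted([9, "10", 8])
except TypeError as err:
    print("cannot sort mixed types:", err)

# Converting text to numbers first restores the numeric order.
print(sorted(float(v) for v in values_as_text))  # [8.0, 9.0, 10.0]
```

Converting every value column to a numeric type before searching for the minimum avoids both problems.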

chezhiyan · August 19, 2019, 11:58am

Can we do it without sorting? I think that is what causes the error. Is it possible to extract the column headers separately, find the column index based on the min value, and then look it up in the table? I think that would work better, but I don’t know how to do it.

Aswin · August 19, 2019, 12:22pm

@chezhiyan there you go, a solution without sorting.

KNIME_project3.knwf (18.6 KB)

A column index method, as you suggest, may be possible somehow with the Column Expressions node…

Best
Aswin

chezhiyan · August 19, 2019, 1:50pm

This worked fine and much faster. In the Table Creator I just added an index.

Corey · August 19, 2019, 1:58pm

Hi all, I haven’t compared for speed, but here’s one more example to try. The more the merrier, right?
If two columns share the max value, it grabs the leftmost column name.
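The leftmost-wins tie rule comes down to using a strict comparison while scanning the row left to right. A minimal Python sketch of that rule; it is not the Column Expressions code from the attached workflow, and `max_column` and its `leftmost` flag are hypothetical:

```python
from typing import Sequence


def max_column(headers: Sequence[str], values: Sequence[float], leftmost: bool = True) -> str:
    """Return the header of the column holding the row maximum.

    With leftmost=True a later column must be strictly greater to win,
    so ties resolve to the leftmost column; otherwise the rightmost tie wins.
    """
    if not values:
        raise ValueError("empty row")
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best] or (not leftmost and values[i] == values[best]):
            best = i
    return headers[best]


headers = ["302786", "302590", "302468"]
print(max_column(headers, [5, 7, 7]))                  # 302590
print(max_column(headers, [5, 7, 7], leftmost=False))  # 302468
```

For the original MIN question, flipping `>` to `<` gives the leftmost column holding the minimum.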

Max column.knwf (8.2 KB)

Aswin · August 19, 2019, 4:19pm

Awesome @Corey, a nice opportunity for me to learn about the Column Expressions node; that one is still a bit mysterious to me. I was wondering: do we really need the Column Aggregator? Turns out we don’t.

Max column.knwf (6.7 KB)

Here is yet another method, one that avoids the scary Column Expressions node…

KNIME_project3.knwf (17.3 KB)

Corey · August 19, 2019, 4:23pm

You’re correct! We can just use 2 expressions in the Column Expressions node if desired.

I had done it with the Column Aggregator to keep “scripting” to a minimum, and because I wondered if it might be faster on large data sets.
I say test both if speed is a concern.
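"Test both" can be done with a small timing harness. The sketch below uses Python's `timeit` to compare two hypothetical row-wise argmin implementations on synthetic data. It says nothing about the speed of the KNIME nodes themselves; it only shows how such a comparison can be set up, checking first that both variants agree.

```python
import random
import timeit

random.seed(0)
N_COLS = 7
rows = [[random.randint(0, 100) for _ in range(N_COLS)] for _ in range(10_000)]


def argmin_index(values):
    """Two passes: find the minimum, then its first position."""
    return values.index(min(values))


def argmin_key(values):
    """One pass: min over positions, keyed by the value at each position."""
    return min(range(len(values)), key=values.__getitem__)


# Both must agree (including leftmost tie-breaking) before timing means anything.
assert all(argmin_index(r) == argmin_key(r) for r in rows)

for fn in (argmin_index, argmin_key):
    seconds = timeit.timeit(lambda: [fn(r) for r in rows], number=10)
    print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```

Timings vary by machine, so it is worth running the comparison on data shaped like the real table.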

chezhiyan · August 20, 2019, 7:01am

Wow… perfect, fast, simpler to understand… Thanks @Corey