Max value using Index Query Node

rkagrawal · September 22, 2018, 11:32pm

I am trying to run a query on an extracted CSV table (not a database) using the Index Query node, and I need help extracting the row with the maximum value in a column.

Please help me convert the traditional query (below) to the Index Query format:

Select * from

where is Max

Thank You for your kind help!

izaychik63 · September 23, 2018, 12:37am

It's not clear why you insist on SQL. You can do the same thing with Group By.

rkagrawal · September 23, 2018, 1:22am

I am not grouping; I need to extract rows based on the maximum value of a variable.

E.g. a patient has scores between 0 and 9, and I need to extract the row with the maximum value.

izaychik63 · September 23, 2018, 11:40am

This is what Group By will do for you. Use patient as the grouping column and the max function for the score. As a result you’ll have a list of patients with their maximum scores.
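To make the grouping logic concrete, here is a minimal Python sketch of what "group by patient, max of score" computes. It is only an illustration of the idea, not how the KNIME node is implemented; the sample rows and the function name `max_score_per_patient` are made up for this example.

```python
# Hypothetical sample rows (patient_id, score); not data from this thread.
rows = [("p1", 3), ("p1", 7), ("p2", 5), ("p2", 2), ("p1", 7)]


def max_score_per_patient(rows):
    """Return {patient_id: highest score seen for that patient}."""
    best = {}
    for patient_id, score in rows:
        # Keep the score only if it is the first or the largest so far.
        if patient_id not in best or score > best[patient_id]:
            best[patient_id] = score
    return best


result = max_score_per_patient(rows)
print(result)
```

Note that the result has exactly one entry per patient, and only the grouping column and the aggregated column survive.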

rkagrawal · September 24, 2018, 5:52am

Yes, it partially worked. However, I also want to extract the other columns, and when I do that I get many records instead of just one.

izaychik63 · September 24, 2018, 12:11pm

You need to join the Group By result back to the original table by primary key, using the Joiner node.
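A rough Python sketch of the join-back idea, assuming the join key is the patient ID together with the maximum score (the column layout, sample rows, and the name `join_back` are assumptions for illustration only):

```python
# Hypothetical table rows: (patient_id, age, gender, score).
table = [
    ("p1", 40, "F", 3),
    ("p1", 41, "F", 7),
    ("p1", 41, "F", 7),  # exact duplicate of the row above
    ("p2", 63, "M", 5),
    ("p2", 63, "M", 2),
]


def join_back(table):
    """Keep the full rows whose score equals the patient's maximum score."""
    # Step 1 (Group By equivalent): maximum score per patient.
    max_score = {}
    for pid, _age, _gender, score in table:
        max_score[pid] = max(score, max_score.get(pid, score))
    # Step 2 (Joiner equivalent): match original rows on (patient_id, score).
    return [row for row in table if row[3] == max_score[row[0]]]


joined = join_back(table)
for row in joined:
    print(row)
```

With this sample data the join returns two rows for `p1`, because two source rows share the maximum score. Any tie or duplicated source row shows up again after the join, so a de-duplication step may still be needed.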

rkagrawal · September 27, 2018, 4:36pm

If I join back, then I get duplicate rows.
Somehow there should be a way to run a query, which would make this so simple.

Group By is slowing down the processing.

I have over 25 million records with 10 columns, so it is a relatively large dataset and I need to use as few nodes as possible.
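For reference, the desired result (one full row per patient, chosen by maximum score) can be expressed as a single pass over the file that never holds more than one row per patient in memory. The sketch below is plain Python rather than a KNIME workflow, and the column names `patient_id`, `age`, `gender`, `score` and the sample CSV text are assumptions, not the actual data.

```python
import csv
import io

# Hypothetical CSV content standing in for the real file.
CSV_TEXT = """patient_id,age,gender,score
p1,40,F,3
p1,41,F,7
p1,41,F,7
p2,63,M,5
p2,63,M,2
"""


def best_row_per_patient(lines):
    """Stream the CSV once and keep the highest-scoring row per patient."""
    best = {}
    for row in csv.DictReader(lines):
        pid = row["patient_id"]
        score = int(row["score"])
        # Strict ">" keeps the first row on ties, so no duplicates appear.
        if pid not in best or score > int(best[pid]["score"]):
            best[pid] = row
    return best


best = best_row_per_patient(io.StringIO(CSV_TEXT))
for pid, row in best.items():
    print(pid, row["age"], row["gender"], row["score"])
```

For a real file, an open file handle would replace the `io.StringIO` object. Memory use then grows with the number of distinct patients rather than with the number of rows.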

izaychik63 · September 27, 2018, 4:46pm

Could you provide an example of the original data and of what you are looking for?

rkagrawal · September 27, 2018, 4:58pm

I have a unique patient ID column, with age, gender, and score.

Each unique patient ID has several rows because of different values for score (0-9), age (because a patient's age may change between visits), and gender (typos).

Even If I ignore.

I need the rows with the maximum age and the maximum score.

Hope this helps.

izaychik63 · September 27, 2018, 5:17pm

A Group By that groups by patient ID, with max aggregations on age and on score, will give the necessary result.
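As a sketch of what two independent max aggregations produce, here is the same logic in plain Python. The sample rows and the name `max_age_and_score` are hypothetical.

```python
# Hypothetical rows: (patient_id, age, score).
table = [
    ("p1", 40, 7),
    ("p1", 41, 3),
    ("p2", 63, 5),
    ("p2", 63, 2),
]


def max_age_and_score(table):
    """Return {patient_id: (max age, max score)}, aggregated independently."""
    out = {}
    for pid, age, score in table:
        if pid in out:
            prev_age, prev_score = out[pid]
            out[pid] = (max(prev_age, age), max(prev_score, score))
        else:
            out[pid] = (age, score)
    return out


summary = max_age_and_score(table)
print(summary)
```

Because each column is aggregated on its own, the output for a patient may combine values from different source rows: here `p1` ends up with age 41 and score 7, although no single input row contains both.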

rkagrawal · September 27, 2018, 5:43pm

THANK YOU! I ran into a "GC overhead limit exceeded" error…

How do I resolve this overhead issue? Any ideas would be greatly appreciated.

izaychik63 · September 27, 2018, 6:38pm

Are you connecting the CSV Reader directly to Group By?

rkagrawal · September 27, 2018, 6:40pm

No. I actually used a converted KNIME table, then used a reference splitter node to remove duplicates, and then used Group By.

izaychik63 · September 27, 2018, 6:50pm

As a quick fix, try increasing the memory parameter in the knime.ini file, e.g. -Xmx4g (meaning 4 GB).
Also, insert a Cache node before Group By. If that doesn't help, wait for the KNIME tech experts.