How to get the individual values of a column without groupby

gamboasim · April 13, 2016, 7:46pm

The reason I want to avoid GroupBy is that we are working with a large data set (10 million lines) for sales analysis. I know the month column only has 2-3 values in the files we work with (we can check this via the "Show possible values" option in the data table preview). Is there a way to extract just those possible values without spending a long time compressing the table to get them?

I want to use the values to feed a Table Row to Variable Loop Start, so that I can calculate a sales index for each month I have data for.

Thank you in advance,

Simon

aborg · April 14, 2016, 9:50am

Hi Simon,

I think Value Counter is probably what you are looking for, though you might also need a RowID node to create a column from the row keys. Another alternative might be the Statistics node with its third output table, though as it does more, it probably takes longer to finish.

Cheers, gabor

Ergonomist · April 14, 2016, 11:26am

Simon,

"Extract Table Spec" is also potentially useful for this purpose. For some (if not all) of these other options you may need to recompute the domain first, though I believe this is what GroupBy does by default, and what makes it so much slower in comparison.

Cheers
E

Geo · April 15, 2016, 1:47am

Is the data in a database on a server? If so, why not perform the GROUP BY in the SQL query?
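
As a rough illustration of the database route, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, the column `month`, and the sample rows are all made up for the example; `SELECT DISTINCT` is used here as the simplest equivalent of a GROUP BY on a single column with no aggregation.

```python
import sqlite3

# Hypothetical stand-in for the real sales database:
# table `sales` with a `month` column (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2016-01", 10.0), ("2016-02", 5.5), ("2016-01", 7.25), ("2016-03", 1.0)],
)

# The database deduplicates the column, so only the
# handful of distinct months is sent back to the client.
rows = conn.execute("SELECT DISTINCT month FROM sales ORDER BY month").fetchall()
months = [row[0] for row in rows]
print(months)
conn.close()
```

With a real server you would swap the in-memory connection for your own driver and connection settings; the query itself stays the same.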

Moreover, if the KNIME nodes suggested above are still not fast enough, why not implement a suitable algorithm in Java or Python using the corresponding scripting node?
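
For the Python option, one possible approach is a single pass over the column that remembers which values it has already seen. This is only a sketch: the function name `distinct_values` and the sample data are hypothetical, and how the column is read from and written back to the scripting node is not shown, since that depends on the node and KNIME version you use.

```python
def distinct_values(values):
    """Return the distinct values in first-seen order, scanning the input once."""
    seen = set()      # values encountered so far (fast membership test)
    ordered = []      # distinct values in the order they first appear
    for value in values:
        if value not in seen:
            seen.add(value)
            ordered.append(value)
    return ordered


# Made-up sample column standing in for the month column.
months = distinct_values(["Jan", "Feb", "Jan", "Mar", "Feb"])
print(months)
```

With only 2-3 distinct months the set stays tiny, so the cost is essentially one scan of the 10 million rows with no sorting or aggregation.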