Finding frequency distribution (absolute and percentage values)

Hi,

I am looking for a way to calculate a frequency distribution (absolute and percentage values). So far I have only found a way to calculate the absolute values, using the “Value Counter” node. The KNIME output looked like this:

row ID   count
1        5
2        6
3        2
4        8

But what I really need is something like this:

row ID   count   %
1        5       23.8
2        6       28.6
3        2       9.5
4        8       38.1
sum      21      100.0

a) Is there any way to generate this kind of output (absolute + percentage values) in KNIME?

b) In addition, each number in the row ID column represents a company size range (1 = small, 2 = medium, 3 = big, 4 = biggest). Is there any way to handle such “captions” in KNIME? The table above would then look like this:

row ID   count   %
small    5       23.8
medium   6       28.6
big      2       9.5
biggest  8       38.1
sum      21      100.0

c) I am just beginning to use KNIME. I was wondering whether it is possible to “bury” SPSS or Excel and use only KNIME for my everyday data analysis. Does anyone know if this is possible? Or does KNIME have gaps that mean I should keep using my other software?

Many thanks in advance,

Cadu

P.S.: I attached the workflow that generated the data mentioned above. In any case, it is a simple node sequence: “File Reader” + “Value Counter” + “CSV Writer”.

To find counts, percentages, averages, maximums, minimums, etc. as you describe, the GroupBy node in Data Manipulation/Row/Transform will do what you ask.
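To make concrete what such an aggregation computes, here is a minimal Python sketch of a GroupBy-style summary: rows are grouped by a key column and, for each group, the count, the percentage of all rows, and the mean, minimum, and maximum of a numeric column are calculated. The input rows and the `revenue` column are hypothetical, and `group_summary` is an illustrative name; this only shows the arithmetic, not how the KNIME GroupBy node is implemented.

```python
from collections import defaultdict

def group_summary(rows, key, value):
    """Group rows (dicts) by `key` and aggregate the numeric column `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    total = len(rows)  # number of rows over all groups
    summary = {}
    for k in sorted(groups):
        vals = groups[k]
        summary[k] = {
            "count": len(vals),
            "percent": round(100 * len(vals) / total, 1),  # share of all rows
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
    return summary

# Hypothetical example: company size code plus an invented revenue figure.
rows = [
    {"size": 1, "revenue": 10.0},
    {"size": 1, "revenue": 14.0},
    {"size": 2, "revenue": 40.0},
    {"size": 4, "revenue": 90.0},
]

for size, stats in group_summary(rows, "size", "revenue").items():
    print(size, stats)
```

Each group's percentage is taken relative to the total number of rows, which is the same count/total ratio asked about in the question.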

You can also convert the numeric codes into text labels, as you describe for the company sizes, by using the numerical binner in Data Manipulation/Column/Binner/Numerical Binner.
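The binning idea can be sketched in a few lines of Python: each numeric size code falls into an interval, and each interval carries a text label. The bin edges and labels below are assumptions derived from the 1-4 coding in the question, and `size_label` is a hypothetical helper; in KNIME the intervals and their names would be set in the binner's dialog instead.

```python
import bisect

# Inclusive upper bound of each bin and the label attached to it.
# Assumed from the question: 1 = small, 2 = medium, 3 = big, 4 = biggest.
# Bins are (-inf, 1], (1, 2], (2, 3], (3, 4]; the first bin has no lower bound.
UPPER_BOUNDS = [1, 2, 3, 4]
LABELS = ["small", "medium", "big", "biggest"]

def size_label(code):
    """Return the label of the first bin whose upper bound is >= code."""
    i = bisect.bisect_left(UPPER_BOUNDS, code)
    if i == len(LABELS):
        raise ValueError(f"code {code} is above the last bin")
    return LABELS[i]

# Counts from the question, re-keyed by label instead of numeric code.
counts = {1: 5, 2: 6, 3: 2, 4: 8}
labelled = {size_label(code): n for code, n in counts.items()}
print(labelled)
```

With only four distinct codes a plain lookup table would also work; intervals are used here because that is what a binner does, and they also cover non-integer values.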

KNIME can do a lot of what Excel can. There is also the Maths node in the Misc category.

Simon.

KNIME is a considerably more advanced tool than Excel, SPSS Statistics, or SPSS Modeler. However, there are some things you may feel more comfortable doing in those programs.
For the example shown above, after the Value Counter you only need a Math node with a formula like this:
$count$ / COL_SUM($count$)
and you have the ratio x/sum(x), which is the relative frequency, proportion, or rate, whatever term you prefer. Multiply it by 100 (100 * $count$ / COL_SUM($count$)) to get the percentage values shown in the table above.
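The arithmetic behind that formula, sketched in Python on the counts from the question: each count is divided by the column sum (the role `COL_SUM($count$)` plays in the Math node) and multiplied by 100 to turn the ratio into a percentage, and a sum row is appended. The function and variable names are illustrative only and do not correspond to any KNIME API.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows plus a final ("sum", total, 100.0) row."""
    counts = Counter(values)          # absolute frequencies, like the Value Counter
    total = sum(counts.values())      # plays the role of COL_SUM($count$)
    rows = [
        (value, n, round(100 * n / total, 1))  # ratio scaled to a percentage
        for value, n in sorted(counts.items())
    ]
    rows.append(("sum", total, 100.0))
    return rows

# Raw size codes reproducing the counts in the question: 5, 6, 2 and 8.
codes = [1] * 5 + [2] * 6 + [3] * 2 + [4] * 8

for value, n, pct in frequency_table(codes):
    print(f"{value!s:<8}{n:>6}{pct:>8.1f}")
```

Because each percentage is rounded to one decimal, the displayed values will not always add up to exactly 100.0 for other data, even though the unrounded ratios always sum to 1.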
