Binning based on interval width

Hi all,

I’m a new user of knime. I have a workflow scenario where I need to bin my data into intervals of a fixed width. In the existing binning nodes I understand that we can either specify the number of bings or manually enter the ranges of each bin. What’s the best way if I want to bin my data based on interval width say 100?

Thanks all! Appreciate any help

By interval width, do you just mean first 100 rows in one bin, next 100 rows in bin two etc..

If its this simply, use Maths node to add another column using Math Expression RowIndex. Then use the AutoBinner and specify how many bins you want, so if you have 5000 rows, then choose 50 bins for 100 interval width.

 

Simon.

And if you want it to work out how many bins are needed for an interval of 100 automatically you can do this using workflow variables. By just using the Maths node to generate the Row Count, then another Maths node to divide this by 100, convert to an Integer column to round it up/down, then convert to variable and feed into the autobinner node in the "BinCount" row in flowvariables tab.

Simon.

What I meant is if I have values ranging from 0 to 1000, I’d like to bin values from 0 to 100 in bin1, values from 101 to 200 in bin2 etc. So the interval width is the value range not the number of rows( sorry I should’ve been clearer earlier)

Thanks again

Ahh okay,

You can do it, but its a bit more of a faff!

 

So first you need to use a Maths node, and for the expression use;

COL_MAX($column name$)/100

And choose to Append a new column and call it "Bin Count".

This is to find the datarange, and then divide by 100 (for interval of 100) to work out how many bins are needed.

Then convert the output from this Maths node into an Integer using "Double to Integer" node, then remove all the columns except this newly calculated "Bin Count" column with a "Column Filter" node, and then convert this to a variable using "TableRow to Variable". Now going back to the nodes with the main dataset (i.e. prior to the Maths node), attach up the "AutoBinner" node, turn on its flow variables (right click and choose show flow variables) and attach the red blob from the "Tablerow to variable" node to this "autobinner" node. Now configure the autobinner node, go to the Flowvariables tab and under bincount from the dropdown select "Bin Count".

Note the lowest value in the column will be called Bin 1, if you really want Bin counting to start from the number 1 (i.e. 1-100), you will need to add a fake column to your dataset with a value of 1 in it, so it starts counting bins from 1.

 

Hope this is helpful.

Simon.

Thanks a lot. I was working on this path too. This definitely fills the missing pieces in my approach. Thanks again. Appreciate it.

Ahh, just thought of a simple solution after all that.

How about this (just one node to do it!!);

Use a Maths node with the expression;

floor($Column 0$/100)

And choose to Append a new column called "Binning".

 

Hows that,

So many ways to do things in KNIME;

You can also use the Binner (Dictionary) as well, along with a "Create Table Structure" and "Add Empty Rows" node choosing 1000 or so, and then two Maths nodes with expressions "ROWINDEX*100" in one, and "(ROWINDEX+1)*100" in the second one, this is then fed in port 2 of the "Binner Dictionary" node. Clearly not as simple as the other one, so thats 3 ways to do it.

Hope you continue enjoying KNIME.

Simon.