1-Nearest Neighbors (KNIME) vs IB1 (WEKA)

omora · January 26, 2012, 9:57am

Hi,

I would like to test and compare the nearest neighbor implementations of WEKA and KNIME. To do this I have used the following datasets:

abalone.data.test
abalone.data.train

(you can find them in the datasets section: http://www.knime.org/files/datasets.zip)

I built two workflows, one with the Nearest Neighbor (KNIME) node and the other with the IB1 (WEKA) node. The configuration of each is as follows:

1. Nearest Neighbor (KNIME)

Number of neighbors to consider: 1 (K1)

Weight neighbors by distance: disabled

Results: Accuracy: 48.94%

2. IB1 (WEKA)

This node only considers 1 neighbor in the classification process (K1)

Results: Accuracy: 48.18%

I cannot see the cause of that difference. What could explain these different results with such a straightforward algorithm as K1?

Thanks in advance,

Oscar

Iris · January 26, 2012, 12:59pm

Hi Oscar,

Okay, this is just a guess.

One problem in k-NN is how to decide the class when there is more than one nearest neighbor, e.g. when two training patterns have exactly the same distance to the test pattern.

Typical behavior is to take the majority class among all the tied nearest neighbors.

But some implementations take the first nearest neighbor they find, others the last.

Iris · January 26, 2012, 1:11pm

Yes, I just verified it.

The WEKA predictor always takes the first nearest neighbor it finds. The KNIME predictor takes the majority class among the tied nearest neighbors.

You can see this with the following data set:

1 0 a

1 0 b

1 0 c

It should be classified as b; WEKA classifies it as a, and KNIME as b.
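
To make the two tie-breaking rules concrete, here is a minimal Python sketch. It is not the WEKA or KNIME code: the function names are made up for illustration, and the small training set (one "a" row and two "b" rows with identical features) is a hypothetical example chosen so that the majority is unambiguous.

```python
from collections import Counter

def nearest_ties(train, query):
    """Labels of all training rows at the minimum distance, in training order."""
    # Squared Euclidean distance is enough for ranking neighbors.
    dists = [sum((a - b) ** 2 for a, b in zip(features, query))
             for features, _ in train]
    best = min(dists)
    return [label for (_, label), d in zip(train, dists) if d == best]

def predict_first(train, query):
    """Tie rule 1: use the first nearest neighbor found."""
    return nearest_ties(train, query)[0]

def predict_majority(train, query):
    """Tie rule 2: use the majority class among all tied nearest neighbors."""
    return Counter(nearest_ties(train, query)).most_common(1)[0][0]

# Hypothetical training set: three rows with identical features.
train = [((1.0, 0.0), "a"), ((1.0, 0.0), "b"), ((1.0, 0.0), "b")]
query = (1.0, 0.0)

print(predict_first(train, query))     # a
print(predict_majority(train, query))  # b
```

When there is a single nearest neighbor, both rules return the same label, so the two rules can only disagree on test patterns that have tied nearest neighbors.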

omora · January 26, 2012, 6:41pm

Perfect!

Many thanks for your explanation!

Oscar