Evaluating k-medoids clusters

Hi all,

How can I evaluate the clusters produced by the k-Medoids node?

I want to measure precision, recall, F-measure and accuracy.

I tried the Scorer and Entropy Scorer nodes, but no results appear.

Which node can I use, and what input does it need (which columns or components)?

Thanks

Hi singing bird,

Please find attached a toy example I've assembled based on what I guess you are trying to do. Basically, I think you can go ahead with the Scorer node, but you have to make sure that the cluster ID values are consistent between the two configured input columns.

Best regards,

 

--

Jorge

 

Hi, there is currently an online course about KNIME that shows how to evaluate clusters:

https://learn.eduopen.org/course/view.php?id=40

@Jorge, looking at your response, it seems that you did a supervised evaluation, right?

Best Regards

 

Hi,

If it is of any use: there is a Binary Scorer node as part of the Generic Nodes in the Lhasa contributions (https://tech.knime.org/lhasa-nodes-for-knime), which according to the description can score what you want:

Binary Scorer

Takes an input containing an activity (experimental/true) and multiple (or one) prediction columns. For multi class classifications you should use the KNIME Scorer node. This node has been developed for binary classification and you must specify the value of active (positive) and inactive (negative). Values can be specified for equivocal and out of domain regardless of whether they are present in the prediction column.

Missing values are handled in the following ways: missing activity is ignored completely regardless of selection of "Missing out of domain". Selecting the missing out of domain option will increment the out of domain count when the prediction value is missing but the activity value is present.

Calculates:

Balanced accuracy: (Sensitivity + Specificity) / 2

Accuracy: (TP + TN) / (TP + TN + FP + FN)

Sensitivity: TP / (TP + FN)

Specificity: TN / (TN + FP)

Precision aka Positive Predictivity (PPV): TP / (TP + FP)

Negative predictivity (NPV): TN / (TN + FN)

Recall: TP / (TP + FN)

F-Measure: 2 * ((precision * recall) / (precision + recall))

Also outputs the counts for TP, FP, TN, FN, number of equivocals and number of out of domains and coverage (% not out of domain).
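
For anyone who wants to check these numbers by hand, here is a minimal Python sketch of the same formulas computed from the four counts. It is not the Lhasa node's code; the function name `binary_metrics` and the example counts are made up for illustration, accuracy uses the standard definition (correct predictions over all predictions), and equivocal and out-of-domain handling is left out.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the listed scores from confusion-matrix counts (illustrative helper)."""

    def ratio(num, den):
        # Return None instead of raising when a denominator is zero.
        return num / den if den else None

    sensitivity = ratio(tp, tp + fn)  # same quantity as recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)    # positive predictivity (PPV)
    npv = ratio(tn, tn + fn)          # negative predictivity
    accuracy = ratio(tp + tn, tp + tn + fp + fn)

    balanced_accuracy = None
    if sensitivity is not None and specificity is not None:
        balanced_accuracy = (sensitivity + specificity) / 2

    f_measure = None
    if precision is not None and sensitivity is not None and precision + sensitivity:
        f_measure = 2 * (precision * sensitivity) / (precision + sensitivity)

    return {
        "balanced_accuracy": balanced_accuracy,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "recall": sensitivity,
        "f_measure": f_measure,
    }


# Made-up counts, only to exercise the function.
scores = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in scores.items():
    print(name, value)
```

Note that these are binary scores: with more than two clusters you would treat one cluster as "positive" and the rest as "negative", and repeat per cluster.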

 

Good luck/Evert

Hi Mau,

The Data Generator node creates a column with the ground truth regarding the membership (Cluster Membership).

The k-Medoids node finds the desired number of medoids (in this case k=3) on the generated data set, creates a new column with the predicted cluster membership (Cluster) and puts the ROWID of the appropriate medoid in each of its cells.

Then, I use the Cell Replacer node to substitute the ROWIDs with the cluster names of each medoid, so that the Scorer node can work properly by comparing the predicted membership with the ground truth.
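
Outside KNIME, the same idea can be sketched in a few lines of Python. This is only an illustration of the relabelling step, not what the nodes do internally: the row IDs and labels below are invented, and it assumes each medoid's own ground-truth label is a sensible name for its cluster (which breaks down if two medoids happen to share a label).

```python
from collections import Counter

# Hypothetical toy rows: (row_id, ground_truth_label, medoid_row_id).
# The third field mimics the k-Medoids output column, which holds the
# row ID of the medoid each point was assigned to.
rows = [
    ("Row0", "Cluster_0", "Row1"),
    ("Row1", "Cluster_0", "Row1"),
    ("Row2", "Cluster_1", "Row3"),
    ("Row3", "Cluster_1", "Row3"),
    ("Row4", "Cluster_2", "Row5"),
    ("Row5", "Cluster_2", "Row5"),
    ("Row6", "Cluster_1", "Row5"),  # one point assigned to the "wrong" medoid
]

# Dictionary from row ID to ground-truth label, playing the role of the
# lookup table fed to the Cell Replacer.
truth_by_row = {row_id: truth for row_id, truth, _ in rows}

# Replace each medoid row ID with that medoid's ground-truth label so both
# columns use the same set of values.
actual = [truth for _, truth, _ in rows]
predicted = [truth_by_row[medoid] for _, _, medoid in rows]

# Confusion counts keyed by (actual, predicted), plus overall accuracy.
confusion = Counter(zip(actual, predicted))
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(rows)

print(confusion)
print(accuracy)
```

Once both columns share the same labels, the comparison is an ordinary supervised scoring, which is why the Scorer node can be used at that point.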

 

Best Regards,

--

Jorge

 

 

 

 
