Has anyone done positive-unlabeled (PU) learning in KNIME? In my medical application, some rows are labeled positive and all other rows are unlabeled. I have no verified negatives, so I don't think the Active Learning nodes will work for me. The uncertainty in the unlabeled rows comes from the fact that about 25% of the non-positive rows are actually positives whose diagnosis was missed.
I have not done positive-unlabeled learning in KNIME yet, but it sounds doable. From what you describe, you could try unsupervised learning on the whole dataset, essentially a clustering method. If all goes well, all positively labeled rows land in one cluster, together with some of the unlabeled ones (those with the missed positive diagnosis). You'd have to tune the parameters of the clustering algorithm so that this happens, potentially with the Parameter Optimization Loop node pair. As the metric for the node pair to maximize (or minimize), the ratio of positively labeled rows in one cluster versus the other could work, but you'd have to experiment with the exact details. This way, the known labels would at least guide the clustering algorithm.
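A minimal plain-Python sketch of that idea, outside KNIME and under several assumptions: k-means stands in for "a clustering method", the loop over k plays the role of the Parameter Optimization Loop, and the metric is one hypothetical choice (for the best cluster, the share of all labeled positives it contains times the share of its rows that are labeled positive). The function names, the metric and the toy data are illustrative only, not a KNIME API.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Initial centroids are simply the first k points
    (deterministic, for illustration only)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def positive_cluster_score(assign, labeled):
    """Hypothetical metric: best value over all clusters of
    (share of labeled positives in the cluster) * (share of the cluster that is labeled positive).
    Assumes at least one labeled positive."""
    n_pos = sum(labeled)
    best = 0.0
    for c in set(assign):
        size = sum(1 for a in assign if a == c)
        pos_in_c = sum(1 for a, y in zip(assign, labeled) if a == c and y)
        best = max(best, (pos_in_c / n_pos) * (pos_in_c / size))
    return best


def optimize_k(points, labeled, k_values):
    """Grid search over k, standing in for a parameter optimization loop."""
    scores = {k: positive_cluster_score(kmeans(points, k), labeled) for k in k_values}
    return max(scores, key=scores.get), scores


def candidate_missed_positives(assign, labeled):
    """Unlabeled rows in the cluster that holds the most labeled positives."""
    counts = {}
    for a, y in zip(assign, labeled):
        counts[a] = counts.get(a, 0) + y
    pos_cluster = max(counts, key=counts.get)
    return [i for i, (a, y) in enumerate(zip(assign, labeled)) if a == pos_cluster and not y]


# Toy data (made up): 1 = labeled positive, 0 = unlabeled.
points = [(0, 0), (5, 5), (0, 1), (5, 6), (1, 0), (6, 5), (1, 1), (6, 6), (0.5, 0.5)]
labeled = [1, 0, 1, 0, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    best_k, scores = optimize_k(points, labeled, [1, 2, 3])
    print(best_k, scores)
    print(candidate_missed_positives(kmeans(points, best_k), labeled))
```

With the toy data, k = 2 scores best, and row 8 (the unlabeled point inside the positive group) comes out as a candidate for a missed diagnosis. The product is used because the positive share alone would favour tiny clusters, while the recall of labeled positives alone would favour k = 1.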
Slightly off-topic, but maybe still inspiring, is the “Semi Supervised Clustering” workflow on the KNIME Hub, which combines two clustering methods.
If you want to read up on the literature, a good search term to start with is “one-class semi-supervised learning”.
Hope that helps and gives you a rough idea.
This sounds like a job for DBSCAN: play with its parameters until none of the labeled rows end up in the noise cluster.
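A plain-Python sketch of that suggestion, assuming a textbook DBSCAN (min_pts counts the point itself) and a hypothetical search rule: take the smallest eps on a grid for which no labeled positive is noise. Larger eps values eventually merge everything into one cluster, which would satisfy the constraint trivially, so the search stops at the first eps that works. Names, grid and toy data are illustrative only.

```python
import math

NOISE = -1  # label for points not reachable from any core point


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one cluster id per point, or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force region query; the point itself is included.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
        cluster += 1
    return labels


def smallest_eps_without_labeled_noise(points, labeled, eps_grid, min_pts):
    """Hypothetical search rule: first (smallest) eps where no labeled positive is NOISE."""
    for eps in sorted(eps_grid):
        labels = dbscan(points, eps, min_pts)
        if not any(y and lab == NOISE for y, lab in zip(labeled, labels)):
            return eps, labels
    return None, None


def candidate_missed_positives(labels, labeled):
    """Unlabeled rows that share a cluster with at least one labeled positive."""
    pos_clusters = {lab for lab, y in zip(labels, labeled) if y and lab != NOISE}
    return [i for i, (lab, y) in enumerate(zip(labels, labeled)) if not y and lab in pos_clusters]


# Toy data (made up): 1 = labeled positive, 0 = unlabeled.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (5, 5), (5, 6), (6, 5), (6, 6), (10, 0), (2, 0)]
labeled = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    eps, labels = smallest_eps_without_labeled_noise(points, labeled, [0.5, 1.0, 1.5, 2.0], min_pts=3)
    print(eps, labels)
    print(candidate_missed_positives(labels, labeled))
```

On the toy data, eps = 0.5 leaves every labeled positive as noise, while eps = 1.0 pulls them all into one cluster (the point at (2, 0) joins as a border point). The unlabeled row 4 then shares that cluster and becomes a candidate, and the unlabeled outlier at index 9 stays noise.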