I think the best approach is to split the problem into two steps:
- Train a model that can detect whether there are people wearing a vest in a small image (patch).
- Slice your larger image into patches and apply the model from step 1 to each of them. If any of the patches is predicted as positive, we label the whole image as positive. This happens in the Transfer patch prediction to whole image node.
Done this way, the individual problems are much easier for the models to solve, and combining the two steps achieves your original goal.
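In case it helps to see the idea outside of KNIME, here is a minimal Python sketch of the two steps. It is only an illustration under some assumptions, not what the workflow does internally: the image is a plain list of pixel rows, the patches are non-overlapping squares, and `classify_patch` is a hypothetical stand-in for the trained patch model that returns "pos" or "neg".

```python
def iter_patches(image, patch_size):
    """Yield non-overlapping square patches from an image given as a list of rows.

    Rows and columns that do not fill a whole patch at the right or bottom
    edge are dropped (an assumption made to keep the sketch short).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            yield [row[left:left + patch_size]
                   for row in image[top:top + patch_size]]


def classify_image(image, classify_patch, patch_size=64):
    """Label the whole image "pos" if any single patch is predicted "pos"."""
    for patch in iter_patches(image, patch_size):
        if classify_patch(patch) == "pos":
            return "pos"
    return "neg"
```

Here `classify_patch` can be any function that takes one patch and returns a class label, so the patch model can be swapped without touching the aggregation.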
I just took another look at the workflow and noticed I did not do this very nicely. You can, e.g., use the Unique concatenate aggregation method on the patch classes to get a list of all classes that occur in the image. Follow it up with a Rule Engine node that contains logic like the following:
$Prediction (class)$ MATCHES ".*pos.*" => "pos"
TRUE => "neg"
This transforms the concatenated string into a prediction for the image.
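For comparison, the same aggregation can be sketched in a few lines of Python. This is a rough equivalent rather than the nodes' actual implementation. The function names are made up for illustration, the ", " separator is an assumption, and `re.fullmatch` is used on the assumption that MATCHES tests the whole string against the pattern.

```python
import re


def unique_concatenate(patch_classes, separator=", "):
    """Join the distinct patch classes in order of first appearance."""
    seen = []
    for patch_class in patch_classes:
        if patch_class not in seen:
            seen.append(patch_class)
    return separator.join(seen)


def image_prediction(concatenated_classes):
    """First rule: anything containing "pos" becomes "pos"; otherwise "neg"."""
    if re.fullmatch(r".*pos.*", concatenated_classes):
        return "pos"
    return "neg"
```

For example, patch classes of neg, pos, neg are concatenated to "neg, pos", which the first rule maps to "pos". An image whose patches are all neg falls through to the default and gets "neg".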
See the attached updated workflow for reference: workflow download.
best,
Gabriel